In this fourth and last post in the cluster analysis series, we provide a couple of examples showing how cluster analysis operates in practice on player data, as well as how to present the results of cluster analysis, in order to make them actionable. Grab a cup of coffee and sit down in a comfy chair for this one!
We have talked ad libitum about the foundations of cluster analysis, the multitude of models and algorithms, and the specific challenges associated with running such analyses on behavioral data from games. Given the heavy emphasis on human decision-making, running a cluster analysis can seem like a daunting prospect, so the time has come for an example that shows how this process can operate in practice. This should hopefully indicate that with a bit of preparation and reading, running a cluster analysis is possible even for non-specialists. By the way, this is also the case for prediction analysis, as documented by Dmitry Nozhnin.
In this post we will use examples from this paper, which focuses on the MMORPG Tera Online and uses a baseline algorithm, k-means clustering, which we introduced in the second post in this series. In brief, k-means clustering tries to partition all the observations into k clusters, in which each observation belongs to the cluster with the nearest mean value, which forms the prototype of that cluster (for a rigorous walkthrough of k-means, consult David MacKay’s book “Information Theory, Inference and Learning Algorithms”).
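To make that description concrete, here is a minimal sketch of the standard k-means iteration (often called Lloyd's algorithm) in plain Python. The toy data, the function name `k_means` and the decision to seed the centroids with the first k points are all assumptions made for illustration; this is not the code used for the Tera analysis, and real packages typically use smarter initialization and several restarts.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption for reproducibility: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means will not move either
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two behavioral features per character, arbitrary units.
characters = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = k_means(characters, k=2)
print(labels)     # cluster index per character
print(centroids)  # the "prototype" (mean) of each cluster
```

On this toy data the low-valued and high-valued characters end up in separate clusters, and each centroid is simply the average of the characters assigned to it.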
For additional examples, this paper looks at how to use neural nets for developing player profiles in a previous Tomb Raider game; and this paper employs cluster analysis as a function of time. Finally, here we show a visual example of players migrating between clusters as a function of time, using D3 and Sankey diagrams.
This post is going to be a bit extensive, as we want to properly document the different steps taking place before and after running the actual cluster algorithms on the data, so expect about a lunch break’s time or so. We will not cover the math itself (generally we can let software handle the calculations); for that, please refer to this paper and the references contained therein.
Let us start by introducing Tera Online. As mentioned in previous posts, understanding the game being analyzed is pretty vital to understanding the results of analytics processes. Tera (abbreviation for The Exiled Realm of Arborea) is an MMORPG that was released by En Masse Entertainment in South Korea in January 2011, and in North America/Europe the following year. The game is currently free-to-play, and has typical MMORPG features such as a questing system, crafting and player vs. player action, as well as an integrated economy. Players generate one or more characters, which fall within one of seven races (e.g. Aman, Baraka, Castanic). In addition, players choose a class (e.g. Warrior, Lancer, Berserker), each tuned to specific roles in the game (e.g. having a high damage output or being able to absorb high amounts of damage).
A player can have multiple characters in Tera; therefore, the dataset will probably represent a number of players lower than the number of characters. From the perspective of behavior clustering, the discrepancy between number of players and characters is not important in this case, where we are interested in behavioral groups as a function of characters, not players.
The dataset from Tera is from the game’s open beta (character levels 1-32 only), and contains the following behavioral variables (or features in data mining terminology):
Quests completed: Number of quests completed
Friends: Number of friends in the game
Achievements: The number of achievements earned
Skill levels: Levels in the Mining and Plants skills, respectively
Monster kills: The number of AI-controlled enemies killed by the character (combining small, medium and large monsters in one feature)
Deaths by monsters: The number of times the character has been killed by AI-controlled enemies
Total items looted: The total number of items the character has picked up during the game
Auction house use: The combined number of times the character has either created an auction or purchased something from an auction
Character level: Ranges from level 1 to 32. In this example we will focus on level 32 players. If we simply used all the players, the cluster analysis would of course neatly give us clusters that are level dependent, given how the values of the different variables change with character level: a level 32 character will have completed, say, 1000 quests, whereas a level 1 character will have completed 2.
Data preparation and analysis
Behavioral telemetry sadly has a tendency to be both dirty and noisy. On top of that, feature selection, as also discussed in the third post in this series, is potentially highly challenging. These are key problems we face as analysts trying to streamline analytics processes, and eventually to remove the analyst as a link in most processes (as recently discussed by Nils Pihl). In the current case, we carefully went through the potential behavioral features in the dataset and selected those close to the key mechanics of the game, as the goal here was to gain a general understanding of how the players handled their high-level characters. We were notably interested in class imbalances, or whether there were groups of players who performed very badly (or … suspiciously … well). Any incomplete records were removed, and various types of analyses were performed on the data to find any weird outliers and to check the distribution of the data for each feature (behavioral variable).
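As a rough illustration of this kind of cleaning, the sketch below drops incomplete records and then flags values that fall far outside the interquartile range. The record layout, the feature names and the 1.5 × IQR rule are assumptions chosen for the example (the rule is a common convention, not necessarily what was used on the Tera data), and flagged records still need a human to decide whether they are errors, cheaters or simply very good players.

```python
from statistics import quantiles

FEATURES = ("quests", "kills", "deaths")  # hypothetical feature names

# Hypothetical telemetry records; None marks a missing value.
records = [
    {"quests": 400, "kills": 4800, "deaths": 12},
    {"quests": 410, "kills": 5000, "deaths": 9},
    {"quests": 395, "kills": None, "deaths": 11},  # incomplete record
    {"quests": 420, "kills": 5100, "deaths": 14},
    {"quests": 405, "kills": 5200, "deaths": 10},
    {"quests": 415, "kills": 5300, "deaths": 8},
    {"quests": 430, "kills": 5500, "deaths": 13},
    {"quests": 425, "kills": 250000, "deaths": 0},  # suspiciously high
]


def drop_incomplete(rows, features=FEATURES):
    """Keep only rows that have a value for every feature."""
    return [r for r in rows if all(r.get(f) is not None for f in features)]


def flag_outliers(rows, feature, factor=1.5):
    """Return rows whose value lies outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    values = [r[feature] for r in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [r for r in rows if not low <= r[feature] <= high]


clean = drop_incomplete(records)
suspicious = flag_outliers(clean, "kills")
print(len(clean), "complete records")
print("flagged for review:", suspicious)
```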
Normalizing input data
A typical problem of behavioral analysis in games is the mixing of data types. This is incredibly common as soon as we want to analyze multiple variables (or behavioral features) at the same time, e.g. in multiple regression. Doing so often requires the adoption of normalization strategies, a step in the analysis process that is often overlooked, with potentially disastrous results.
This feeds back to the point we have made in earlier posts: these days it is easy to run e.g. a cluster analysis using standard analytics or statistics packages, but these are not fully automated processes. Data mining (still) requires human input and decision-making.
Normalization strategies have names such as Min-Max and variance normalization (or zero mean normalization, ZMN). This may sound arcane [analysts love making simple things sound complex] but what it actually means is that we take data that vary in their ranges and types and convert them into a format that will not mess up the cluster analysis. For example, if we had a set of variables ranging in value from 3-10 and one variable ranging from 0-1, they would carry disproportionate weights in the distance calculations purely because of their different scales (wider-ranged variables tend to dominate). In this case, it can be an advantage to normalize all variables into a 0-1 range. The data mining literature is brimming with ways of normalizing data, but to take a few examples, ZMN normalizes the field values according to their mean and standard deviation. Min-Max normalization transforms the data into a defined range between a normalized minimum value (α) and a normalized maximum value (β). See here for more on these processes. While the choice of normalization strategy is very case dependent, we find that Min-Max normalization is pretty sensitive to outliers, so we recommend variance normalization when you are dealing with datasets with outliers. Below, we present the results using variance normalization (there was not much difference in the results using the two techniques).
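The two strategies can be sketched in a few lines of plain Python. The function names, the toy kill counts and the default α = 0, β = 1 are assumptions made for illustration; statistics packages ship their own versions of both transforms.

```python
from statistics import mean, pstdev


def min_max(values, alpha=0.0, beta=1.0):
    """Min-Max: rescale linearly so the smallest value maps to alpha, the largest to beta."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [alpha + (v - lo) * (beta - alpha) / (hi - lo) for v in values]


def zero_mean(values):
    """Variance (zero mean) normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("variance normalization needs at least two distinct values")
    return [(v - mu) / sigma for v in values]


# Hypothetical monster-kill counts; the last character is an extreme outlier.
kills = [100, 200, 300, 400, 10000]

scaled = min_max(kills)
standardized = zero_mean(kills)
print([round(v, 3) for v in scaled])
print([round(v, 3) for v in standardized])
```

In this toy case the single outlier defines the Min-Max maximum, so the four ordinary characters are squeezed into the bottom few percent of the 0-1 range. The variance-normalized values are not bounded to a fixed range; they have mean 0 and standard deviation 1.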
A key element of human decision-making in a cluster analysis is deciding how to determine the number of clusters. The thing is that a cluster algorithm can give you as many or as few clusters as you want, up to the point where each data point is a cluster! It depends on the threshold value that we decide should determine when something is a cluster or not. There are ways to obtain an idea about the best number of clusters in an analysis, notably mean squared error estimates, cross-validation and the popular scree plots; but deciding how many clusters there are in a dataset is ultimately up to the human running the analysis. Sometimes it makes sense to use the number of clusters that a mean squared error estimate tells us is the best “fit”; sometimes fewer or more clusters give us better information. It depends on the specific goal of the analysis. In practice, we try out different numbers of clusters to see what gives us the best and most interpretable results. Most analytics or statistical packages will automatically generate several “fitness” plots when you run a cluster analysis, which help enormously with interpretation. For Tera, irrespective of which level range we looked at, we found that 6-7 clusters provided the best fit. These clusters do not adhere to character classes, but rather to specific behavior ranges. This is perhaps not surprising given how much freedom you have as a player to impact the playstyle of any character class. More on this below.
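One way to produce the numbers behind a scree-style plot is to run the clustering for several values of k and record the within-cluster sum of squared errors each time. The sketch below does this with a bare-bones k-means in plain Python; the toy data, the function names and the first-k-points initialization are assumptions for illustration, and the final choice of k still rests with the analyst.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Bare-bones k-means; centroids are seeded with the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def within_cluster_sse(points, labels, centroids):
    """Sum of squared distances from each point to the centroid of its cluster."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical toy data with two obvious behavioral groups.
characters = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]

sse_by_k = {}
for k in range(1, 4):
    labels, centroids = k_means(characters, k)
    sse_by_k[k] = within_cluster_sse(characters, labels, centroids)

for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.2f}")
```

On this toy data the error drops sharply from one to two clusters and only marginally from two to three, which is the "elbow" one looks for in a scree plot. Real player data rarely produces such a clean bend, which is why interpretability has to be weighed alongside the error curve.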
Clusters of behavior in Tera
The k-means analysis resulted in 6 clusters. We see one cluster with the highest values across all or most of the behavioral variables, and another with abysmal values across the board. The remaining four clusters contain players who show average performance but have different sets of high or low scores, i.e. different things they emphasize in the game.
Elite players: These had the highest scores across the board, except that they were rarely killed by AI opponents and had very low skill levels in Plants and Mining. This indicates players focused on performance, without interest in skills that do not impact their performance (Plants and Mining provide access to resources and equipment, but resources can also be obtained by solving quests or auctioning off found items). These players are of direct interest not only because they are dedicated, but also because of their strong social networks (high number of friends): retaining them helps ensure a sustainable community.
Stragglers: The players with the lowest scores for all features (including deaths from monsters), comprising 39.4% of the players. Even though they have reached level 32, these players perform rather badly in the game and are possibly a group at risk of churning.
Next to these interesting clusters there are two clusters with successively better scores, Average Joes and The Dependables, the latter with the highest scores except for the Elite. Investigating other features such as playtime in connection with this cluster might help in gaining insights into how these players can be helped to progress into the Elite profile. Both of these groups of players exhibit low Plants and Mining skills; however, they are matched by the last two groups, Worker I and Worker II. These have scores similar to the Average Joes and The Dependables respectively, but with high Mining and Plants skills and comparably higher loot values, i.e. they have looted more items.
The fact that only two of six clusters of players appear to spend time on learning non-combat skills could indicate a design problem for Tera (keep in mind these are data from the beta and thus not representative of the current game). Resource-gathering skills like these are fundamental to the economy of an MMORPG, and with only roughly 25-35% (depending on the level bin) of the player base having high values in these skills, the flow of new raw materials may not be sufficient. Additionally, from a cost-benefit perspective, core gameplay features such as the non-combat skills should be utilized by most of the player base.
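Profiles like the ones described above are typically read off a small summary table: for each cluster, the share of characters it contains and the average value of each feature. A minimal sketch of how such a table could be computed is shown here; the rows, the labels and the function name `cluster_profiles` are hypothetical and do not reproduce the Tera results.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, labels, features):
    """Per-cluster share of characters and mean value of each feature."""
    grouped = defaultdict(list)
    for row, lab in zip(rows, labels):
        grouped[lab].append(row)
    total = len(rows)
    profiles = {}
    for lab, members in sorted(grouped.items()):
        profile = {"share": len(members) / total}
        for f in features:
            profile[f] = mean(m[f] for m in members)
        profiles[lab] = profile
    return profiles


# Hypothetical characters with cluster labels already assigned by a clustering run.
rows = [
    {"quests": 900, "mining": 2, "friends": 30},
    {"quests": 880, "mining": 4, "friends": 26},
    {"quests": 300, "mining": 1, "friends": 2},
    {"quests": 320, "mining": 3, "friends": 4},
    {"quests": 310, "mining": 2, "friends": 3},
]
labels = [0, 0, 1, 1, 1]

profiles = cluster_profiles(rows, labels, ("quests", "mining", "friends"))
for lab, profile in profiles.items():
    print(lab, profile)
```

Naming the clusters (Elite, Stragglers and so on) is then a matter of interpreting which features stand out in each row of the table, which is where knowledge of the game comes in.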
[This post was written in collaboration with Christian Bauckhage and Rafet Sifa.]
We are indebted to several colleagues for sharing their insights and feedback on this post, including but not limited to Christian Thurau, Fabian Hadiji and Shawn Connor.