Data Analytics for League of Legends: July 2015

Sunday, July 26, 2015

Are /r/leagueoflegends users really all Challenger players? A quick analysis using self-reported flair data

There is a common meme on the /r/leagueoflegends subreddit that everyone is a Challenger (or, as the meme goes, challenjour). But what does the ladder-rank composition of /r/leagueoflegends users actually look like?

I mined flair data on /r/leagueoflegends for the last three days and got a sample of 1276 valid flairs (valid meaning the flair is in the format <[summoner name] (region)>). After removing 324 flairs whose summoner name does not actually exist (or whose region is not covered by the Riot API) and 146 flairs whose summoner name exists but has no ranked record, I am left with 806 flairs. The ladder distribution of these users is as follows:
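The format check described above could be sketched roughly as follows. This is only an illustration: the regular expression, the `parse_flair` helper and the sample flairs are all assumptions about what a "valid" flair looks like, not the scraper that was actually used.

```python
import re

# Assumed flair shape: "<summoner name> (<region>)", e.g. "Some Player (NA)".
FLAIR_PATTERN = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<region>[A-Za-z]+)\)\s*$")

def parse_flair(flair):
    """Return (summoner_name, region) for a well-formed flair, else None."""
    match = FLAIR_PATTERN.match(flair)
    if match is None:
        return None
    return match.group("name"), match.group("region").upper()

# Hypothetical flairs, for illustration only.
sample = ["Some Player (NA)", "challenjour", "Another One (euw)"]
valid = [parsed for parsed in map(parse_flair, sample) if parsed is not None]
```

Flairs that pass this check would still need to be looked up through the Riot API to drop summoner names that do not exist or have no ranked record.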

In comparison, this is the ladder distribution for the NA server a few weeks ago:

It seems that /r/leagueoflegends users are mostly in Gold or Platinum. By comparison, most players on the NA server are in Silver. This is probably not a surprising result, since we can reasonably expect the users of a discussion board for a game to be more skilled than the average player.

However, as with all analyses, it's good to note the limitations of the data at hand. Unlike /r/summonerschool, /r/leagueoflegends does not have a verification process in place, so in theory a user can masquerade as someone else with ease. Furthermore, many users choose not to report their summoner name as their flair, so there is non-response bias in the data. Overall, there is room for a better study in the future.

Saturday, July 18, 2015

Technical Note on: How Long Does It Take for a Season 4 Gold Player to Return to Gold in Season 5?

This is a technical side note on How Long Does It Take for a Season 4 Gold Player to Return to Gold in Season 5? It is written separately to provide some deeper insight into the analysis, and to give me an opportunity to rant about certain things.

First of all, I don't actually have data on the exact number of wins when a Season 4 Gold player returns to Gold. All I have are snapshots of the entire ladder ranking at certain points in time - usually one snapshot per month. In other words, my data looks somewhat like this:

In this case, we see that this player was Platinum 5 on January 11th (Season 4) with 255 wins. On March 1st (Season 5), he was Gold 1 with 40 wins, and five days later he was Platinum 5 with 52 wins. This means that this player regained his Platinum rank somewhere between 40 and 52 wins.

It's not always possible to find both ends of this range for each player. Sometimes the left-hand endpoint does not exist because the player was never seen in a rank lower than his Season 4 rank. For example, for this player below:

He was a Platinum player in Season 4 and was never seen (by me) below Platinum in Season 5. Therefore, he retained his Platinum rank within 0 to 43 wins.

On the other hand, some players do not have a right-hand endpoint because they simply haven't played enough games. For example:

This player was Platinum in Season 4, but has only 5 wins and is currently in Silver in Season 5. Therefore, the number of wins he needs to return to Platinum is somewhere between 5 and infinity.
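The three cases above can be turned into a small routine that maps a player's Season 5 snapshots to a (left, right) range of wins. The sketch below is a hypothetical reconstruction: the `TIERS` list, the `wins_interval` helper and the choice to ignore divisions within a tier are assumptions made for illustration.

```python
import math

# Tiers ordered from lowest to highest; divisions within a tier are ignored here.
TIERS = ["Bronze", "Silver", "Gold", "Platinum", "Diamond", "Master", "Challenger"]

def wins_interval(target_tier, snapshots):
    """Bracket the number of wins at which the player got back to target_tier.

    snapshots is a list of (tier, wins) pairs observed in Season 5.
    Returns (left, right): the player was still below the target at `left` wins
    and was first seen at or above it at `right` wins (math.inf if never seen).
    """
    target = TIERS.index(target_tier)
    left, right = 0, math.inf
    # Wins only grow within a season, so sorting by wins orders the snapshots in time.
    for tier, wins in sorted(snapshots, key=lambda snap: snap[1]):
        if TIERS.index(tier) >= target:
            right = wins
            break
        left = wins
    return left, right

both_ends = wins_interval("Platinum", [("Gold", 40), ("Platinum", 52)])
no_left = wins_interval("Platinum", [("Platinum", 43)])
no_right = wins_interval("Platinum", [("Silver", 5)])
```

The three calls correspond to the three example players: a range with both endpoints, one with no observed left endpoint (so it starts at 0), and one with no right endpoint (so it runs to infinity).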

Overall, we see that the data I have gathered is far from ideal. There are several reasons for this:

1. It's not possible to track more than one million players on a game-by-game basis since the Riot API and I both have limited bandwidth.

2. Even if it were possible to do this (technically it is, since I now have a production key), it would take too much effort and space to store and manage the data.

3. The ranking data from the Riot API is actually slightly "bugged": as far as I understand, because of some deep-level architectural design, the data pulled can behave in unexpected ways.

That being said, I am a strong believer that statisticians should always be ready to work with less-than-ideal data, since "ideal data" does not require statistics to analyze. To this end, it's fairly easy to see that this data can be analyzed using an interval-censoring model - which is exactly what I have done here. I do need to assume that the censoring is independent of the time-to-event, which is probably not exactly true since many players stop playing after achieving Gold 5; however, it is an assumption which I am personally comfortable with.
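To make the interval-censoring idea concrete, here is a minimal sketch of a nonparametric estimator for such data, using a Turnbull-style self-consistency (EM) iteration. It is not the R package's implementation and not necessarily the exact procedure behind the plots; the function names, the (left, right] interval convention and the three sample intervals are assumptions for illustration.

```python
import math

def turnbull_npmle(intervals, tol=1e-8, max_iter=10_000):
    """Estimate how probability mass is spread over cells between observed endpoints.

    intervals: list of (left, right) pairs read as (left, right], with left < right;
    right may be math.inf for players who have not yet reached the target.
    Returns (cells, probs), where probs[j] is the mass assigned to cells[j].
    """
    points = sorted({p for interval in intervals for p in interval})
    cells = list(zip(points[:-1], points[1:]))
    # inside[i][j] is True when cell j lies within observation i's interval.
    inside = [[left <= a and b <= right for (a, b) in cells]
              for (left, right) in intervals]
    n, m = len(intervals), len(cells)
    probs = [1.0 / m] * m
    for _ in range(max_iter):
        updated = [0.0] * m
        for row in inside:
            denom = sum(p for p, flag in zip(probs, row) if flag)
            for j, flag in enumerate(row):
                if flag:
                    # Share this observation's unit of mass across its cells.
                    updated[j] += probs[j] / denom / n
        change = max(abs(a - b) for a, b in zip(probs, updated))
        probs = updated
        if change < tol:
            break
    return cells, probs

def survival_at(cells, probs, t):
    """Estimated chance of still being below the target after t wins (t a cell edge)."""
    return sum(p for (a, _), p in zip(cells, probs) if a >= t)

# The three example players from above: (40, 52], (0, 43] and (5, inf).
cells, probs = turnbull_npmle([(40, 52), (0, 43), (5, math.inf)])
```

With only these three players every interval overlaps on (40, 43], so the estimate piles essentially all of the mass there. With a million players the same loop applies, but a plain-Python version like this would be far too slow to be practical.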

There were more problems, however. To the best of my knowledge, R has a package for interval-censored data called interval (see the published paper for this package here). Unfortunately, this package seems to have been written with survival analysis in mind and runs very slowly on large amounts of data. The fact that it uses bootstrapped confidence intervals (CIs) makes CI computation all but impossible for my purposes. It is probably possible to remedy this by avoiding bootstrapping (my impression after some cursory reading on interval-censored data is that bootstrapping is not strictly required), but that may take a lot of time and effort. Therefore, CIs are not plotted on the diagrams.

Solo Queue Ladder Analysis: How Long Does It Take for a Season 4 Gold Player to Return to Gold in Season 5?

One way to examine the effectiveness of the ladder is to look at how well a player can retain his rank after a reset. For example, if a player who is Gold is actually worthy of being Gold, we should expect this player to remain Gold (or perhaps higher) after a ladder reset. Today, I am going to present some statistics on how players' rankings have moved from late January 2015 (Season 4) to early July 2015 (midway through Season 5) - roughly a six-month period.

First let's see how the players have moved. The table below presents the ladder transition from Season 4 to Season 5 for ~1 million players on the NA server who have ranked solo queue ladder records in both seasons.

For example, we see that after six months, 80.83% of Season 4 Bronze players remain in Bronze in Season 5; on the other hand, 53.94% of Season 4 Platinum players remain in Platinum whereas 10.6% of them are now Diamond or above.
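A transition table of this kind is a row-normalised cross-tabulation of each player's Season 4 tier against his Season 5 tier. The sketch below shows one way it could be computed; the `transition_table` helper and the four toy players are hypothetical and are not the real data.

```python
from collections import Counter, defaultdict

def transition_table(tier_pairs):
    """Return {season4_tier: {season5_tier: percent of that Season 4 tier}}."""
    counts = defaultdict(Counter)
    for season4_tier, season5_tier in tier_pairs:
        counts[season4_tier][season5_tier] += 1
    table = {}
    for season4_tier, row in counts.items():
        total = sum(row.values())
        # Each row is normalised by the number of players who started in that tier.
        table[season4_tier] = {tier: 100.0 * c / total for tier, c in row.items()}
    return table

# Toy input: four made-up players who were all Gold in Season 4.
toy_pairs = [("Gold", "Gold"), ("Gold", "Silver"), ("Gold", "Gold"), ("Gold", "Platinum")]
toy_table = transition_table(toy_pairs)
```

Each row sums to 100%, so a cell such as Bronze to Bronze reads as "the share of Season 4 Bronze players who are still Bronze".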

Below is a more visual representation of the transition of these players.

Therefore, we see that more than half of the players who have records in Season 4 and played in Season 5 have retained their original rank (or better) after about six months. While this is interesting in itself, it does not take into account the fact that many players are below their Season 4 ranking simply because they haven't played much. So for the next study, we will also take the number of wins into account (the number of games would be better, but it is not as readily available as the number of wins via the Riot API). We are interested in, for example, how many wins it takes for a Season 4 Gold player to return to Gold (or higher) in Season 5.

As it turns out, the number of wins required to return to Gold depends heavily on whether the player was, say, Gold 5 or Gold 1. As you will observe below, a Gold 1 player from Season 4 will return to Gold in Season 5 far more quickly than a Gold 5 player.

This is a time-to-event plot similar to the one I used for the AFK analysis, where our event of interest is returning to the same tier (e.g. Gold) as in Season 4. The plot shows that the chance of a Season 4 Gold player staying below Gold in Season 5 (vertical axis) decreases as the player wins more games (horizontal axis). In particular, it suggests that a Gold 5 player in Season 4 has about a 40% chance of staying below Gold after 100 wins (purple line); for Gold 1 players, this percentage is only about 2% (red line). Overall, the higher a player sits within the Gold tier, the more quickly he regains the Gold tier after a ladder reset. I believe this is good evidence that the League system is actually working fairly well.

For some of the technical notes on this entry, click here.

Monday, July 6, 2015

A Quick Note on AP Ezreal with Runeglaive

TL;DR: AP Ezreal's win rate is 7 percentage points higher with Runeglaive. Ranger's or Stalker's seems to be a better choice than Skirmisher's (2 percentage point difference) for Ezreal's Smite.

Patch 5.12 introduced a new jungle item - Runeglaive - which has the interesting effect of improving AP Ezreal dramatically. I found ~10k games with mid lane AP Ezreal in Platinum-Diamond level NA ranked solo queue and found the following:

Playstyle	# of Games	# of Wins	Win Rate
No Smite	681	282	40.29%
Smite + Runeglaive	9654	4571	47.35%
Smite + No Runeglaive	86	27	31.40%

A simple two-sided proportions test of (No Smite) vs (Smite + Runeglaive) gives a p-value of 0.003062, which suggests the difference is statistically significant despite the relatively small sample for No Smite.
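For readers who want to check a test like this themselves, below is a minimal sketch of a pooled two-sample proportions test, using only the standard library. The post does not say which variant was used, so the Yates continuity correction here is an assumption (it is the default in R's prop.test), and `two_proportion_test` is a hypothetical helper. The counts are the games and wins from the table above.

```python
import math

def two_proportion_test(x1, n1, x2, n2, continuity=True):
    """Two-sided p-value for H0: both groups share the same win probability."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = abs(p1 - p2)
    if continuity:
        # Yates correction: shrink the observed difference slightly towards zero.
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / se
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# (No Smite) vs (Smite + Runeglaive): wins and games as listed in the table.
p_value = two_proportion_test(282, 681, 4571, 9654)
```

With the continuity correction switched on, these counts should give a p-value close to the one quoted above; without it the p-value comes out slightly smaller.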

Out of the players who did build Runeglaive, their choice for Smite is as follows:

Smite Choice	# of Games	Popularity	Win Rate
Ranger	3043	31.54%	48.04%
Stalker	2249	23.31%	48.33%
Skirmisher	4357	45.15%	46.36%

It seems that Skirmisher's does not perform as well as Stalker's or Ranger's (46% vs 48%), even though Skirmisher's is clearly the most popular choice. Poacher's was used in fewer than ten games; the interested reader may attempt to compute the number of Poacher's bought using the data above.

It's not surprising that AP Ezreal is doing a lot better with Runeglaive. Runeglaive gives AP Ezreal an AoE spell that can hit minions, thus vastly improving his waveclear - his traditional weakness. But it is quite amusing to see how a single item can take a champion from obscurity to the LCS limelight.

Notes:

1. The lane choice for the Ezreal player is determined using Riot's data.

2. The detection of AP Ezreal (as opposed to AD Ezreal) is based on the proportions of magic and physical damage dealt to champions.
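The second note could be implemented along these lines. The 50% cut-off, the `is_ap_ezreal` helper and the damage figures are assumptions for illustration; the post does not state the actual threshold.

```python
def is_ap_ezreal(magic_damage, physical_damage, threshold=0.5):
    """Classify a game as AP Ezreal when magic damage dominates.

    Both arguments are damage dealt to champions in that game. The threshold is
    the assumed minimum share of magic damage (out of magic + physical).
    """
    total = magic_damage + physical_damage
    if total == 0:
        return False  # no damage dealt, so there is nothing to classify on
    return magic_damage / total > threshold

# Made-up damage totals for two games.
ap_game = is_ap_ezreal(magic_damage=18000, physical_damage=4000)
ad_game = is_ap_ezreal(magic_damage=5000, physical_damage=16000)
```

True damage is left out of the ratio here, which matches the note's mention of only magic and physical damage.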