Monday, April 20, 2015

How I mine and analyze Riot's data

I seem to get this question a lot, so I figure I should write a short piece about this.

Disclaimer: I am not a computer scientist, software developer, or a DBA. I just try to make things work with my hacky ways.

First of all, the Riot API provides a way to download game data from Riot's server. You need to read their documentations to figure out what kind of data are available, but the short answer is that there are a LOT of data available and it does take some bandwidth if you mine continuously just below Riot's threshold.

All of Riot's data are in the JSON format. There are a lot of tools which will allow you to parse JSON, but I am currently using a Java package called ulti. With this and my Java code, I request data from Riot's servers while staying barely below the threshold they set out.

On a side note, I am sure other programming languages like Python can do the job just as well, if not better. It just happened that I have fairly strong roots in C/C++ and I enjoy coding in C-like languages.

Once the data is collected, I save the data using a PostgreSQL database. If you are also mining data from Riot, I highly recommend that you set up a database for storage as the data can easily get out of hands if they are stored in plain text. Currently, my database is about 400GB in size; if entirely exported to CSV files, I will not be surprised if they will take more than 4TB of storage.

My disk usage from the data I mined since January, 2015
Databases also make retrieval easier, since SQL queries are fairly efficient.

When it comes to actual data analysis, I use a combination of SQL (simple loading and aggregation),  R, and Excel. Loading data and doing simple aggregations is simple; just write some SQL queries.




R, on the other hand, is slower but it is packed with more statistical tools that allows more insightful analysis.


Excel is another tool that I often use for quick-and-dirty plotting. Not every plot I make is sophisticated; sometimes I just want a quick bar chart and Excel does the job really well.


So there you have it. I hope it helps!



Wednesday, April 15, 2015

Flash on D or F - revisited

Let us revisit this question. Flash on D or on F? It seems that Flash on D is slightly more popular. But perhaps more interestingly, it is possible that players who put Flash on D have different lane preference compared to players who put Flash on F. I will also give my theory on this phenomenon near the end of this blog entry.

I analyzed about 370k ranked solo queue games played on Patch 5.6 and found about 600k unique players. Here's the frequency table for these players:

Flash on D Flash on F Not Identifiable
# of Players 301269 290322 10947
Percentage 50.0% 48.2% 1.8%

In summary, there are slightly more players who use D for Flash than F. Note that the "not identifiable" 1.8% of the players are due to a combination of inconsistent use of D and F for Flash and/or running no Flash at all.

More interestingly, it is possible that players who use D for Flash have slightly different lane of preference compared to players who use F for Flash - although the difference in preference is fairly small. As you can see below, players with Flash on D chooses to play mid lane 19.8% of the time; for players with Flash on F, it's 20.5%.

Player \ Chance to play BOTTOM JUNGLE MIDDLE TOP
Players with Flash on D 40.2% 20.6% 19.8% 19.5%
Players with Flash on F 40.0% 19.8% 20.5% 19.6%

Again, the difference is small, but a chi-squared test of independence shows that the difference is significant so the choice of D or F for Flash and choice of lane are probably not independent[1]. 

I have a theory for the result above.

First of all, why do some players choose D and some players do F for Flash? Well, when you play your first game on a level 1 account, the game places Heal on D and Ghost on F by default. Flash is not available for a level 1 account; but over time you will gain the option to use Flash and you will be faced with the dilemma of assigning it to D or F.

I think at this point, the decision to use D or F for Flash depends on if the player thinks Flash is similar to Ghost or not. If the player treats Flash like a movement spell similar to Ghost, I think it is natural to keep Flash on F. On the other hand, if the player treats Flash differently from Ghost, it makes more sense to bind it on a different button.

Therefore, the choice of using D or F for Flash may be demonstrating different modes of thinking for players - and perhaps the button choice is a reflection of the player's personality. Because of this, we also see that players who choose D for Flash have slightly different lane choices compared to players who choose F for Flash.

Then again, this is just my theory. It would be cool to be able to do some personality tests on players and see how personality correlates with lane choices though!


Footnote
[1] Feel free to rip me apart on how I inappropriately used chi-squared test here.

Thursday, April 9, 2015

URF Win Rate 2015

95k games collected via Riot API.
Sorted by win rate highest to lowest.
For win rates, mirror match-ups are filtered out.

Champion Popularity Win Rate
Sona 27.04% 69.03%
Galio 5.27% 64.13%
Wukong 14.29% 58.76%
Malzahar 12.14% 56.79%
Skarner 2.92% 56.79%
Jax 6.67% 56.69%
Maokai 12.43% 55.58%
Diana 6.31% 55.36%
Evelynn 11.62% 55.23%
Karthus 11.36% 55.16%
Fiora 6.47% 55.02%
Hecarim 23.67% 54.97%
Zed 32.42% 54.97%
Orianna 4.67% 54.72%
Xin Zhao 9.03% 54.72%
Poppy 5.26% 54.49%
Vladimir 10.77% 54.12%
Shaco 27.51% 54.09%
Ahri 14.07% 53.65%
Zyra 4.17% 53.41%
Sivir 5.64% 53.32%
Master Yi 17.00% 53.29%
Soraka 8.08% 53.24%
Ezreal 39.49% 53.13%
Nami 7.57% 53.06%
Morgana 13.67% 53.03%
Jayce 11.85% 52.95%
Nocturne 1.65% 52.80%
Karma 8.43% 52.66%
Fizz 15.71% 52.63%
Tryndamere 5.72% 52.62%
Shen 6.40% 52.55%
Lulu 5.77% 52.48%
Kog'Maw 5.76% 52.29%
Urgot 7.59% 51.96%
Kennen 3.87% 51.85%
Syndra 6.49% 51.75%
Annie 9.86% 51.71%
Talon 7.36% 51.70%
Malphite 10.73% 51.56%
Kayle 5.05% 51.47%
Cho'Gath 8.81% 51.47%
Sejuani 2.85% 51.41%
Janna 2.99% 51.18%
Rumble 3.13% 51.07%
Twisted Fate 7.28% 50.83%
Yorick 3.85% 50.83%
Blitzcrank 23.23% 50.72%
Amumu 7.76% 50.60%
Irelia 2.82% 50.40%
Swain 6.20% 50.37%
Mordekaiser 4.80% 50.12%
Graves 4.10% 50.08%
Dr. Mundo 5.03% 50.02%
Nautilus 5.36% 50.02%
Gangplank 11.51% 50.00%
Katarina 17.83% 49.89%
Varus 3.60% 49.81%
Alistar 12.67% 49.80%
Azir 8.99% 49.65%
Darius 5.21% 49.65%
Lux 20.67% 49.40%
Nasus 13.10% 49.22%
Jarvan IV 3.04% 49.08%
Nidalee 30.39% 48.98%
Quinn 1.70% 48.60%
Warwick 4.10% 48.45%
Veigar 8.41% 48.36%
Heimerdinger 5.26% 48.29%
Zac 1.53% 48.15%
Taric 3.03% 48.07%
Draven 1.87% 48.05%
Teemo 19.35% 48.01%
Riven 14.79% 47.93%
Rammus 2.59% 47.82%
Ziggs 10.12% 47.76%
Ashe 10.57% 47.31%
Ryze 10.71% 46.44%
Volibear 1.92% 46.38%
Vi 6.12% 46.35%
Aatrox 1.53% 46.31%
Vel'Koz 8.72% 46.31%
Leona 8.39% 45.95%
Udyr 5.59% 45.77%
Rengar 5.01% 45.66%
Kha'Zix 4.40% 45.60%
Renekton 1.68% 45.45%
Shyvana 0.96% 45.42%
Lucian 13.78% 45.29%
Trundle 1.21% 45.19%
Pantheon 6.21% 44.96%
Lissandra 3.98% 44.95%
Miss Fortune 2.50% 44.93%
Fiddlesticks 6.01% 44.92%
Xerath 6.07% 44.65%
Braum 1.55% 44.59%
Caitlyn 3.22% 44.54%
Sion 4.49% 44.52%
Jinx 15.27% 44.35%
Brand 6.16% 44.17%
Singed 2.96% 44.07%
Viktor 3.91% 44.05%
Gragas 3.17% 43.35%
Olaf 1.87% 43.27%
Twitch 2.27% 43.21%
Tristana 3.26% 42.81%
Vayne 5.22% 42.78%
Kassadin 12.51% 42.61%
LeBlanc 20.49% 42.41%
Garen 3.64% 42.01%
Nunu 4.16% 41.96%
Akali 6.91% 41.71%
Rek'Sai 3.29% 41.59%
Lee Sin 7.02% 41.52%
Anivia 3.36% 41.43%
Corki 3.24% 40.90%
Elise 0.97% 39.96%
Yasuo 6.30% 39.44%
Cassiopeia 3.30% 38.82%
Zilean 4.50% 38.66%
Gnar 2.51% 38.66%
Kalista 2.08% 36.82%
Thresh 5.44% 34.32%
Bard 11.81% 30.75%


Blue side win rate: 52%

I think Riot deserves some praise here. They spent a lot of time tuning many champions individually so they do not need to be disabled. In URF 2014, 9 champions were either initially or later disabled due to their gross power on URF ( Kassadin, Ryze, Sona, Hecarim, Kayle, Soraka, Nidalee, and Alistar). This year, no champions were disabled (yet) after global healing/shield adjustments and some individual adjustments.

The URF mode win rate from last year can be found here.