Disclaimer: I am not a computer scientist, software developer, or a DBA. I just try to make things work with my hacky ways.
First of all, the Riot API provides a way to download game data from Riot's servers. You need to read their documentation to figure out what kinds of data are available, but the short answer is that there are a LOT of data available, and it does take some bandwidth if you mine continuously just below Riot's request threshold.
All of Riot's data are in JSON format. There are a lot of tools that will let you parse JSON, but I am currently using a Java package called ulti. With this and my Java code, I request data from Riot's servers while staying barely below the request threshold they set.
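To illustrate the "stay just below the threshold" idea, here is a minimal sketch of a sliding-window throttle in plain Java. It is only an illustration under assumptions: the class name RequestThrottle, its methods, and any limit numbers you pass in are hypothetical, and the real limits have to come from Riot's documentation. It does not use the JSON package mentioned above and makes no network calls; it only decides when the next request may be sent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window throttle: allows at most maxRequests
 * requests in any window of windowMillis milliseconds.
 */
class RequestThrottle {
    private final int maxRequests;
    private final long windowMillis;
    // Timestamps (ms) of the requests sent within the current window, oldest first.
    private final Deque<Long> sentAt = new ArrayDeque<>();

    RequestThrottle(int maxRequests, long windowMillis) {
        if (maxRequests <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns how many ms to wait before sending at nowMillis; 0 means "send now". */
    synchronized long delayBeforeNext(long nowMillis) {
        // Forget requests that have already fallen out of the window.
        while (!sentAt.isEmpty() && nowMillis - sentAt.peekFirst() >= windowMillis) {
            sentAt.pollFirst();
        }
        if (sentAt.size() < maxRequests) {
            return 0;
        }
        // The window is full: wait until the oldest request expires.
        return sentAt.peekFirst() + windowMillis - nowMillis;
    }

    /** Records that a request was sent at nowMillis. */
    synchronized void record(long nowMillis) {
        sentAt.addLast(nowMillis);
    }

    /** Blocks until a request is allowed, then records it. */
    void acquire() throws InterruptedException {
        while (true) {
            long wait;
            synchronized (this) {
                long now = System.currentTimeMillis();
                wait = delayBeforeNext(now);
                if (wait == 0) {
                    record(now);
                    return;
                }
            }
            Thread.sleep(wait);
        }
    }
}
```

In a mining loop you would call acquire() right before each HTTP request, so the loop sleeps only when the window is full. Setting the limits slightly below the published ones leaves some room for clock differences between your machine and the server.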
On a side note, I am sure other programming languages like Python can do the job just as well, if not better. It just so happens that I have fairly strong roots in C/C++ and I enjoy coding in C-like languages.
Once the data are collected, I save them in a PostgreSQL database. If you are also mining data from Riot, I highly recommend that you set up a database for storage, as the data can easily get out of hand if they are stored in plain text. Currently, my database is about 400GB in size; if it were entirely exported to CSV files, I would not be surprised if they took more than 4TB of storage.
[Figure: my disk usage from the data I have mined since January 2015]
When it comes to actual data analysis, I use a combination of SQL (simple loading and aggregation), R, and Excel. Loading data and doing basic aggregations is easy; just write some SQL queries.
R, on the other hand, is slower than SQL, but it is packed with more statistical tools that allow more insightful analysis.
Excel is another tool that I often use for quick-and-dirty plotting. Not every plot I make is sophisticated; sometimes I just want a quick bar chart and Excel does the job really well.
So there you have it. I hope it helps!