Saturday, July 30, 2016

Predicting ARAM Outcome Based on the Champions Selected

ARAM Outcome Predictor - Current on Patch 6.16

Enter the champions below, or generate a random set of champions.

Champions on Blue Side:
Champions on Red Side:




Anyone with the slightest experience with ARAM knows that there are many good (and bad) champions on this map - in particular, ranged champions with long-range poke and/or sustain tend to be favoured in ARAM. To quickly show that this is indeed the case, here are the bottom and top 5 champions in ARAM on Patch 6.15 by win rate after removing mirror matches:

Bottom 5:
Champion   Winrate
Ryze       36.25%
Evelynn    36.68%
LeBlanc    37.80%
Rek'Sai    38.39%
Kha'Zix    38.78%

Top 5:
Champion   Winrate
Swain      61.22%
Teemo      61.32%
Galio      62.86%
Sona       63.30%
Ziggs      64.10%
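
For readers who want to reproduce this kind of table, here is a minimal sketch of a win rate computation that skips mirror matches. It assumes that "mirror match" means a game in which the same champion appears on both teams (so that champion both wins and loses, which pulls its win rate towards 50%). The function name `win_rates`, the data layout, and the toy games are all invented for illustration and are not the code or data behind the numbers above.

```python
from collections import defaultdict

def win_rates(games):
    """games: iterable of (blue_team, red_team, blue_won) tuples,
    where each team is a list of champion names."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for blue, red, blue_won in games:
        mirrored = set(blue) & set(red)  # champions present on both sides
        for team, won in ((blue, blue_won), (red, not blue_won)):
            for champ in team:
                if champ in mirrored:
                    continue  # assumed rule: ignore a champion's mirrored games
                played[champ] += 1
                wins[champ] += int(won)
    return {champ: wins[champ] / played[champ] for champ in played}

# Toy data, invented purely for illustration (teams shortened to 2 champions).
games = [
    (["Ziggs", "Sona"], ["Ryze", "Evelynn"], True),
    (["Ziggs", "Ryze"], ["Ziggs", "Sona"], False),  # Ziggs is mirrored here
    (["Ryze", "Sona"], ["Ziggs", "Evelynn"], False),
]
rates = win_rates(games)
for champ, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{champ:10s} {rate:.2%}")
```

In the second toy game Ziggs is on both teams, so that game counts towards Ryze and Sona but not towards Ziggs.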

With such a discrepancy in power between different champions, and the fact that the champions chosen in ARAM are random, it is natural to ask how much of the game is decided as soon as the champions are selected. In other words, given only the champions locked in for both sides and no additional information, how well can we predict the outcome of the game?

By constructing a predictive model using machine learning techniques, I found that I can predict the outcome of ARAM games in Patch 6.15 with around 66% accuracy. You can play with my predictive model above, where you can enter the champions and see the predicted outcome.



Warning: technical descriptions of the model ahead.

The methodology is very standard - I collected around 160k ARAM games from the NA server on Patch 6.15, split the data into a training set and a testing set (in a 3:1 ratio), trained several machine learning models on the training set, and finally computed the prediction error on the testing set.
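
To make the data preparation concrete, here is a minimal sketch of one way to turn a game into a feature vector and to do the 3:1 split. The signed encoding (+1 for a champion on the blue side, -1 on the red side) is an assumption made for illustration; the post does not say which encoding was actually used. The helper names `encode` and `train_test_split` and the shortened champion list are likewise hypothetical.

```python
import random

def encode(blue, red, champions):
    """Signed indicator vector: +1 if the champion is on the blue side,
    -1 if on the red side, 0 otherwise (a mirrored champion cancels to 0)."""
    index = {name: i for i, name in enumerate(champions)}
    x = [0] * len(champions)
    for name in blue:
        x[index[name]] += 1
    for name in red:
        x[index[name]] -= 1
    return x

def train_test_split(rows, train_parts=3, test_parts=1, seed=0):
    """Shuffle the rows and split them in a train_parts:test_parts ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * train_parts // (train_parts + test_parts)
    return rows[:cut], rows[cut:]

# Shortened champion list and 2-champion teams, for illustration only.
champions = ["Galio", "Ryze", "Sona", "Teemo", "Ziggs", "Swain"]
x = encode(["Ziggs", "Sona"], ["Ryze", "Teemo"], champions)
print(x)

train, test = train_test_split(range(160))
print(len(train), len(test))
```

With real data each row would be a pair of an encoded game and its outcome; the integers in `range(160)` only stand in for such rows.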

Several different models were attempted, including logistic regression, random forest, XGBoost, and some simple multilayer perceptrons (MLPs). Somewhat surprisingly, the logistic regression model performed remarkably well against the other models. A small amount of regularization was needed for the logistic regression since the covariate matrix was rank-deficient.
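
To illustrate why regularization helps with a rank-deficient covariate matrix, here is a small self-contained sketch of L2-regularized logistic regression fitted by plain gradient descent. It assumes a signed encoding (+1 for blue side, -1 for red side), which is not confirmed by the post. Under that assumption every row sums to zero, so adding the same constant to all weights changes no prediction and the unpenalized fit is not unique; the L2 penalty picks out a single solution. The toy 1v1 data, the function names, and the hyperparameters are invented for illustration, and this is not the model behind the results reported here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_l2(X, y, lam=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    No intercept is fitted, purely to keep the sketch short.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n  # gradient of the mean log-loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability that the blue side wins."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Toy 1v1 "games" between hypothetical champions A, B, C (invented data).
# Encoding: +1 = blue side, -1 = red side; label 1 = blue side won.
X = [[1, -1, 0], [1, 0, -1], [0, 1, -1], [-1, 1, 0], [-1, 0, 1], [0, -1, 1]]
y = [1, 1, 1, 0, 0, 0]

w = fit_logistic_l2(X, y)
print([round(wj, 3) for wj in w])
print(round(predict_proba(w, [1, 0, -1]), 3))  # A (blue) against C (red)
```

In this toy data A beats B and C, and B beats C, so the fitted weights order the champions as A, B, C, and the weights sum to (numerically) zero, which is the particular solution the penalty selects.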

The results on the testing set are as follows:

Confusion Matrix and Statistics

          Reference
Prediction  LOSS   WIN
      LOSS 13617  7174
      WIN   7371 14621
                                          
               Accuracy : 0.66            
                 95% CI : (0.6555, 0.6645)
    No Information Rate : 0.5094          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.3197          
 Mcnemar's Test P-Value : 0.1041          
                                          
            Sensitivity : 0.6488          
            Specificity : 0.6708          
         Pos Pred Value : 0.6549          
         Neg Pred Value : 0.6648          
             Prevalence : 0.4906          
         Detection Rate : 0.3183          
   Detection Prevalence : 0.4860          
      Balanced Accuracy : 0.6598          
                                          
       'Positive' Class : LOSS 
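
As a sanity check, the headline statistics can be recomputed from the four counts in the confusion matrix alone. The sketch below uses the standard definitions with LOSS as the positive class, as stated in the output; the variable names are my own, and the no-information rate is taken to be the share of the larger class.

```python
# Counts copied from the confusion matrix (rows: prediction, columns: reference).
tp = 13617  # predicted LOSS, actual LOSS ('positive' class is LOSS)
fp = 7174   # predicted LOSS, actual WIN
fn = 7371   # predicted WIN,  actual LOSS
tn = 14621  # predicted WIN,  actual WIN
n = tp + fp + fn + tn

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)   # share of actual losses predicted as LOSS
specificity = tn / (tn + fp)   # share of actual wins predicted as WIN
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value
prevalence = (tp + fn) / n
no_information_rate = max(prevalence, 1 - prevalence)
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the row and column totals give by chance.
expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

for name, value in [
    ("Accuracy", accuracy), ("No Information Rate", no_information_rate),
    ("Kappa", kappa), ("Sensitivity", sensitivity), ("Specificity", specificity),
    ("Pos Pred Value", ppv), ("Neg Pred Value", npv),
    ("Prevalence", prevalence), ("Balanced Accuracy", balanced_accuracy),
]:
    print(f"{name:>20s} : {value:.4f}")
```

The recomputed values agree with the reported output to four decimal places: the model is right on about 66% of the 42,783 test games, against about 51% for always guessing the more common outcome.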

An accuracy of 66% against a no-information rate of about 51% seems very good. The ROC curve is as follows:

[Figure: ROC curve]

which may have room for improvement.
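
For reference, a ROC curve like this can be traced from the predicted probabilities by lowering the classification threshold step by step. The sketch below is a generic pure-Python version with invented scores and labels, not the code or data behind the figure; `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the threshold through every distinct score. labels are 0 or 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented predicted probabilities and true outcomes (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points)
print(round(area, 4))
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so the area under the curve gives a single number with which to compare the models.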