Why settle for just one realisation of this year’s UEFA Euro when you can let the tournament play out 10,000 times in silico?
Since I already had some code lying around from my submission to the Kaggle-hosted 2010 Take on the Quants challenge, I figured I’d recycle it for the Euro this year. The model takes a simulation-based approach, using Poisson-distributed goals. The rate parameters in a given match are determined by the relative strength of the two teams, as given by their Elo ratings.
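To make the idea concrete, here is a minimal sketch of a single simulated match. It is not the code from the repo: the function names, the `total_goals` parameter, and the way the goal rates are split between the teams are all assumptions made for illustration. Only the Elo expected-score formula is standard.

```r
# Illustrative sketch only; names and the rate mapping are assumptions,
# not the model used in the post's actual code.

# Standard Elo expected score for team A against team B.
elo_expected <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# Simulate one match with Poisson-distributed goals.
# Assumption: an average goal total per match is split between the
# two teams in proportion to the Elo expected score.
simulate_match <- function(elo_a, elo_b, total_goals = 2.5) {
  p_a <- elo_expected(elo_a, elo_b)
  goals_a <- rpois(1, total_goals * p_a)
  goals_b <- rpois(1, total_goals * (1 - p_a))
  c(a = goals_a, b = goals_b)
}

set.seed(1)
simulate_match(2000, 1800)  # goals for a stronger team (a) and a weaker team (b)
```

Any other mapping from rating difference to goal rates could be swapped in here. The point is only that each match reduces to two Poisson draws.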
The advantage of this approach is that the tournament structure (which teams are assigned to which groups, how points are awarded, the quarter-final structure, etc.) can be accounted for. If we wanted to take this structure into account when making probabilistic forecasts analytically, we would need to work through many conditional probabilities, following the extremely large number of possible outcomes. The downside is that the simulation is only as reliable as the team ratings themselves. In my submission to Kaggle, I used a weighted average of Elo and FIFA ratings.
After simulating the tournament 10,000 times, the probability of victory for each team is estimated as the number of times that team emerged victorious divided by 10,000.
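The counting step can be sketched with a toy example. Everything below is hypothetical: four made-up teams with made-up ratings, a two-round knockout bracket instead of the real group stage, and a coin flip standing in for extra time and penalties. It only shows how repeated simulation turns into win probabilities.

```r
# Hypothetical ratings for four made-up teams (illustration only).
ratings <- c(A = 2000, B = 1900, C = 1850, D = 1750)

# Winner of one knockout match: Poisson goals, with a coin flip if the
# scores are level (a crude stand-in for extra time and penalties).
play <- function(t1, t2, total_goals = 2.5) {
  p1 <- 1 / (1 + 10^((ratings[[t2]] - ratings[[t1]]) / 400))
  g1 <- rpois(1, total_goals * p1)
  g2 <- rpois(1, total_goals * (1 - p1))
  if (g1 == g2) return(sample(c(t1, t2), 1))
  if (g1 > g2) t1 else t2
}

# One realisation of the toy bracket: two semi-finals and a final.
simulate_tournament <- function() {
  play(play("A", "D"), play("B", "C"))
}

set.seed(2012)
n_sims <- 10000
winners <- replicate(n_sims, simulate_tournament())

# Estimated win probability = times a team won / number of simulations.
win_prob <- table(factor(winners, levels = names(ratings))) / n_sims
print(sort(win_prob, decreasing = TRUE))
```

Because every simulated tournament has exactly one winner, the estimated probabilities sum to one by construction.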
Feel free to get the code here and play around with it yourself. You can use your own rating system and see how the predicted outcomes change. If you are of a certain disposition, you might even find it more fun than the human version of the tournament itself!
It seems like the sum of those probabilities is above one…
Nope, they sum to 1.
> sum(y$density)
[1] 1
Hello – impressive! Unfortunately, I get an error right at the beginning:
teams_f<-ratings$FIFA..Rating-min(ratings$FIFA..Rating-1)
Is there also the possibility of getting a chart of the most probable results of a single game?
Did you clone the whole repo? Not sure, but it looks like you might not have the data files.
Re most likely outcome for a given game: yes, but the way it is written, only the quarters, semis and final game outcomes are stored (in quarters, semis, and final_game respectively). From these, you could plot the outcome distributions.
I think the problem is that there is no such column as FIFA…
Oops, you’re probably running ‘single_match.R’. That has old code from the World Cup. Run ‘gr.R’ instead.
Ah, now it works – thank you!
Would it be hard to rewrite it so that you can see histograms for all matches? This would be great!