Politecnico di Torino

Markov Chain Model for Football Analytics

Emanuele Formento

Markov Chain Model for Football Analytics.

Supervisor: Enrico Bibbona. Politecnico di Torino, Master's degree in Mathematical Engineering, 2022

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This work originates from a thesis proposal by the company Deltatre, whose objective is to use a Markov chain model to extract key performance indicators for a team or a player from the data collected during a football competition. Each match is modeled as a Markov chain with suitable states and transitions. Markov chain theory is used to build the model, while other mathematical tools, such as the chi-square distribution and confidence intervals, are used to check the goodness of the results. The construction of the statistics is inspired by the expected threat theory introduced by Karun Singh.

Data preprocessing begins with understanding which data are available and which are useful for the analysis: the most important are the ball position and the player and team in possession of the ball, but other information is also used, such as the type of event and the phase of the match. Then the states defining the model are chosen: they consist of field areas plus two additional states, the goal and the lost ball. The number of field states depends on the number of subdivisions along both sides of the field, creating an m × n grid that can be represented using blurred and defined heatmaps.

Once the states are defined, a transition matrix, or rather its estimate, is needed to complete the Markov chain model. This is done by computing the frequency matrix, whose entries count how many times the ball moves from one state to another, and then normalizing it to obtain a probability matrix in which every row sums to 1. The transition matrices can be computed from the data of a single match or of the entire tournament; in the latter case, the matrix is also used to compute the stationary distribution of the Markov chain. Since these results are estimates, it is critical to assess their robustness through the Goodman method, obtaining heatmaps of the absolute and relative amplitudes of the confidence intervals.
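As a rough illustration of the pipeline just described (grid states, frequency matrix, row normalization, confidence intervals), here is a minimal Python sketch. It assumes a 105 × 68 m pitch, a 3 × 4 grid, and the hypothetical helpers `field_state`, `frequency_matrix`, `transition_matrix`, and `goodman_intervals`, none of which come from the thesis. The intervals follow the standard Goodman (1965) simultaneous intervals for multinomial proportions; the thesis may apply the method differently.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical pitch size (metres) and grid resolution: assumptions, not values from the thesis.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
M, N = 3, 4  # m bands across the width, n bands along the length

GOAL, LOST = M * N, M * N + 1  # the two extra, non-field states
NUM_STATES = M * N + 2


def field_state(x, y, m=M, n=N):
    """Map a ball position (x along the length, y across the width) to a field cell 0..m*n-1."""
    col = min(int(x / PITCH_LENGTH * n), n - 1)
    row = min(int(y / PITCH_WIDTH * m), m - 1)
    return row * n + col


def frequency_matrix(transitions, k=NUM_STATES):
    """Count how many times the ball moves from state i to state j."""
    counts = Counter(transitions)
    return [[counts[(i, j)] for j in range(k)] for i in range(k)]


def transition_matrix(freq):
    """Normalize each row to sum to 1; rows with no observations are left as zeros."""
    probs = []
    for row in freq:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def goodman_intervals(row_counts, alpha=0.05):
    """Goodman simultaneous confidence intervals for one row (row_counts must have a positive total)."""
    k, n = len(row_counts), sum(row_counts)
    # Chi-square quantile with 1 degree of freedom at level 1 - alpha/k,
    # obtained as the square of a standard normal quantile.
    a = NormalDist().inv_cdf(1 - alpha / (2 * k)) ** 2
    intervals = []
    for c in row_counts:
        root = (a * (a + 4 * c * (n - c) / n)) ** 0.5
        intervals.append(((a + 2 * c - root) / (2 * (n + a)),
                          (a + 2 * c + root) / (2 * (n + a))))
    return intervals


# Toy possession (illustrative positions): three touches on the field, then a goal.
positions = [(10.0, 30.0), (50.0, 40.0), (95.0, 34.0)]
states = [field_state(x, y) for x, y in positions] + [GOAL]
transitions = list(zip(states, states[1:]))
P = transition_matrix(frequency_matrix(transitions))
ci = goodman_intervals([30, 50, 20])  # e.g. counts observed in one row of a tournament matrix
```

In practice the transitions would be pooled over a whole match or tournament before normalizing; rows that are never visited (here, the goal state) stay at zero rather than being forced to sum to 1.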
In the last chapter, Karun Singh's idea is applied to the model, yielding the expected threat for the given tournament edition. The expected threat is then used to measure the dominance of each team during a match, which can be done in different ways and gives a dynamic and clear picture of how the match unfolds. Some possibilities are to group or weight the expected threat values according to established criteria, for example by subdividing the match into minutes or actions and summing the expected threat within each part. The idea of dominance is also applied to players, considering their expected threat both within a single match and across the tournament. Then, to make comparisons of player performance fairer, two improvements are introduced: a normalization by minutes played, so that players who play more minutes or matches are not advantaged over others, and a measure of each player's contribution to the expected threat, which computes their gains based not only on where they touch the ball but also on where they pass or steal it, thus rewarding less offensive players as well.
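A minimal sketch of these last steps, under a loudly stated assumption: the goal and lost-ball states are treated as absorbing, so the expected threat of a state is taken to be the probability of eventually reaching the goal state. This is one Markov-chain adaptation of Singh's expected threat, not necessarily the formulation used in the thesis. The event format `(minute, team, state)`, the per-90 normalization, and the helper names `expected_threat`, `dominance_per_minute`, `per_90`, and `action_gain` are all hypothetical.

```python
from collections import defaultdict


def expected_threat(P, goal, lost, iterations=200):
    """Probability of eventually reaching `goal` from each state.

    Assumes `goal` and `lost` are absorbing, so xT[i] = P[i][goal] + sum over the
    other states j of P[i][j] * xT[j], solved here by fixed-point iteration
    (it converges when every field state can reach an absorbing state).
    """
    k = len(P)
    xt = [0.0] * k
    xt[goal] = 1.0
    for _ in range(iterations):
        new = [sum(P[i][j] * xt[j] for j in range(k)) for i in range(k)]
        new[goal], new[lost] = 1.0, 0.0  # absorbing states keep their fixed values
        xt = new
    return xt


def dominance_per_minute(events, xt):
    """Sum, minute by minute, the xT of the states where each team has the ball.

    `events` is a hypothetical list of (minute, team, state) tuples.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for minute, team, state in events:
        totals[minute][team] += xt[state]
    return {minute: dict(teams) for minute, teams in sorted(totals.items())}


def per_90(total_xt, minutes_played):
    """Rescale a player's cumulative xT to a rate per 90 minutes played."""
    return 90.0 * total_xt / minutes_played if minutes_played else 0.0


def action_gain(xt, start_state, end_state):
    """xT added by moving the ball from start_state to end_state (e.g. a pass)."""
    return xt[end_state] - xt[start_state]


# Toy chain: states 0 and 1 are field zones, 2 is the goal, 3 is the lost ball.
P = [
    [0.0, 0.5, 0.0, 0.5],  # defensive zone: move forward or lose the ball
    [0.2, 0.0, 0.3, 0.5],  # attacking zone: go back, score, or lose the ball
    [0.0, 0.0, 1.0, 0.0],  # goal (absorbing)
    [0.0, 0.0, 0.0, 1.0],  # lost ball (absorbing)
]
xt = expected_threat(P, goal=2, lost=3)  # converges to about [1/6, 1/3, 1, 0]
events = [(1, "A", 0), (1, "A", 1), (1, "B", 0), (2, "B", 1)]
print(dominance_per_minute(events, xt))
print(per_90(0.5, 45), action_gain(xt, 0, 1))
```

Crediting a player with `action_gain` rather than with the xT of the cell where they touch the ball is one way to reward passes and recoveries that move possession into more dangerous areas; weighting or grouping by actions instead of minutes would only change how `events` are keyed.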

Supervisors: Enrico Bibbona
Academic year: 2022/23
Publication type: Electronic
Number of pages: 73
Subjects:
Degree programme: Master's degree in Mathematical Engineering
Degree class: New system > Master's degree > LM-44 - Mathematical-Physical Modelling for Engineering
Collaborating companies: deltatre s.p.a.
URI: http://webthesis.biblio.polito.it/id/eprint/24864