Friday, September 12, 2014

Total Shots Ratio as a predictor of match outcome

In this post, I take a closer look at the granddaddy of advanced soccer stats, James Grayson's Total Shots Ratio (TSR). In particular, I will assess the potential utility of TSR as a predictor of match outcome in Scottish football. 

To do this, I downloaded match results for the Scottish top flight (2000-2014) from my usual source, XMLSOCCER.COM, and wrangled them into R using the XML package.

To begin, the plot below shows that there is a positive relationship between TSR and goal difference at the match level, as expected. This relationship is statistically significant with TSR explaining ~20% of the variation in goal difference at the match level.
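For readers who want to see the arithmetic, here is a minimal Python sketch (the analysis in this post was done in R) of how a match-level TSR and the share of variance explained could be computed. The match rows are made-up illustrative numbers, not the XMLSOCCER.COM data, and the function names `tsr` and `r_squared` are hypothetical.

```python
# Illustrative only: toy rows, NOT the XMLSOCCER.COM data used in the post.
# Each row: (home_shots, away_shots, home_goals, away_goals)
matches = [
    (14, 8, 2, 0),
    (9, 12, 1, 1),
    (6, 15, 0, 2),
    (11, 11, 1, 0),
    (17, 5, 3, 1),
    (8, 10, 0, 1),
]


def tsr(shots_for, shots_against):
    """Total Shots Ratio: a team's share of all shots taken in the match."""
    return shots_for / (shots_for + shots_against)


def r_squared(x, y):
    """Share of the variance in y explained by a simple linear fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


home_tsr = [tsr(hs, as_) for hs, as_, _, _ in matches]
home_gd = [hg - ag for _, _, hg, ag in matches]
print(round(r_squared(home_tsr, home_gd), 2))
```

With a single predictor, R-squared is just the squared correlation between TSR and goal difference, which is why no regression library is needed here.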



Next, I was curious to see how well TSR predicts the outcome of a match (i.e., whether the home team wins). So I built a simple logistic regression model in R using the glm function.

First, I transformed the continuous variable home goal difference into a binary response variable for use in the logistic regression (1 = home win, 0 = home loss or draw). Then I split the sample into separate training (n=1911) and validation (n=1273) data sets.
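As a rough Python sketch of these two steps (the post itself used R): the helper names `home_win` and `split_train_validation` are hypothetical, and the random shuffle with a fixed seed is an assumption, since the post does not say how the matches were assigned to each set. Only the sample sizes (1911 and 1273) come from the post.

```python
import random


def home_win(home_goals, away_goals):
    """Binary response: 1 if the home side won, 0 for a home loss or draw."""
    return 1 if home_goals > away_goals else 0


def split_train_validation(rows, n_train, seed=1):
    """Shuffle a copy of rows, then cut it into training and validation parts.

    The random split and the seed are assumptions for illustration.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-ins for the 1911 + 1273 = 3184 matches in the post.
rows = list(range(3184))
train, validation = split_train_validation(rows, n_train=1911)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, so the same validation matches are held out every time the script is run.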

The logistic regression was built using the training data, and then applied to the validation data to estimate prediction accuracy.
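To make the fit-then-predict step concrete, here is a small pure-Python sketch of a one-predictor logistic regression fitted by Newton-Raphson. This is not the R glm call used in the post. The TSR values and outcomes are toy numbers, the function names are hypothetical, and the 0.5 probability cutoff is an assumption because the post does not state which threshold was used.

```python
import math


def win_probability(tsr_value, b0, b1):
    """Logistic curve: P(home win) for a given home TSR."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * tsr_value)))


def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of intercept b0 and slope b1 by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # g: gradient of the log-likelihood; h: negative Hessian entries.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = win_probability(xi, b0, b1)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * step = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_win(tsr_value, b0, b1, threshold=0.5):
    """Classify as a home win when the fitted probability reaches the cutoff.

    The 0.5 default is an assumed cutoff, not one taken from the post.
    """
    return 1 if win_probability(tsr_value, b0, b1) >= threshold else 0


# Toy training data (illustrative, not the real matches): home TSR and outcome.
train_tsr = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
train_win = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(train_tsr, train_win)
validation_tsr = [0.25, 0.52, 0.80]
print([predict_win(t, b0, b1) for t in validation_tsr])
```

The coefficients are estimated on the training rows only and then applied unchanged to the held-out rows, so the validation accuracy is not flattered by the fitting step.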

Here is the resulting confusion matrix for the validation data.

                      Observed loss/draw   Observed win
Predicted loss/draw   543                  283
Predicted win         165                  282

As you can see, the logistic regression model does not do a very good job of predicting match winners in the validation sample; it has poor sensitivity (282/565=0.50). However, the model does a much better job of correctly identifying losses/draws, which means it has high specificity (543/708=0.77).

Overall, the model classifies (543+282)/1273 = 65% of the validation matches correctly, which is significantly better than chance.
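The sensitivity, specificity and accuracy quoted above can be checked directly from the four confusion matrix counts. The short Python sketch below does that, and also computes one extra figure that is not in the analysis above: the accuracy of always guessing the more common outcome, included here only as a reference point.

```python
# Counts from the validation confusion matrix in this post.
tn = 543  # predicted loss/draw, observed loss/draw
fn = 283  # predicted loss/draw, observed win
fp = 165  # predicted win, observed loss/draw
tp = 282  # predicted win, observed win

total = tn + fn + fp + tp

sensitivity = tp / (tp + fn)       # share of observed wins predicted as wins
specificity = tn / (tn + fp)       # share of observed losses/draws predicted as such
accuracy = (tp + tn) / total       # share of all matches classified correctly

# Reference point (not from the post): always guess the commoner outcome.
baseline = max(tp + fn, tn + fp) / total

print(f"{sensitivity:.2f} {specificity:.2f} {accuracy:.2f} {baseline:.2f}")
```

This reproduces 0.50, 0.77 and 0.65. Always predicting a loss or draw would be right 708/1273 = 56% of the time, so the model still comes out ahead of that stricter yardstick.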

So as a predictor (or at least retrodictor) of the outcome of a Scottish top-flight football match, a simple logistic regression with TSR as the sole independent variable performs much better than flipping a coin. In fact, the model is right about 2/3 of the time.

Not bad.

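An additional insight from this analysis is that only about half of the matches the home team actually won came with a TSR high enough for the model to predict a win, whereas more than 75% of the matches the home team failed to win came with a TSR low enough for the model to predict a loss or draw.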

This highlights the strong role of randomness in football.

Monday, September 1, 2014

The best of the rest in Scotland 2000-2014, Part II: Home teams

Last week I looked at the away performances of Scottish top-flight clubs since 2000, which revealed some surprising results.

This week I will do the same analysis, but for home teams. As before, I downloaded the data from XMLSOCCER.COM's free demo API and wrangled it using the XML package in R. All subsequent analyses were also done in R.

I focused on two measures of performance: average home goal difference (GD) and average Total Shots Ratio (TSR) per match. The former reflects both points earned and margin of victory, while the latter, a team's share of all shots taken in a match, is a proxy for the degree to which that team controls the ball (in a subsequent post, I will show that these two variables are correlated at the match level).
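As a hedged illustration of how these two per-club averages could be computed and ranked (the post used R; this is a Python sketch), the rows below are made-up numbers attached to real club names purely for readability. They are not real results, and the names `home_averages` and `best_of_rest` are hypothetical.

```python
from collections import defaultdict

# Illustrative rows only: hypothetical numbers, NOT real match data.
# (home_team, home_goals, away_goals, home_shots, away_shots)
matches = [
    ("Hearts", 2, 0, 15, 7),
    ("Hearts", 1, 1, 12, 10),
    ("Aberdeen", 1, 0, 11, 9),
    ("Aberdeen", 0, 1, 10, 12),
    ("Celtic", 3, 0, 20, 4),
]

OLD_FIRM = {"Celtic", "Rangers"}


def home_averages(rows):
    """Return {team: (average home GD, average home TSR)} over home matches."""
    gd = defaultdict(list)
    tsr = defaultdict(list)
    for team, hg, ag, hs, as_ in rows:
        gd[team].append(hg - ag)
        # Assumes at least one shot was taken in the match.
        tsr[team].append(hs / (hs + as_))
    return {t: (sum(gd[t]) / len(gd[t]), sum(tsr[t]) / len(tsr[t])) for t in gd}


averages = home_averages(matches)

# Rank the non-Old-Firm clubs by average home GD, best first.
best_of_rest = sorted(
    (t for t in averages if t not in OLD_FIRM),
    key=lambda t: averages[t][0],
    reverse=True,
)
print(best_of_rest)
```

Swapping the sort key to `averages[t][1]` would give the same ranking by average TSR instead of GD.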

As with my previous analysis, I hypothesized that one of the bigger non-Old-Firm clubs would be the "best of the rest," for example Hearts, Hibs, Aberdeen, Dundee United, or Motherwell.

As you can see in the bar plot below, my hypothesis was confirmed: the Edinburgh club Hearts has the third-best average home GD, followed by Aberdeen and Hibs.



A similar pattern can be seen in the bar plot of average TSR below; Hearts is the best of the rest again.


Unlike my analysis of away team performances, the current analysis did not reveal any major surprises. Hearts, Hibs and Aberdeen are all relatively big clubs by Scottish standards, and one would expect a bigger club to have a better record over time than a smaller club with fewer resources.

However, in my previous analysis of away teams, Inverness Caledonian Thistle (ICT) and Falkirk were the best of the rest with regard to average GD and average TSR respectively, despite both clubs being relatively small by Scottish standards.

It is important to note that Hearts was near the top of the heap in the away team analyses too. Their best-of-the-rest rankings both home and away are as follows.


Best of the Rest Rankings for Hearts (2000-2014)

Measure                             Rank
Average Home Goal Difference        1st
Average Away Goal Difference        2nd
Average Total Shots Ratio (Home)    1st
Average Total Shots Ratio (Away)    2nd


Thus, I think it's safe to say that between 2000 and 2014, Hearts was, on the whole, the best team in Scotland, outside of the Old Firm.

Too bad they got relegated...