Coaches Should Build Metrics : ShotSmash

Uptake of analytical metrics is below par. The main gatekeepers are the coaching staff. The more intuitive the metric, the wider the gate is opened; fight against intuition and the gate swings shut. An understandable situation.
Increasing adoption and prising that gate open is important and can be achieved by:
  1. Making tools that improve the processes of coaching and game preparation
  2. Automatic generation of video clips based on metrics
  3. Improving our communication and interpersonal skills
  4. Including coaches in the development of metrics


The fourth can be very powerful: it increases two-way communication, application and ownership. It’s a sweet spot that is under-utilised. Yet others would raise warning flags.

We need a theoretical and conceptual framework before we can make sense of this tracking data. That’s what I say for the event data in the article you’re linking, but the need is even more critical in the case of tracking data. What are the important variables? What is the unit of analysis? We simply do not know. One way around the problem would be to borrow coaching concepts, but the work on event data was most successful when it challenged preconceived ideas about what’s important and what’s not, so I’m not a fan, even though it would definitely help with adoption.

Marek Kwiatkowski
In Interview

I value the challenging of the preconceived. Gaining edges in the unknown or undervalued. Yet we will get there quicker with adoption and increased influence. Building expert knowledge directly into models and metrics is a powerful tool, although we may not discover football’s “Move 37”.
Tracking data is heavy. Each game consists of 1,835,000 lines and 137,825,000 characters. Clubs are only now starting to investigate its possibilities. Modelling insights from 5 seasons of such data is computationally expensive.
My hypothesis: Inject expert football knowledge into the core of tracking data modelling. This injection leads to quicker and computationally cheaper insights. ShotSmash was an experiment which gently prodded my hypothesis.


ShotSmash asked ‘experts’ to decide which of a pair of scoring opportunities was ‘better’. The voting results fed into an Elo Rating system which, over the course of 12,000 votes, gave each of the 939 shots a final Elo Rating. Not a new idea; thanks, Zuck.
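For anyone curious, the rating mechanics behind this can be sketched in a few lines. This is a minimal illustration of a standard Elo update, assuming K=32 and the usual 400-point logistic curve (not necessarily the exact parameters ShotSmash used; the shot names are made up):

```python
def expected(r_a, r_b):
    """Expected score of shot A against shot B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_winner, r_loser, k=32):
    """Return the pair's new ratings after one vote for the winner."""
    e_w = expected(r_winner, r_loser)
    return r_winner + k * (1 - e_w), r_loser - k * (1 - e_w)

# Every shot starts at 1500; each vote nudges the compared pair apart.
ratings = {"shot_a": 1500.0, "shot_b": 1500.0}
ratings["shot_a"], ratings["shot_b"] = update(ratings["shot_a"], ratings["shot_b"])
```

After 12,000 such pairwise nudges, frequently preferred shots drift up the rating scale and frequently rejected ones drift down.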



Similar xG values but very different situations

For each shot I calculate the Expected Goals (xG) value using Ben Torvaney’s xG model (there are more accurate models, but this is the one I have access to).
Then let’s run a logistic regression with the goal result as the outcome and Elo and xG as predictors.
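The model specification looks like this. Since the real shot table isn’t public, this sketch fits the same regression on made-up data with a small gradient-ascent loop; in practice you’d reach for R’s glm or statsmodels:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 939                                   # number of shots in the experiment
elo = rng.normal(1500, 100, n)            # made-up Elo Ratings
xg = rng.uniform(0.02, 0.6, n)            # made-up xG values
# Synthetic outcome: goal probability rises with both predictors.
logit = -4 + 0.002 * elo + 2.0 * xg
goal = rng.random(n) < 1 / (1 + np.exp(-logit))

# Standardise the predictors so the coefficients are comparable.
X = np.column_stack([np.ones(n),
                     (elo - elo.mean()) / elo.std(),
                     (xg - xg.mean()) / xg.std()])

beta = np.zeros(3)
for _ in range(2000):                     # plain gradient ascent on the log-likelihood
    p = 1 / (1 + np.exp(-X @ beta))
    beta += X.T @ (goal - p) / n

# beta = [intercept, Elo coefficient, xG coefficient] on the log-odds scale
```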


So both predictors are significant, but Elo Rating more so. Let’s investigate whether this difference is significant by calculating the odds ratio for both predictors.


Elo : 2.47596756
xG  : 1.24370437


Both are greater than 1, great! Let’s calculate the confidence intervals of the odds ratios of both predictors.


Elo:
2.5%: 1.76831294
97.5%: 3.53230726

xG:
2.5%: 0.97506327
97.5%: 1.57074806
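For reference, this is how odds ratios and their 95% intervals fall out of a fitted logistic regression: exponentiate each coefficient and its ±1.96 standard-error band. The coefficient/standard-error pairs below are illustrative values chosen to roughly reproduce the numbers quoted in the text, not the actual model output:

```python
import math

def odds_ratio_ci(coef, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for one coefficient."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))

# Illustrative log-odds coefficients and standard errors:
elo_or = odds_ratio_ci(coef=0.907, se=0.176)   # roughly 2.48 (1.75, 3.50)
xg_or = odds_ratio_ci(coef=0.218, se=0.122)    # roughly 1.24 (0.98, 1.58)
```

Note that the illustrative xG interval straddles 1 while the Elo interval sits entirely above it, mirroring the pattern in the real figures.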


There is no overlap between the intervals, confirming a significant difference in predictive power between the variables. Bingo!


In terms of model comparison, the Elo model is also significantly better than the xG model at predicting goal/no-goal outcomes, with the Elo model’s predictions being 23% more accurate than xG’s. This is predictable, as the Elo model makes use of tracking data.


The ShotSmash model is far from ready to implement and was just a bit of fun. Hopefully it sparks some thoughts in others about how they can incorporate coaches’ expert knowledge into the process of building metrics.

Big thanks to Mladen Sormaz for some help and guidance.


Expected Goals, Tracking Data & Data Accuracy : An Investigation

“Tracking data will lead to lush fields of new insights and knowledge” is a common perception. The jump leads for public analytics, they say. Yet the oil-covered mechanics extract their heads from under the bonnet… “You can get it started, but… there are problems ahead.”


I am convinced that once tracking data becomes (more) available in football, we’ll be in for 2-5 years of absolute chaos before we understand how to use it. There will be wailing and gnashing of teeth and absolute dogshit visualisations everywhere.

Marek Kwiatkowski, New Kind of Analytics Inc.

Tracking data won’t affect football decision-making for two decades, if ever.

Paul Riley, Brand Excel

The advice of wise veterans is oft ignored by the adrenaline plugged ears of the masses – their logic, an unwanted dampening whilst we career into the ecstasy of the unexplored. A more careful entry would no doubt make for an easier ride to our destinations. But alas.
So… what challenges are ahead of us? 
Many… but at the very basic level… first up is…


Event data is coded by humans and with this comes error. Big thumbs, tired brains and even unknowingly racist eyes. Tracking data is algorithmically derived from multiple cameras, yet has some inherent error.


As you enter the unexplored, you will quickly notice a mismatch between the datasets… excitedly you continue… then you notice another… and another… then, with a sinking feeling, you realise this is a big issue.


Now you have two choices. One, ignore the mismatch, don’t tell anyone and hope it’s never mentioned again. Two, investigate further.


Let’s choose option two…


… For each event, compare the distance between the player’s x,y position in the event data and in the tracking data… for 202,719 events from 77 games. This distance shows the dataset-matching error.
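The per-event error itself is just a Euclidean distance. A minimal sketch, with made-up coordinates in centimetres:

```python
import math

def match_error_cm(event_xy, tracking_xy):
    """Euclidean distance between the two recorded positions, in cm."""
    dx = event_xy[0] - tracking_xy[0]
    dy = event_xy[1] - tracking_xy[1]
    return math.hypot(dx, dy)

# e.g. an event logged at (3120, 2480) vs tracking at (3190, 2455):
err = match_error_cm((3120, 2480), (3190, 2455))
```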


The mean difference distance between the datasets was 470cm. Alarming, but let’s look closer at the distribution of the differences.


Distribution of Difference Distance


A high number of events are matched very accurately, but there is a long tail of events with a significant difference distance.
Let’s have a look at whether the difference depends on the ‘type’ of event.


Defensive Actions : 120cm
Goalkeeper Actions: 151cm
Shots: 216cm
Possession Based: 502cm
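The breakdown above is a simple group-by-type mean. A sketch with made-up sample rows (the real dataset has 202,719 events):

```python
from collections import defaultdict

# (event type, matching error in cm) — made-up sample rows
rows = [("Shot", 180.0), ("Shot", 252.0),
        ("Possession", 430.0), ("Possession", 574.0),
        ("Defensive", 120.0)]

totals = defaultdict(lambda: [0.0, 0])
for event_type, err_cm in rows:
    totals[event_type][0] += err_cm
    totals[event_type][1] += 1

# Mean matching error per event type
mean_err = {t: total / count for t, (total, count) in totals.items()}
```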


Some evident differences in the means; let’s have a look at the distributions.


The distributions clearly show that the largest differences occur when matching possession-based actions such as passes and receives. This is surely down to the volume of fast-paced actions and a lower level of scrutiny of their accuracy (compared to shots). Worryingly, a number of shots were mismatched by over 500cm.


Let’s see if there are any directional patterns in the differences, using the event data locations as the origin for all 202,719 events.


Visually, no specific directional patterns can be identified. The error is scary: in an xG model, a 100cm shift in a shot’s position could produce a large change in its xG value.

Let’s see if there are any patterns in the direction of error in the subset of shots.


Once again, no patterns in the direction of error can be visually identified; the error appears to be generalised. My gut reaction is that this could have a large impact on Expected Goals (xG) values.

Impacts on xG 

Let’s calculate xG values for all 3,751 shots based on their event data and tracking data x,y positions. Ben Torvaney’s xG model is used (there are more accurate models but this is the one I have access to).  Comparing the difference in xG between the two locations will give us an initial glimpse at the size of the problem.
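The comparison boils down to evaluating the same xG model at both recorded locations and differencing. In this sketch a toy distance-decay function stands in for Ben Torvaney’s model, which isn’t reproduced here, and coordinates are assumed to be in centimetres:

```python
import math

# Assumed pitch coordinates in cm, goal centred at x = 10500, y = 3400.
GOAL = (10500, 3400)

def xg_model(x, y):
    """Toy stand-in for the real xG model: decays with distance to goal."""
    d = math.hypot(GOAL[0] - x, GOAL[1] - y) / 100  # distance in metres
    return math.exp(-d / 10)

def xg_shift(event_xy, tracking_xy):
    """xG at the tracking-data position minus xG at the event-data position."""
    return xg_model(*tracking_xy) - xg_model(*event_xy)

# A shot logged 10m out by the events feed but 12m out by tracking:
shift = xg_shift(event_xy=(9500, 3400), tracking_xy=(9300, 3400))
```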

The maximum shift in xG values was -0.3770, which is a significant change in value and a real issue for a single shot…

A quick peek at the distribution…

Look at that green middle-finger to my hypothesis!

So the easy-defence team wins? … “yeah but with enough data the errors will drain away as insignificant runoff!” … let’s see…

The mean difference in xG was -0.0081, whilst the median was 0.0004. So it’s true: when aggregating over a season, the potential error in xG values becomes insignificant.

However.. what about at a player level?

Let’s get rid of all players that haven’t taken 10 or more shots and then rank the remainder by biggest mean xG difference. Here’s the top 10.

Once again, the xG difference is almost insignificant and would not meaningfully impact the use of season-long xG values when filtering and comparing players for recruitment.

So… what about on a per-match basis?

We could use the mean difference distance for all shots (216cm) and run some simulations. A more reflective methodology is to probabilistically generate distance differences based on their frequencies within the dataset.

The probability distribution shows clear patterns:

So… Let’s simulate each game 1,880 times using the following steps for each shot within that game:

  1. Generate a ‘distance error’ based on the above probability distribution.
  2. Choose a random spot that is the same distance away from the original event as the ‘distance error’, yet still on the pitch!
  3. Calculate the new xG value for the simulated x,y position.
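The three steps above can be sketched as follows. The error distribution, pitch dimensions and xG model are all placeholders for the real ones:

```python
import math
import random

PITCH_X, PITCH_Y = 10500, 6800              # pitch size in cm (assumed)
ERROR_CM = [10, 50, 150, 300, 600]          # placeholder error distances
WEIGHTS = [0.40, 0.25, 0.20, 0.10, 0.05]    # placeholder frequencies

def toy_xg(x, y):
    """Stand-in xG model: decays with distance to the goal centre."""
    d = math.hypot(PITCH_X - x, PITCH_Y / 2 - y) / 100
    return math.exp(-d / 10)

def simulate_shot(x, y, rng):
    err = rng.choices(ERROR_CM, weights=WEIGHTS, k=1)[0]      # step 1
    while True:                                               # step 2
        theta = rng.uniform(0, 2 * math.pi)
        nx = x + err * math.cos(theta)
        ny = y + err * math.sin(theta)
        if 0 <= nx <= PITCH_X and 0 <= ny <= PITCH_Y:
            break
    return toy_xg(nx, ny)                                     # step 3

rng = random.Random(42)
sims = [simulate_shot(9500, 3400, rng) for _ in range(1880)]  # one shot, 1,880 sims
```

Summing the simulated xG values per match and comparing against the originals gives the swing distributions discussed below.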

For each match simulation the original and simulated xG values can be summed and compared to better understand the impact on single match xG analysis.

Let’s study the probability of various swings in xG per team. We are more interested in the extent of the swing than its direction, so all negative values have been converted to their additive inverse.

It’s positive to see that there is a 19.8% chance that there will be a swing of less than 0.1 xG per team per match. However, it’s concerning to see that there is a 5% chance that there will be a swing of more than 1 xG per team per match!

The potential for large swings makes you think that many single-match xG results are wrong! What % of simulated games showed a reversal of the xG match result? 6.7%!

Final Thoughts..

Although coaches utilise long-term trends, they have a large thirst for the data of singular matches. Therefore we have to ask ourselves: are we comfortable with the dataset’s inherent error when presenting results at this level of granularity? If we are not, how do we present this margin of error/uncertainty to coaches?

Our audience may be sceptical and probabilities can be intuitively misread, so it would be easy to hide our error/uncertainty in a cupboard. However, I believe this only provides sceptics with ammunition to shoot us in the head if that error/uncertainty ever trips us up. Others will strongly disagree. There are good examples emerging within the media.

I would love to see more data visualisations within our small community that develop our competencies in discussing error and uncertainty with our audiences. Showing some weakness just might strengthen our relationships and influence with our audience.

** Disclaimer – TRACAB and Opta data are not used in this article **


In June 2017, I launched ShotSmash, my attempt to recreate Mark Zuckerberg’s Facemash, but instead of comparing Harvard female students, it compared scoring opportunities in football.

The site was live for a few days and over 12,000 votes were cast comparing different types of scoring opportunities. Over the coming months I will be publishing some findings from the experiment.