I am convinced that once tracking data becomes (more) available in football, we’ll be in for 2-5 years of absolute chaos before we understand how to use it. There will be wailing and gnashing of teeth and absolute dogshit visualisations everywhere.
Marek Kwiatkowski, New Kind of Analytics Inc.
Tracking data won't affect football decision making for two decades, if ever.
Paul Riley, Brand Excel
Accuracy
Now you have two choices. One, ignore the mismatch, don’t tell anyone and hope it’s never mentioned again. Two, investigate further.
Visually, no specific significant directional patterns can be identified. The size of the error is scary: in an xG model, two shots just 100cm apart could produce very different xG values.
Let's see if there are any patterns in the direction of error in the subset of shots.
Once again there are no visually identifiable patterns in the direction of error; it appears to be a generalised error. My gut reaction is that this could have a large impact on Expected Goals (xG) values.
Impacts on xG
Let’s calculate xG values for all 3,751 shots based on their event data and tracking data x,y positions. Ben Torvaney’s xG model is used (there are more accurate models but this is the one I have access to). Comparing the difference in xG between the two locations will give us an initial glimpse at the size of the problem.
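To make the comparison concrete, here is a minimal sketch in Python. It assumes a shots table with hypothetical columns event_x, event_y, track_x and track_y, and uses a simple location-only logistic xG function with illustrative (unfitted) coefficients as a stand-in for Ben Torvaney's model, which is not reproduced here.

```python
import numpy as np
import pandas as pd

def xg(x, y):
    """Stand-in xG model based only on shot location (105m x 68m pitch,
    goal centred at x=105, y=34). Coefficients are illustrative, not fitted."""
    dx, dy = 105.0 - x, 34.0 - y
    dist = np.hypot(dx, dy)
    # half-angle subtended by the 7.32m goal mouth, a common location-only feature
    angle = np.arctan2(7.32 * dx, dx**2 + dy**2 - (7.32 / 2) ** 2)
    return 1.0 / (1.0 + np.exp(-(-1.2 + 1.5 * angle - 0.08 * dist)))

shots = pd.read_csv("shots.csv")  # one row per shot: event_x, event_y, track_x, track_y

shots["xg_event"] = xg(shots.event_x, shots.event_y)   # xG at the event-data location
shots["xg_track"] = xg(shots.track_x, shots.track_y)   # xG at the tracking-data location
shots["xg_diff"] = shots.xg_track - shots.xg_event     # per-shot shift in xG

print(shots.xg_diff.describe())    # mean, median and extremes of the shift
```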
The maximum shift in xG value was -0.3770, which is a significant change and a real issue for a single figure…
A quick peek at the distribution…
Look at that green middle-finger to my hypothesis!
So the easy-defence team wins? … "yeah, but with enough data the errors will drain away into insignificance!" … let's see…
The mean difference in xG was -0.0081, whilst the median was 0.0004. So it's true: when aggregating over a season the potential error in xG values becomes insignificant.
However.. what about at a player level?
Let's remove all players who have taken fewer than 10 shots and then rank the remaining players by biggest mean xG difference. Here's the top 10.
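As a rough sketch of that filter-and-rank step (assuming the per-shot xg_diff from the earlier snippet plus a hypothetical player column):

```python
# Keep players with 10+ shots, then rank by the biggest absolute mean xG difference.
per_player = (
    shots.groupby("player")
         .agg(shots_taken=("xg_diff", "size"),
              mean_xg_diff=("xg_diff", "mean"))
         .query("shots_taken >= 10")
         .sort_values("mean_xg_diff", key=abs, ascending=False)
)
print(per_player.head(10))   # the 'top 10' table
```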
Once again the xG difference is negligible and would not meaningfully impact the use of season-long xG values when filtering and comparing players for recruitment.
So… what about on a per-match basis?
We could use the mean distance difference across all shots (216cm) and run some simulations. A more reflective methodology would be to probabilistically generate distance differences based on their frequencies within the dataset.
The probability distribution shows clear patterns:
So… let's simulate each game 1,880 times using the following steps for each shot within that game (a rough sketch of this step follows the list):
- Generate a ‘distance error’ based on the above probability distribution.
- Choose a random spot that is the same distance away from the original event as the 'distance error', whilst remaining on the pitch!
- Calculate the new xG value for the simulated x,y position.
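A minimal sketch of the per-shot simulation step is below. The pitch dimensions and, in particular, the error-distance bins and their probabilities are placeholders; the article derives the real distribution from the 3,751-shot dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
PITCH_X, PITCH_Y = 105.0, 68.0  # assumed pitch dimensions in metres

# Placeholder error distribution: distance bins (m) and their relative frequencies.
error_bins = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 10.0, 20.0])
error_probs = np.array([0.30, 0.25, 0.20, 0.12, 0.08, 0.04, 0.01])

def simulate_shot(x, y):
    """Displace one shot by a sampled 'distance error' in a random direction,
    resampling the direction until the new point stays on the pitch."""
    d = rng.choice(error_bins, p=error_probs)
    while True:
        theta = rng.uniform(0, 2 * np.pi)
        nx, ny = x + d * np.cos(theta), y + d * np.sin(theta)
        if 0 <= nx <= PITCH_X and 0 <= ny <= PITCH_Y:
            return nx, ny   # pass to the xG model for the simulated value
```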
For each match simulation the original and simulated xG values can be summed and compared to better understand the impact on single match xG analysis.
Let's study the probability of various swings in xG per team. We are more interested in the extent of the swing than its direction, so all negative values have been converted to their additive inverse.
It’s positive to see that there is a 19.8% chance that there will be a swing of less than 0.1 xG per team per match. However, it’s concerning to see that there is a 5% chance that there will be a swing of more than 1 xG per team per match!
The potential for large swings makes you think that many single-match xG results are wrong! What % of simulated games showed a reversal of the xG match result? 6.7%!
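A minimal sketch of how the per-match swings and result reversals could be tallied from the simulation output, assuming a hypothetical sims table holding per-team xG totals for every match and simulation:

```python
import pandas as pd

# Assumed columns: match_id, sim_id, team, xg_original, xg_simulated (summed per team)
sims = pd.read_csv("simulated_matches.csv")
sims["swing"] = (sims.xg_simulated - sims.xg_original).abs()   # size of swing, sign discarded

print((sims.swing < 0.1).mean())   # share of team-matches with a swing under 0.1 xG
print((sims.swing > 1.0).mean())   # share with a swing over 1 xG

def winner(group, col):
    """Team with the higher xG total in one simulated match."""
    return group.loc[group[col].idxmax(), "team"]

flips = sims.groupby(["match_id", "sim_id"]).apply(
    lambda g: winner(g, "xg_original") != winner(g, "xg_simulated")
)
print(flips.mean())                # share of simulations where the xG 'result' reverses
```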
Final Thoughts..
Although coaches utilise long-term trends, they have a large thirst for data from single matches. Therefore we have to ask ourselves: are we comfortable with the dataset's inherent error when presenting results at this level of granularity? If we are not, how do we present this margin of error/uncertainty to coaches?
Our audience may be skeptical and probabilities can be intuitively misread, so it would be easy to hide our error/uncertainty in a cupboard. However, I believe this only provides skeptics with ammunition to shoot us in the head if that error/uncertainty ever trips us up. Others will strongly disagree. There are good examples emerging within the media.
I would love to see more data visualisations within our small community that develop our competencies in discussing error and uncertainty with our audiences. Showing some weakness just might strengthen our relationships and influence with our audience.
** Disclaimer – TRACAB and Opta data are not used in this article **
Imran
December 28, 2017 — 6:11 pm
Very interesting findings. Is there not a case to use tracking data alone as the 'ground truth'? Given its inherent error can be quantified quite easily (I'm guessing it's mostly instrumentation error in the cameras) and, as you say, Opta's event data error is more variable, could we not ignore Opta entirely and derive all events from the tracking data? I suppose we could still use the Opta event timestamps to confirm what occurred and ignore the x/y coordinates.
Joe
December 29, 2017 — 1:59 am
It’s true that the inherent error could be more precisely quantified and therefore accounted for. I believe that companies are working on auto-tagging events based on tracking data, this should reduce the errors but also make things much data cheaper to produce.. and therefore maybe to buy?
I have 'resynced' the x/y coordinates based on the timestamp before… it's a solution but it relies on the timestamp being right 😉
Moh Chow
January 2, 2018 — 4:21 am
…"Opta's event data error is more variable, could we not ignore Opta entirely and derive all events from the tracking data?"…
I must have missed this declaration in the article. Where does it mention that Opta's data was even used, so how can we conclude the above statement?
Joe
January 13, 2018 — 7:07 pm
Indeed, Opta data was not used, but it is likely all providers have similar issues in the underlying data.
Harry
December 28, 2017 — 6:53 pm
This is really good. I quite often see people noting a model's inaccuracies, but the input data is not usually mentioned. Also, some of those locations are off by > 20m! Are those extremes from the human or the tracking side (or both)?
Joe
December 29, 2017 — 1:56 am
Harry, to be honest I don't know; they could be cases where the coder tags the wrong player but in the right location!
Julen
December 29, 2017 — 10:52 pm
Hi,
Great study. Could you do a Bland and Altman chart to see if there is any pattern between measures?
Joe
December 30, 2017 — 12:56 am
Julen,
Thanks.
Will look into Bland and Altman charts further, thanks for the suggestion.
Joe