When can Liverpool win the Premier League

Written on 10 March 2025, 10:52pm


With 9 games to go, Liverpool are top of the Premier League with a 15-point advantage:

Arsenal now have less than a 1% chance, and the bookmakers offer 41 to 1 odds if you bet on Arsenal. It’s been a bit of a rollercoaster though: just watch their odds lengthen from 3.75 on 19 February to 26 only one week later:

source

Liverpool navigated a fiery February very well but, after two draws that felt like two losses (at Everton and Villa), they needed Arsenal to drop some points. Arsenal dropped more than expected, and now it’s simply a matter of time until Liverpool are crowned PL champions.

I wanted to find a scientific way to determine when this could happen. And the answer is this:

How to read it?

The chart says that the earliest Liverpool can win the PL is on match day 32 (with a 15.3% chance), when they play West Ham at home. The most likely date is the following match day (Leicester away, 35.9%), and by the time they play Arsenal (MD 36), there is a 90% chance that the title is already won. Arsenal only have a 0.35% chance to win the PL (not shown in the chart).

How does it work?

I used the xG data from fbref.com, averaging the xG at home and away for every PL team. Take the next Liverpool game, against Everton: Liverpool have a 2.18 average xG at home, while Everton have a 0.99 average xG away. I used a Poisson distribution function to turn these xG numbers into actual goals and, ultimately, into winning percentages: 65% Liverpool win, 19% draw, 16% Everton win. This is what it looks like:

https://sinceawin.com/data/tools/poisson
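
For the curious, here is a minimal Python sketch of that xG-to-percentages step. It is not the code behind the tool linked above. It assumes the two teams' goals are independent Poisson variables with the average xG as the mean, and the function names and the `max_goals` cut-off are my own choices for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumption: home and away goals are independent Poisson variables.
    Scorelines above max_goals are ignored, so the three values sum to
    slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Liverpool at home (2.18 xG) against Everton away (0.99 xG)
home, draw, away = match_probabilities(2.18, 0.99)
print(f"home {home:.1%}, draw {draw:.1%}, away {away:.1%}")
```

With the two xG values from the Everton example, this should land close to the 65% / 19% / 16% split quoted above.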

I ran 100,000 simulations of the remaining PL games for both Liverpool and Arsenal and came up with the percentages in the chart above. Interesting fact: running 10,000 simulations on an 8th-generation Intel Core i5 took about one minute, while the same number of simulations on a MacBook M1 Pro took 13 seconds, so I could afford to run 100,000. However, running 10x more simulations did not change the results by more than a rounding error, so for all intents and purposes 10,000 simulations are sufficient.
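
A simulation like this can be sketched in a few lines of Python. This is a simplified illustration, not the script that produced the chart: it only tracks the leader and one chaser, assumes both play once per round, treats all results as independent (so it ignores the head-to-head game), and ignores tie-breakers. The function names are mine, and the per-match probabilities in the usage example are placeholders, not real fixture data.

```python
import random
from collections import Counter

POINTS = (3, 1, 0)  # points for a win, a draw, a loss


def clinch_round(lead, leader_probs, chaser_probs, rng):
    """Simulate the run-in once.

    Returns the remaining round (1-based) after which the chaser can no
    longer catch the leader, or None if that never happens.
    Each *_probs entry is a (win, draw, loss) tuple for one round.
    """
    rounds = len(leader_probs)
    gap = lead
    for played in range(1, rounds + 1):
        gap += rng.choices(POINTS, weights=leader_probs[played - 1])[0]
        gap -= rng.choices(POINTS, weights=chaser_probs[played - 1])[0]
        # The chaser can still gain at most 3 points per remaining round.
        if gap > 3 * (rounds - played):
            return played
    return None


def clinch_distribution(lead, leader_probs, chaser_probs, n_sims=10_000, seed=0):
    """Share of simulations in which the title is clinched in each round."""
    rng = random.Random(seed)
    counts = Counter(
        clinch_round(lead, leader_probs, chaser_probs, rng) for _ in range(n_sims)
    )
    ordered = sorted(counts, key=lambda r: (r is None, r))
    return {r: counts[r] / n_sims for r in ordered}


# Placeholder probabilities, identical for all 9 remaining rounds.
leader = [(0.65, 0.19, 0.16)] * 9
chaser = [(0.50, 0.25, 0.25)] * 9
for rnd, share in clinch_distribution(15, leader, chaser).items():
    print(rnd, f"{share:.1%}")
```

In a fuller version, each round would get its own probabilities from the Poisson step, and the round index would be mapped back to the actual match day. Note that with a 15-point lead and 9 rounds left, the earliest possible clinch in this sketch is the third remaining round, which matches match day 32 in the chart.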

More data

Ten PL teams can no longer catch Liverpool. Two more will join them if Liverpool win their next game, at home against Everton, and a third (Bournemouth) if Bournemouth also fail to win their own next game:
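
The check behind this count is simple arithmetic: a team is out of reach once its maximum possible total (current points plus 3 for every game left) is below the leader's current total. A small sketch, where the function name is mine, the leader's 70 points are inferred from the "10 more to reach 80" remark in this post, and the chasing teams' numbers are made up for illustration:

```python
def can_still_catch(points, games_left, leader_points):
    """True if the team can at least draw level with the leader's current total."""
    return points + 3 * games_left >= leader_points


# Hypothetical chasers, leader assumed to be on 70 points.
print(can_still_catch(44, 9, 70))  # 44 + 27 = 71, still in reach
print(can_still_catch(42, 9, 70))  # 42 + 27 = 69, out of reach
```

Drawing level counts as "can catch" here because ties on points are settled by tie-breakers such as goal difference.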

According to my simulations, 80 points are sufficient to win the PL in more than 80% of cases. So 10 more to go!

See also:

Update after GW 29

The curve shifts right

Statistical coefficients and Excel

Written on 11 January 2022, 10:32am

A quick follow-up to this post.

Here is how to use Excel in order to answer the question below:

The correlation coefficient is calculated in Excel using the correl() function: =CORREL(B4:B9;C4:C9)

The coefficient of determination is calculated in Excel using the rsq() function: =RSQ(B4:B9;C4:C9)

Of course, the coefficient of determination (R^2) can also be calculated as (correlation coefficient) ^ 2.

Note: instead of the correl() function, you can also use the formula as here. You will arrive at the same result.

More complicated, but same result (0.529809)
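
The same two coefficients can be cross-checked outside Excel. Below is a minimal Python sketch using only the standard library. The function names are mine and the six data points are made up for illustration (they are not the values from the spreadsheet). It computes the correlation coefficient from the textbook formula, then R^2 from a least-squares line, and shows that R^2 equals the squared correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the textbook formula."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_from_fit(xs, ys):
    """R^2 of a least-squares line: 1 - SS_residual / SS_total."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: six points, like the B4:B9 and C4:C9 ranges.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

r = pearson_r(xs, ys)
print(r, r**2, r_squared_from_fit(xs, ys))
```

The last two printed values should agree up to floating-point rounding, which is the relationship between CORREL and RSQ described above.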

The R-squared of the data set can also be shown by Excel if the data points are plotted in a chart and a linear trendline is added:

Note the same R^2 value of 0.2807, which is 0.529809 squared.

2 notes on data visualization

Written on 9 October 2019, 09:37pm


  1. Know the limitations of pie charts: not so good for comparing values with each other, but really good for comparing a value against the 50% line.
  2. Match your type of data with the right color scheme. There are 3 types of data: sequential, diverging and qualitative. Sequential color schemes help with ordered data. Diverging schemes use a neutral color for the mid-range data and highly contrasting colors for the extremes. Qualitative schemes focus on creating visual differences between the sets of data.
Bar charts are better if you need to compare the values
But pie charts have their strengths when comparing to the 50% line
A sequential scheme. Colors range from light to dark, and are usually colorblind safe
A diverging scheme. Mid-range neutral color, highly contrasting extremes.
A qualitative, colorblind-safe scheme. It gets trickier if you need more than 4 colors. Each color needs to scream “I’m different!”

https://www.data-to-viz.com/caveat/pie.html
https://www.perceptualedge.com/articles/visual_business_intelligence/save_the_pies_for_dessert.pdf

http://colorbrewer2.org