Statistical coefficients and Excel

Written on 11 January 2022, 10:32am


Quick follow up to this post.

Here is how to use Excel in order to answer the question below:

The correlation coefficient is calculated in Excel using the CORREL() function: =CORREL(B4:B9;C4:C9)

The coefficient of determination is calculated in Excel using the RSQ() function: =RSQ(B4:B9;C4:C9)

Depending on your regional settings, Excel may expect a comma instead of a semicolon as the argument separator.

Of course, the coefficient of determination (R^2) can also be calculated as the square of the correlation coefficient: (correlation coefficient)^2

Note: instead of the CORREL() function, you can also apply the formula directly, as here. You will arrive at the same result.

More complicated, but same result (0.529809)

The R-squared of the data set can also be shown by Excel if the data points are plotted in a chart and a linear trendline is added:

Note the same R^2 value of 0.2807, which is the square of the correlation coefficient found above (0.529809^2 ≈ 0.2807).
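
The same relationship can be checked outside Excel. Below is a minimal Python sketch, not the method used in the post: it computes the Pearson correlation from its definition (what CORREL() returns) and, separately, the R^2 of a least-squares line (what the trendline reports), so the two can be compared. The data values and the function names pearson_r and r_squared_of_linear_fit are made up for illustration, since the contents of B4:B9 and C4:C9 are not shown here; the printed numbers will therefore differ from 0.529809 and 0.2807.

```python
from math import sqrt

# Hypothetical sample data; the post's actual B4:B9 / C4:C9 values are not shown.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, the quantity CORREL() returns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_of_linear_fit(xs, ys):
    """R^2 of the least-squares line, computed as 1 - SS_res / SS_tot."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


r = pearson_r(xs, ys)
r2_fit = r_squared_of_linear_fit(xs, ys)
print(f"r = {r:.6f}, r^2 = {r * r:.6f}, R^2 of linear fit = {r2_fit:.6f}")
```

For a straight-line fit with an intercept, the squared correlation and the R^2 of the fit agree up to floating-point rounding, which is why the trendline and RSQ() report the same value.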

Recently, Alberto Cairo created the Datasaurus dataset which urges people to “never trust summary statistics alone; always visualize your data”, since, while the data exhibits normal-seeming statistics, plotting the data reveals a picture of a dinosaur. These 13 datasets (the Datasaurus, plus 12 others) each have the same summary statistics (x/y mean, x/y standard deviation, and Pearson’s correlation) to two decimal places, while being drastically different in appearance.

https://www.autodesk.com/research/publications/same-stats-different-graphs
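
To make the "same stats, different graphs" idea concrete, here is a small Python sketch. It does not use the Datasaurus files; as a stand-in it uses the first two datasets of Anscombe's quartet, a classic and much smaller example of the same effect. The helper names pearson_r and summary are made up for illustration, and the sample standard deviation is an assumption about which variant is meant.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet, datasets I and II (a stand-in, not the Datasaurus data).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def summary(xs, ys):
    """x/y mean, x/y sample standard deviation and Pearson's r, to 2 decimals."""
    values = (mean(xs), mean(ys), stdev(xs), stdev(ys), pearson_r(xs, ys))
    return tuple(round(v, 2) for v in values)


summary_1 = summary(x, y1)
summary_2 = summary(x, y2)
print(summary_1)
print(summary_2)
print("identical to two decimals:", summary_1 == summary_2)
```

The two summaries match to two decimal places even though the y values differ: plotted, the first set scatters around a straight line while the second follows a smooth curve.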

Below I try to understand the highlighted concepts.


Relax? Not yet

Written on 18 April 2020, 07:45pm


My view on the recent study arguing that the lock-down measures implemented in Belgium should be relaxed to match the ones in the Netherlands, which led to similar infection numbers. For completeness, the 3 main differences between the two countries are:

  • no legal enforcement of the lock-down in the NL
  • all shops remain open in the NL
  • telework is encouraged, but not mandatory in the NL

Note: For the sake of readability, I will add the relevant links at the end of the post.

What I liked in the study

  • the dependency between the policy, human behavior and outcome
  • the use of Google mobility data

The two premises

  • I pretty much agree with the first one – about the complexity of the epidemic models and the fact that the Belgian government is not very transparent in sharing all the data. But we should not underestimate the importance of the scientists working together with the decision-makers.
  • However, I do not agree with the second premise: that the models need precise data in order to work. Scientists routinely work with incomplete or imperfect data. Confidence intervals, margins of error, type I and II errors are all part of the game. This is not a perfect world.