5 lessons from ‘How to Lie with Statistics’

 


Data and statistics play an important role in a Product Manager's decision-making. However, a product manager needs to be cautious when reading data, because the way it is presented can be deceptive. In his book How to Lie with Statistics, Darrell Huff offers a few checks to keep in mind so that we avoid being manipulated and making wrong decisions.

Lesson #1: Be careful while reading graphs

Consider the graph below, which shows the weekly average order value (AOV) of a B2B e-commerce app over a period of time.

[Figure: misleading graph]

The reporter summarized the graph by saying that the average order value on the platform stayed roughly constant over the first 16 weeks of the year.
However, look at the same data when we change the range of the Y-axis.

[Figure: corrected graph]

The rescaled graph reveals two patterns:

  • An increasing trend in AOV
  • A seasonal pattern in AOV that repeats every fourth week

Both representations come from the same data. Hence, it is important to look carefully at the underlying data and the axis ranges, and to correct the visualization where needed, so that we are not misled.
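To make the effect of the Y-axis range concrete, here is a minimal sketch in Python. The weekly AOV figures and the helper name `visual_share` are hypothetical and are not the data behind the graphs above. The helper measures what fraction of the axis height the data actually occupies under a given Y-axis range.

```python
def visual_share(values, y_min, y_max):
    """Fraction of the Y-axis height spanned by the data (0 = flat line, 1 = full height)."""
    return (max(values) - min(values)) / (y_max - y_min)

# Hypothetical weekly AOV figures, made up for illustration only.
aov = [1000, 1010, 1020, 1060, 1030, 1040, 1050, 1090]

# A very wide axis squeezes the variation into a thin, flat-looking band.
wide = visual_share(aov, 0, 5000)

# An axis that hugs the data makes the same variation clearly visible.
tight = visual_share(aov, 990, 1100)

print(f"wide axis:  data spans {wide:.1%} of the chart height")
print(f"tight axis: data spans {tight:.1%} of the chart height")
```

With these made-up numbers, the same series fills under 2% of the chart on the wide axis and over 80% on the tight one, which is why the two charts can tell such different stories.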


Lesson #2: Check the sample size of the data before drawing conclusions

  • Before drawing any conclusions from the data, we should check the sample size over which the data has been analysed.
  • The larger the sample size, the more reliable the analysis, because estimates vary less from sample to sample.
  • E.g.: We might get 7 heads in 10 coin tosses (70%); however, the probability of getting 70 heads in 100 coin tosses is much lower.
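The coin-toss comparison can be checked with the binomial formula. The sketch below assumes a fair coin (p = 0.5); the function names are illustrative only.

```python
from math import comb

def prob_exact_heads(n, k, p=0.5):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least_heads(n, k, p=0.5):
    """Probability of k or more heads in n independent tosses."""
    return sum(prob_exact_heads(n, i, p) for i in range(k, n + 1))

print(f"exactly 7 of 10:     {prob_exact_heads(10, 7):.4f}")
print(f"at least 7 of 10:    {prob_at_least_heads(10, 7):.4f}")
print(f"exactly 70 of 100:   {prob_exact_heads(100, 70):.2e}")
print(f"at least 70 of 100:  {prob_at_least_heads(100, 70):.2e}")
```

Seven heads in ten tosses happens roughly 12% of the time with a fair coin, while 70 heads in 100 tosses is rarer by several orders of magnitude. A 70% rate in a small sample is therefore weak evidence of anything.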


Lesson #3: Correlation does not always mean causation

  • Correlation describes how two variables move together. Causation, on the other hand, means that a change in one variable brings about a change in the other.
  • Understanding the difference between the two helps us avoid drawing wrong conclusions.
  • E.g.: There is a positive correlation between education level and income. However, the correlation alone does not establish direct causation, because factors like family background, networking, job switches, salary hikes, etc. also contribute to income.
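A small simulation can show how a correlation arises without direct causation. In the sketch below, a hidden factor (labelled `background` purely for illustration) drives both variables, while neither variable influences the other. All numbers are synthetic and say nothing about real education or income data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hidden common factor (synthetic).
background = [rng.gauss(0, 1) for _ in range(2000)]

# Both variables depend on the hidden factor plus independent noise.
# Note that `income` is never computed from `education`.
education = [b + rng.gauss(0, 0.5) for b in background]
income = [b + rng.gauss(0, 0.5) for b in background]

r = pearson(education, income)
print(f"correlation between education and income: {r:.2f}")
```

The two series come out strongly correlated even though, by construction, one does not cause the other. The shared hidden factor produces the relationship.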


Lesson #4: Check the underlying data while looking at percentages and averages

  • If we see an 80% increase in sales, we should dig into the absolute figures. Is the increase on a base of Rs. 3 lakhs per annum or Rs. 3 crore per annum? The two scenarios paint very different pictures of the firm's growth.
  • Similarly, we should look at the data distribution to decide whether to use the mean, the median or the nth percentile. The mean is a good measure for a symmetric, roughly normal distribution, whereas the median or a percentile such as the 90th might be more informative for skewed distributions.
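Both points can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the order values in `orders` are made up, and `absolute_increase` is a hypothetical helper, not something from the book.

```python
from statistics import mean, median

LAKH = 100_000
CRORE = 10_000_000

def absolute_increase(base, pct):
    """Absolute change implied by a percentage increase on a given base."""
    return base * pct / 100

# The same 80% growth on two very different bases.
small = absolute_increase(3 * LAKH, 80)
large = absolute_increase(3 * CRORE, 80)
print(f"80% of Rs. 3 lakh:  Rs. {small:,.0f}")
print(f"80% of Rs. 3 crore: Rs. {large:,.0f}")

# Hypothetical, right-skewed order values: one very large order among small ones.
orders = [100, 120, 130, 150, 160, 180, 200, 220, 250, 5000]
print(f"mean:   {mean(orders)}")
print(f"median: {median(orders)}")
```

Here the single large order pulls the mean to 651, well above what a typical order looks like, while the median stays at 170. On skewed data, reporting only the mean can paint a misleading picture.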


Lesson #5: Build confidence in the source of data

  • The sample from which the data is collected should be sufficiently large and randomly drawn, to avoid sampling bias.
  • The interview or survey used to collect the data should not contain leading or confirmatory questions, which would bias the results.
  • E.g.: If we ask parents whether using a mobile phone has a negative effect on their child's grades, the response will almost certainly be yes. Hence, concluding that ‘more than 95% of parents confirmed that mobiles have a negative effect on the grades of their child’ would not be correct. Instead, we should ask parents an open question about the factors they believe are behind their child's lower grades.


Read the book for more examples of how data can be manipulated:

How to Lie with Statistics by Darrell Huff



Book a slot for interactions: https://topmate.io/jayant_jain7/
