
Devilish DataViz – Six Data Analysis Mistakes To Avoid

June 21, 2014 in Uncategorized


One thought keeps bubbling up in my mind as the (tech marketing) fads of the moment celebrate how easy it is to deal with large volumes of disparate data: failing to think carefully about the inputs and methodologies can lead to terrible, terrible outcomes.

Simply presenting a compelling story with pretty graphics is a surefire winner for mainstream, and often business management, consumption. I don’t dare delve into the arguments that erupted (and, for some, continue) over climate change, where zillions of data points tempt us to boil everything down to binaries: is it real or not, is it man-made or not, or, even more difficult, when?

I’ll take a different, perhaps equally contentious, but certainly fun route: talking about porn and politics!

What? How do these go together, you ask?

Building on my previous article, which looked at spurious correlations, I came across a great article that looks in detail at how an infographic (well, just a chart really) makes six big decisions affecting the outcome, decisions the casual viewer will not realise without digging deeper. We’re back to statistics being used to argue any point desired.

The folks at Source have written a great, in-depth article about the “mistakes” (their word) in data presented by PornHub’s data scientists relating online pornography consumption to America’s “red” and “blue” states, that is, states that predominantly vote for the Republican or the Democratic party, respectively.

Chart: “PornHub – Red vs Blue”, by Christopher Ingraham

I’m sure the chart raised plenty of eyebrows, but Jacob Harris, writing on Source, identifies six key errors:

  • Sloppy use of proxies – or bad sampling
  • Dichotomizing – turning something complex into just two categories
  • Correlation does not equal causation – ’nuff said
  • Ecological inference – sorta like stereotyping
  • Geocoding – specifically, badly applied IP geocoding
  • Data naivete – it’s not just numbers, there’s real meaning there to understand
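To make two of these pitfalls a little more concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the state names, vote shares and “visits per person” figures are hypothetical and are not PornHub’s data or real election results. It shows how dichotomizing throws information away, and how a state-level (aggregate) pattern can point the opposite way from what individuals actually do, which is the ecological inference trap.

```python
"""Toy illustration of dichotomizing and ecological inference.

All numbers are hypothetical and invented for illustration only;
they are not PornHub's data or any real election results.
"""
from statistics import mean

# Hypothetical Republican vote shares for three made-up states.
vote_share = {"State A": 0.51, "State B": 0.49, "State C": 0.80}


def dichotomize(share: float) -> str:
    """Collapse a continuous vote share into a red/blue label."""
    return "red" if share > 0.5 else "blue"


# A and B differ by two points but get opposite labels,
# while A and C differ by 29 points yet share the same label.
labels = {state: dichotomize(s) for state, s in vote_share.items()}

# Hypothetical individuals: (state, party, visits per person).
# Within each state, Democrats visit slightly MORE than Republicans,
# but state S2 has a higher baseline for some unrelated reason.
people = (
    [("S1", "R", 1.0)] * 30 + [("S1", "D", 1.2)] * 70
    + [("S2", "R", 2.0)] * 70 + [("S2", "D", 2.2)] * 30
)


def state_summary(state: str) -> tuple[float, float]:
    """Return (Republican share, mean visits per person) for one state."""
    rows = [p for p in people if p[0] == state]
    r_share = sum(1 for p in rows if p[1] == "R") / len(rows)
    return r_share, mean(p[2] for p in rows)


def party_mean(state: str, party: str) -> float:
    """Mean visits per person for one party within one state."""
    return mean(p[2] for p in people if p[0] == state and p[1] == party)


if __name__ == "__main__":
    print(labels)
    for s in ("S1", "S2"):
        share, avg = state_summary(s)
        print(f"{s}: R share {share:.0%}, mean visits {avg:.2f}, "
              f"R {party_mean(s, 'R'):.1f} vs D {party_mean(s, 'D'):.1f}")
```

Running it, the “redder” hypothetical state S2 has the higher average (2.06 versus 1.14), yet inside each state the Democrats visit slightly more. Reading the state-level chart as a statement about individual Republicans would get this toy example exactly backwards.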

Yes, data visualization can be devilish, but like anything, the devil is in the detail. You can’t blindly trust an infographic or chart. Validate the sources and think about what’s really going on.

I leave you with a link to this Dilbert comic.