Archive for June, 2014

Devilish DataViz – Six Data Analysis Mistakes To Avoid

Darren Wu | June 21, 2014 in Uncategorized | Comments (0)

As the (tech marketing) fads of the moment celebrate how easy it is to deal with large volumes of disparate data, one thought keeps nagging at me: failing to think carefully about the inputs and methodologies can lead to terrible, terrible outcomes.

Simply presenting a compelling story with pretty graphics is a sure-fire winner for mainstream, and often business management, consumption. I don’t dare delve into the arguments that erupted (and, for some, continue) over climate change, where zillions of data points get boiled down to binaries: is it real or not, is it man-made or not, or, even more difficult, when?

I’ll take a different, and perhaps equally contentious, but certainly fun route of talking about porn and politics!

What? How do these go together, you ask?

Building on my previous article, which looked at spurious correlations, I came across a great article that looks in detail at how an infographic (well, just a chart really) embodies six big decisions that affect the outcome, none of which the casual viewer will realise without digging deeper. We’re back at statistics being used to argue any point desired.

The folks at Source have written a great, in-depth article about the “mistakes” (their word) in data presented by PornHub’s data scientists relating online pornography consumption to America’s “red” and “blue” states, that is, states that predominantly vote for the Republican or Democratic party respectively.

PornHub - Red vs Blue

By Christopher Ingraham

I’m sure the chart would raise lots of eyebrows, but Jacob Harris writes on Source about six key errors:

  • Sloppy use of proxies – or bad sampling
  • Dichotomizing – turning something complex into just two categories
  • Correlation does not equal causation – ’nuff said
  • Ecological inference – sorta like stereotyping
  • Geocoding – specifically, badly applied IP geocoding
  • Data naivete – it’s not just numbers, there’s real meaning there to understand
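The dichotomizing point is easy to see with a few lines of Python. This is only a minimal sketch: the four states and their numbers are entirely made up for illustration (they are not taken from the PornHub data or from Harris's article), and `dichotomize` and `group_means` are hypothetical helper names.

```python
from statistics import mean

# Made-up data: state -> (share of votes for the Republican candidate,
#                         some metric we want to compare between groups)
states = {
    "A": (0.49, 10.0),
    "B": (0.51, 10.0),
    "C": (0.90, 30.0),
    "D": (0.10, 12.0),
}


def dichotomize(share, threshold=0.5):
    """Collapse a continuous vote share into one of two labels."""
    return "red" if share > threshold else "blue"


def group_means(data):
    """Average the metric within each of the two labels."""
    groups = {}
    for share, metric in data.values():
        groups.setdefault(dichotomize(share), []).append(metric)
    return {label: mean(values) for label, values in groups.items()}


# States A and B are almost identical, yet they land in different groups,
# while B and C share a label despite being very different.
print(dichotomize(0.49), dichotomize(0.51), dichotomize(0.90))
print(group_means(states))
```

In this toy data the "red" average is 20.0 and the "blue" average is 11.0, but the whole gap comes from state C. The two states nearest the threshold have the same metric, which the two-category summary hides completely.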

Yes, data visualization can be devilish, but like anything, the devil is in the detail. You can’t blindly trust an infographic or chart. Validate the sources and think about what’s really going on.

I leave you with a link to this Dilbert comic.


Beware the Malicious (or Ignorant) Data Scientist

Darren Wu | June 12, 2014 in Uncategorized | Comments (0)

“Oh, people can come up with statistics to prove anything. 14% of people know that.” – Homer Simpson

“98% of all statistics are made up.” – A common statistics joke.

“There are three kinds of lies: lies, damned lies, and statistics.” – A phrase popularised by Mark Twain, among others.

“Three in ten Americans (29%) report that they are not good at math.” – Ogilvy PR-run survey for Change the Equation.

“Fifty-five percent of Americans think that they are smarter than the average American.” – YouGov survey.

Bad Statistics

Three out of the above five statements/quotes are about or by Americans. No, I’m not picking on Americans; that’s just what I was able to find quickly using an American company’s search engine.

Statistics is a foundational component of data science. So, as much as many may fear mathematics, and statistics especially, practitioners and users of modern business intelligence systems, data visualisation tools and infographics need to be careful with the power they wield, lest misuse turn theirs into a profession derided by mass media.

The claims that some software vendors make, especially in the big data arena, about how easy it is to mash data together and derive useful insights bewilder me. Sure, I agree that certain things can be made easier, less tedious, faster and prettier, but that doesn’t detract from the need to have a strong knowledge of the underlying data, how that data was acquired, and the systems, processes, events or activities the data represents.

I would generalise that data scientists are not malicious or ignorant. At least not willingly or knowingly. But the great work a data scientist may have produced can be taken out of context and misused.

On the other hand, data scientists can have a bit of fun. Check out some of the work done at Spurious Correlations.
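For a feel of how such correlations come about, here is a minimal Python sketch. The two series are made up for illustration and have nothing to do with each other (the variable names are hypothetical); they simply both happen to rise over time, and that alone is enough to produce a very high Pearson correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up yearly figures for two unrelated things that both trend upward.
cheese_eaten = [10, 12, 15, 19, 24]
widgets_sold = [3, 4, 6, 7, 10]

r = pearson(cheese_eaten, widgets_sold)
print(round(r, 2))  # prints 0.99
```

A coefficient this close to 1 says only that the two series move together in this sample. It says nothing about whether one has anything to do with the other.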


Man-Machine Interface & Data Visualisation

Darren Wu | June 6, 2014 in Uncategorized | Comments (0)

I’ve always been interested in the man-machine interface (or now, more politically correct, human-machine interface). And I’ve only more recently realised that in many ways, data visualisation is an extension of this and will become even more important as the “internet of things” becomes more mainstream.

Thinking as a mechanical engineer, I have traditionally seen the man-machine interface as the way in which we control machines and, in more recent times, electronic machines such as computers, tablets and smartphones.

“Computer Workstation Variables” by Berkeley Lab – Ergonomics, Integrated Safety Management, Berkeley Lab. Licensed under Public domain via Wikimedia Commons.

Whether flying an aeroplane or controlling a little angry bird on a smartphone, the human takes in information and acts upon a number of “levers” in order to get a response from the controlled object. In business, we are looking to exert control over profits and costs, and the machinery of business presents itself in different kinds of data.

Instead of the altitude, velocity and direction of a plane, we might look at sales pipeline data, call centre complaint metrics, or the financial costs of operations, to name a few examples. Each of these needs to be presented so that the pilot, or business manager, can quickly assess the situation and make appropriate decisions.

This is where the dashboard concept has come from. Without getting into the detail of what should or should not be on a business data dashboard, the other challenge is in how that data, whatever it may be, should be presented. The science of communicating data must factor in a plethora of disciplines including psychology, ergonomics, cognition and usability, as well as data quality, data analytics, and perhaps a bunch of other specialities in between. It’s a lot to think about!

I recently found this highly educational document about Data Visualization for Human Perception on the Interaction Design Foundation website. This is a great primer on the topic, suitable for newbies and veterans alike.

As Apple has demonstrated in great fashion, the KISS principle (Keep it simple, stupid) should reign supreme when designing a man-machine interface or data visualisation. A key to that is understanding what you want the user to get out of it.

Now, if only I could get my utility to design better bills… They want me to pay. How much? By when? How to pay? What did I use? Probably in that order. Everything else should be “below the fold”.