Beware the Malicious (or Ignorant) Data Scientist

Darren Wu | June 12, 2014 in Uncategorized | Comments (0)


“Oh, people can come up with statistics to prove anything. 14% of people know that.” – Homer Simpson

“98% of all statistics are made up.” – A common statistics joke.

“There are three kinds of lies: lies, damned lies, and statistics.” – A phrase popularised by Mark Twain, among others.

“Three in ten Americans (29%) report that they are not good at math.” – Ogilvy PR-run survey for Change the Equation.

“Fifty-five percent of Americans think that they are smarter than the average American.” – YouGov survey.

Bad Statistics

Three out of the above five statements/quotes are about or by Americans. No, I’m not picking on Americans; that’s just what I was able to find quickly using an American company’s search engine.

Statistics is a foundational component of data science. However much many people fear mathematics, and statistics in particular, practitioners and users of modern business intelligence systems, data visualisation tools and infographics need to be careful with the power they wield, so that misuse does not turn theirs into a profession derided by the mass media.

The claims that some software vendors make, especially in the big data arena, about how easy it is to mash data together and derive useful insights bewilder me. Sure, I agree that certain things can be made easier, less tedious, faster and prettier, but that doesn’t remove the need for a strong knowledge of the underlying data, how that data was acquired, and the systems, processes, events or activities the data represents.

I would generalise that data scientists are not malicious or ignorant. At least not willingly or knowingly. But the great work a data scientist may have produced can be taken out of context and misused.

On the other hand, data scientists can have a bit of fun. Check out some of the work done at Spurious Correlations.
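For anyone who wants to try the same kind of fun at home, here is a minimal Python sketch of how a spurious correlation can arise. It is my own illustration on synthetic data, not anything taken from the Spurious Correlations site: the names `series_a` and `series_b`, the slopes, the noise levels and the seed are all made-up assumptions. Two series that share nothing except an upward trend over time look strongly correlated, while their period-to-period changes do not.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trending_series(rng, n, slope, noise):
    """A straight-line trend plus independent Gaussian noise (synthetic)."""
    return [slope * t + rng.gauss(0, noise) for t in range(n)]


def changes(xs):
    """Period-to-period differences, which remove a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(2014)  # fixed seed so the run is repeatable

# Two hypothetical, unrelated quantities that both happen to grow over time.
series_a = trending_series(rng, 50, slope=1.0, noise=3.0)
series_b = trending_series(rng, 50, slope=2.5, noise=8.0)

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print("correlation of raw levels: ", round(r_levels, 2))
print("correlation of the changes:", round(r_changes, 2))
```

The raw levels correlate strongly only because both series rise with time. Once the trend is differenced away, the relationship largely disappears, which is a hint that it was never there in the first place.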
