Archive for the ‘Uncategorized’ Category

Data Competency and Business Management – Shift Happens

July 3, 2014 in Uncategorized | Comments (0)

Data Models by JD Hancock

Data, big data, and more data. Data is everywhere in business. I have seen a lot of questions about what is big data, what is a data scientist, and where can we find good data scientists. But, today I ask, how does business gain management competency with data? Business schools typically teach data analysis as a core subject for their MBA and Executive MBA programs. Subject names vary, but include:

Now, I’m not going to suggest that one grad school course in what is ultimately foundational probability and statistics will set up a manager to deal competently with the ever-increasing demands of data capture and data analytics (it won’t). But I do want to say that, based on the course outlines I’ve seen, business schools are teaching only a tiny subset of what managers really need, and are short-changing industry.

Some common data issues I see business leaders grappling with, though this is certainly not an exhaustive list, are:

  • I’ve got all this data, how does it help answer my questions?
  • Is the data of good quality?
  • There’s too much data, too many variables, how do I understand what I’ve got?
  • How do I connect this data to other data?
  • That’s a pretty chart. What does it mean? Can I trust it?

I am struggling to see how a course in probability and statistics, especially those that teach students how to perform the mechanics of conducting such analyses, is going to help managers answer those questions.

Even before the advent of “big data”, we’ve had a plethora of data analytics activities such as data mining, “fuzzy” logic, genetic algorithms, geospatial analytics, machine learning, natural language processing, querying, signal processing, and I’m sure there are many more. Oh yeah, and statistics.

The challenge for the modern manager is perhaps the ability to call bullsh*t when presented with certain information, or to be able to reliably and confidently direct a line of enquiry within available data.

Another line of study for data lies within governance and law. What are the privacy and ethical issues surrounding the capture, use and storage of data?

Then there’s the immense field of data-related technology. I don’t think it is the place of an MBA course to cover the plethora of tools available for the multitudes of analytics types and all manner of data types. Certainly, an MBA marketing course will not, and I think should not, cover the intricacies of Search Engine Optimisation, but it should at least cover the fact that it exists when covering digital marketing and the various P’s of the marketing mix.

The technology space moves way too fast for any academic setting to cover in a generalist course. But today, what doesn’t move fast?

Some years ago, the education sector released a fantastic video called Shift Happens. Here’s the 2012 version of Shift Happens. It went viral, and many subsequent updates and other videos based on it have been released. Here’s a more recent one on YouTube. It’s also known as “Did You Know?”

One of the main messages in it is, we are currently preparing students for jobs that we don’t even know will exist in 10 years’ time. It presents a large series of information snippets, like how the top 10 in-demand jobs in 2010 did not exist in 2004. Shift happens.

I think our business schools should be opening the eyes of our managers and future managers to the current state of play and the possibilities, and working through how to think about, analyse and solve business problems using data.

Rob Stenz at Forbes looks at how data is a core competency in a growing set of occupations. In it, he quotes the CEO of CareerBuilder, a large US jobs website: “Occupations are evolving, and we are seeing data analysis in more job descriptions.”

I’d like to see MBA courses evolve to factor in this shift.

 


Google Cloud DataFlow Previewed at Google I/O

July 1, 2014 in Uncategorized | Comments (0)

I had another post in mind for this week, got a bit busy, and then Google previewed a new big data processing service that’s native to, and only available in, the cloud.

This has the potential to be huge, literally. I am also presuming that you need large physical connections to get your live stream of data into it at a good pace to take real advantage. I think you’ll also need to be dealing with huge volume data streams.

Ingesting, cleansing and transforming huge volumes of data in real time, for real-time analysis, makes for interesting possibilities. Now I want to go and look for some use scenarios that are applicable to companies that generate less than super-massive amounts of data. I am thinking it could be an interesting platform for health and government data. I also wonder how it can be used in conjunction with some heavy engineering analysis.

I’d love to hear your thoughts. Meanwhile, here’s where I read about it:

http://techcrunch.com/2014/06/25/google-launches-cloud-dataflow-a-managed-data-processing-service/


Devilish DataViz – Six Data Analysis Mistakes To Avoid

June 21, 2014 in Uncategorized | Comments (0)

A thought bubble persisting in my mind as the (tech marketing) fads of the moment celebrate how easy it is to deal with large volumes of disparate data, is how not thinking carefully about the inputs and methodologies can lead to terrible, terrible outcomes.

Simply presenting a compelling story with pretty graphics is a sure-fire winner for mainstream, and often business management, consumption. I don’t dare delve into the arguments that erupted (and for some, continue) over climate change which, based on the zillions of data points, lead us to try to boil it down to binaries of: is it real or not, man-made or not, or the even more difficult, when.

I’ll take a different, and perhaps equally contentious, but certainly fun route of talking about porn and politics!

What? How do these go together, you ask?

Building on my previous article, which looked at spurious correlations, I came across a great article that looks in detail at how an infographic (well, just a chart really) makes six big decisions affecting the outcome that the casual viewer will not realise without digging deeper. We’re back at statistics being used to argue any point desired.

The folks at Source have written a great, in-depth article about the “mistakes” (their word) in data presented by PornHub’s data scientists relating to online pornography consumption in America’s “red” and “blue” states, that is, states that predominantly vote for the Republican or Democratic parties respectively.

PornHub - Red vs Blue

By Christopher Ingraham

I’m sure the chart would raise lots of eyebrows, but Jacob Harris writes on Source about six key errors:

  • Sloppy use of proxies – or bad sampling
  • Dichotomizing – turning something complex into just two categories
  • Correlation does not equal causation – ’nuff said
  • Ecological inference – sorta like stereotyping
  • Geocoding – specifically, badly applied IP geocoding
  • Data naivete – it’s not just numbers, there’s real meaning there to understand
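To make the dichotomizing point a little more concrete, here is a minimal Python sketch. The state names and vote shares are made up purely for illustration (they are not from the PornHub data or the Source article), and `dichotomize` is a hypothetical helper of my own, not anything either party used.

```python
# Hypothetical Republican vote shares; invented for illustration only.
vote_share = {
    "State A": 0.501,
    "State B": 0.499,
    "State C": 0.720,
}


def dichotomize(share, threshold=0.5):
    """Collapse a continuous vote share into a two-way label."""
    return "red" if share >= threshold else "blue"


labels = {state: dichotomize(share) for state, share in vote_share.items()}

# A and B differ by 0.2 percentage points yet get opposite labels,
# while A and C differ by about 22 points yet get the same label.
gap_a_b = abs(vote_share["State A"] - vote_share["State B"])
gap_a_c = abs(vote_share["State A"] - vote_share["State C"])

for state, label in labels.items():
    print(state, vote_share[state], label)
```

Once the shares are replaced by labels, a chart can no longer tell a knife-edge state from a landslide one, which is exactly the kind of decision a casual viewer never sees.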

Yes, data visualization can be devilish, but like anything, the devil is in the detail. You can’t blindly trust an infographic or chart. Validate the sources and think about what’s really going on.

I leave you with a link to this Dilbert comic.


Beware the Malicious (or Ignorant) Data Scientist

June 12, 2014 in Uncategorized | Comments (0)

“Oh, people can come up with statistics to prove anything. 14% of people know that.” – Homer Simpson

“98% of all statistics are made up.” – A common statistics joke.

“There are three kinds of lies: lies, damned lies, and statistics.” – A phrase popularised by Mark Twain, among others.

“Three in ten Americans (29%) report that they are not good at math.” – Ogilvy PR run survey for Change the Equation.

“Fifty-five percent of Americans think that they are smarter than the average American.” – YouGov survey.

Bad Statistics

Three out of the above five statements/quotes are about or by Americans. No, I’m not picking on Americans, that’s just what I was able to find quickly using an American company’s search engine.

Statistics is a foundational component of data science. So, as much as many may fear mathematics, especially statistics, practitioners and users of modern business intelligence systems, data visualisation tools and infographics need to be careful with the power they wield, and not let the profession become one derided by mass media through misuse.

The claims that some software vendors make, especially in the big data arena, about how easy it is to mash data together and derive useful insights bewilder me. Sure, I agree that certain things can be made easier, less tedious, faster and prettier, but that doesn’t detract from the need to have a strong knowledge of the underlying data, how that data was acquired, and the systems, processes, events or activities the data represents.

I would generalise that data scientists are not malicious or ignorant. At least not willingly or knowingly. But the great work a data scientist may have produced can be taken out of context and misused.

On the other hand, data scientists can have a bit of fun. Check out some of the work done at Spurious Correlations.


Man-Machine Interface & Data Visualisation

June 6, 2014 in Uncategorized | Comments (0)

I’ve always been interested in the man-machine interface (or now, more politically correct, human-machine interface). And I’ve only more recently realised that in many ways, data visualisation is an extension of this and will become even more important as the “internet of things” becomes more mainstream.

Traditionally, thinking as a mechanical engineer, the man-machine interface has been the way in which we control machines, and in more recent times, controlling electronic machines such as computers, tablets and smartphones.

“Computer Workstation Variables” by Berkeley Lab – Ergonomics, Integrated Safety Management, Berkeley Lab. Licensed under Public domain via Wikimedia Commons.

Whether flying an aeroplane or controlling a little angry bird on a smartphone, the human takes in information and acts upon a number of “levers” in order to get a response from the controlled object. In business, we are looking to exert control over profits and costs, and the machinery of business presents itself in different kinds of data.

Instead of altitude, velocity and direction, as in the case of a plane, we might look at sales pipeline data, call centre complaint metrics, or financial costs of operations, as some examples. Each of these needs to be presented to the pilot, or business manager, so that they can quickly assess the situation and make appropriate decisions.

This is where the dashboard concept has come from. Without getting into the detail of what should or should not be on a business data dashboard, the other challenge is in how that data, whatever it may be, should be presented. The science of communicating data must factor in a plethora of sciences including psychology, ergonomics, cognition and usability, as well as data quality, data analytics, and perhaps a bunch of other specialities in between. It’s a lot to think about!

I recently found this highly educational document about Data Visualization for Human Perception on the Interaction Design Foundation website. This is a great primer on the topic, suitable for newbies and veterans alike.

As Apple has demonstrated in great fashion, the KISS principle (Keep it simple, stupid) should reign supreme when designing a man machine interface or data visualisation. A key to that is understanding what you want the user to get out of it.

Now, if only I could get my utility to design better bills…..They want me to pay. How much? By when? How to pay? What did I use? Probably in that order. Everything else should be “below the fold”.


Causality and the Link Between Twerking and Syria

October 10, 2013 in Uncategorized | Comments (0)

Gotta love data analysis, but most of all, how one can magically link totally unrelated topics. Does that sound like the promises made by big data vendors? Throw a lot of data in and magically gain insights not thought of previously?

If you were lucky, you may have learnt about causality in science class at school. If not, here’s what Wikipedia has to say about causality.  More importantly, you may know that correlation does not imply causation. Wikipedia explains it, but you might enjoy XKCD’s version more.
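If you’d like to see how easily two unrelated things can look linked, here is a rough Python sketch. Everything in it is an assumption for illustration: the two series are made up, and `pearson` and `differences` are small helpers written for this post rather than any particular library’s API. Both series simply grow over time, which is enough to make them look strongly correlated; compare the week-on-week changes instead and the apparent link should largely vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def differences(xs):
    """Step-to-step changes, which strips out a steady trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
weeks = range(500)

# Two made-up series that both grow over time but are otherwise unrelated.
series_a = [1.0 * w + rng.uniform(-5, 5) for w in weeks]
series_b = [2.0 * w + rng.uniform(-5, 5) for w in weeks]

corr_levels = pearson(series_a, series_b)
corr_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw levels:           {corr_levels:.2f}")
print(f"correlation of week-on-week changes: {corr_changes:.2f}")
```

The shared upward trend is the hidden third factor here. Neither series causes the other; they just both happen to move with time.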

Well, perhaps it’s not all that bad, but take a look at this interesting article, which looks at tweet volumes in the US for tweets including the terms “twerking” or “Syria”.

The Twerkyria Index by County, August 2013 - Floatingsheep.org

What seems to be a bit of fun can point to useful social research. Useful for what? Well, I’ll leave that to others. What would be interesting, though maybe not really useful, is tweet volumes over the same period for twerking and Syria by country.

Bringing this all back to business….I wonder if there’s any research done using big data, on the hype that is big data, maybe done by tweet analysis on the phrase, “big data”? Is it correlated to major product launches, industry trade shows, or mentions on mainstream media?

Interested to see what you have to say in the comments…


40 Maps That Will Help You Make Sense of the World

October 3, 2013 in Uncategorized | Comments (0)

Mapping is one of the most common forms of data visualisation. Some of these images have small data sets behind them (like the map of writing systems), others have massive amounts (like the map of all the rivers in the contiguous United States).

Average Age of First Sexual Intercourse by Country, ChartsBin.com

Most of these maps make information accessible and, for some, also help contextualize scale.

The thing that is missing in each of these, though perhaps they are listed on their source websites, is information or metadata about the data sources. In today’s digital media and web era where images and text can be easily separated, it is important that the image contains relevant and complete information if it is to be taken seriously, especially when images can be taken out of their original context.

Metadata such as data sources, when the image was created, by whom, and any major assumptions should be part of the image, even in small text if necessary, if the image is to be used in anything more than a meme.

Check out the 40 maps here:

http://twistedsifter.com/2013/08/maps-that-will-help-you-make-sense-of-the-world/


The Color of Fire: How Palette Choice Impacts Maps of Yosemite’s Rim Fire

September 16, 2013 in Uncategorized | Comments (0)

When preparing visualisations of data (for business or otherwise), think about what it is you are trying to convey and how easy it is (or not) to determine that information at a glance.

 

Compare this image depicting the progress of the massive Rim Fire in California (source Wired):

Rimfire Progress Palette 1

and this (source Wired):

Rimfire Progress Palette 2

The first uses a rather random colour scale for each day’s fire progress, whereas the second uses a colour progression that reflects the fire’s progression.
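As a rough idea of what a colour progression means in practice, here is a small Python sketch that builds a sequential palette by stepping evenly between two colours. The endpoint colours, the seven-day span and the `sequential_palette` function are my own assumptions for illustration; this is not how the Wired map was actually produced.

```python
def sequential_palette(start_rgb, end_rgb, steps):
    """Linearly interpolate between two RGB colours, returning hex strings."""
    if steps < 2:
        raise ValueError("need at least two steps")
    palette = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 for the first colour, 1.0 for the last
        channels = [round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb)]
        palette.append("#{:02x}{:02x}{:02x}".format(*channels))
    return palette


# Assumed endpoints: pale yellow for the earliest day, dark red for the latest.
fire_days = 7
palette = sequential_palette((255, 255, 178), (189, 0, 38), fire_days)

for day, colour in enumerate(palette, start=1):
    print(f"day {day}: {colour}")
```

Because each day’s colour sits between its neighbours, the reader can judge “earlier” and “later” at a glance without hunting through a legend, which a random palette can’t offer.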

Read more about how this was prepared here:

http://www.wired.com/wiredscience/2013/09/rim-fire-map-color-scale/


Hierarchy of Business Data Needs

September 9, 2013 in Uncategorized | Comments (0)

Kelle O’Neal over at Information Management talks about a business’s hierarchy of needs as a variation of Maslow’s hierarchy of needs. (http://www.information-management.com/news/the-information-management-hierarchy-of-needs-10024781-1.html)

Maslow’s Hierarchy of Needs (source Wikipedia)

Maslow's Hierarchy of Needs Pyramid

I was thinking… what about a hierarchy of needs focusing on the use of data in business?

At the base, records, data points about business events such as transactions, customer and supplier details, and product details. Data is the foundation, without which you would have nothing.

Next, databases, and sometimes, spreadsheets as bulk storage of those records. This also provides some way of organising the data. You could consider data warehouses here too, as big databases, in a way.

Followed by querying and reporting. Having data is useless unless you can access it and do something with it.

Now we’re getting exciting…with the next level up being business intelligence, analytics and visualization. This adds sophisticated querying, you could say, and reporting. Some of this can be dashboards, graphic charts or even geospatial representation.

Finally, at the top of the hierarchy, I think would be data quality and master data management.

Now, you could also say that MDM and data quality should come first….but if you have no data to govern or ensure quality of, then why have MDM and DQ?

In the real and practical world, since we supposedly have skills in foresight and planning, and we know that we want useful data for business decisions, we should logically ensure DQ and MDM are done first.


Google’s Dremel Makes Big Data Look Small

August 18, 2012 in Uncategorized | Comments (0)

Google BigQuery Picture

It’s not so much visual, but you have to get answers before you have information to show. Want to query petabytes of data in a matter of seconds?

(more…)