Thursday, July 28, 2011

"Big Data" and a little sleuthing

A recent McKinsey study entitled “Big Data” (tweeted by my friend and former colleague, Tim Suther of Acxiom) provides some eye-watering statistics about data.  To pick one, “15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress.”  The embarrassment of riches forces us to ask: “what are we doing with all these data?”

You probably know that old joke that if you love sausage or you love the law, you shouldn’t watch either of them being made.  The same joke holds true for marketing data; you may love it, but you might not want to see how it’s made.

Scratch that.  If you love marketing data, you SHOULD see how it’s made.  Perhaps if by-the-numbers marketers really understood their data, they would agree with my mantra: don’t over-measure.

Let me give you a brief example.  In a previous job, a travel client asked me to analyze its loyalty database to understand more about its members--where they lived, what languages they spoke and so on.  (Side note: I’m not telling tales out of school here; I have worked with six different travel clients in my time.  Feel free to guess which one I discuss here, though.)

This marketer used the ISO 3166-1 two-letter codes for country of residence.  These codes presented no problem for a lot of countries; it didn’t take a Jeopardy champion to know that “GB” meant “United Kingdom” or that “DE” meant “Germany.”  Then I got to “SJ.”  Turns out that “SJ” refers to Svalbard and Jan Mayen, remote Norwegian territories in the Arctic.

That seemed odd.  Then I did a quick sort and found that this loyalty program had something like 20,000 members in Svalbard.  Even odder.  A quick visit to one of my favorite reference sites, the CIA’s World Factbook, uncovered that Svalbard has only a few thousand residents.  To put it mildly, it seemed unlikely that this remote territory had supplied several times its own population to a travel loyalty program.

Then I found other likely errors, such as a Persian Gulf country in which 40% of the members selected Chinese as their language preference.  Now, this country (it may have been Saudi Arabia, but it also may have been one of the Emirates; I don’t remember) may have a large community of itinerant Chinese, but the names associated with the language preference included “Yusuf” and “Doud,” clearly Arabic.

What accounted for these extremely unlikely data findings?  Who knows?  Perhaps massive mis-coding took place.  Perhaps an admin imported an old file incorrectly.  I wouldn’t rule out a full moon or Bieber fever.  More to the point, how could this marketer use these data in good conscience without weeding out the questionable records?

I don’t mean to imply that we should suspect all data.  Every database-friendly marketer from Amazon to Zappos would laugh at me on the way to the bank.  Too many marketers have made too much money--even from less-than-pristine data--to invalidate data as a whole.

Rather, I recommend that every marketer who depends on customer or prospect data spend some time with those data.  Here’s a short checklist:

  • Conduct some top-level analysis simply by tabulating the counts for each field
  • Look for things that seem out of whack
  • Try to understand where any serious errors may have originated
  • Look for ways to remediate wayward data; many marketers can employ tactics such as appends or targeted “confirm your information with us” emails, but consider also using point-of-sale (POS), call centers or good ol’ fashioned direct mail
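
For the spreadsheet-averse, the first two items on that checklist can be sketched in a few lines of Python.  Treat this as a rough illustration, not a recipe: the field names, the sample records, the population figures and the 5% threshold are all assumptions I made up for the example, and you would substitute your own file and a real reference table.

```python
from collections import Counter

# Hypothetical member records; the "country" field name is an assumption.
members = [{"country": "DE"}] * 3 + [{"country": "SJ"}] * 200

# Rough, illustrative population figures; replace with a real reference table.
population = {"DE": 80_000_000, "SJ": 2_500}


def tabulate(records, field):
    """Count how often each value appears in one field."""
    return Counter(record.get(field, "") for record in records)


def flag_implausible(counts, population, max_share=0.05):
    """Return codes whose member count exceeds max_share of the population."""
    flagged = []
    for code, members_in_code in counts.items():
        residents = population.get(code)
        if residents is not None and members_in_code > residents * max_share:
            flagged.append(code)
    return sorted(flagged)


country_counts = tabulate(members, "country")
suspects = flag_implausible(country_counts, population)
print(country_counts.most_common())
print("Worth a second look:", suspects)
```

The threshold is deliberately crude.  The point is only to surface the handful of values that deserve a human look, not to decide automatically which records are wrong.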

Or, if you prefer, continue sending Russian-language direct mail to Marta Gonzalez in Seoul.
