Since the dawn of history, man has endeavored to know why things happen. Generations have been devoted to determining root causes of countless phenomena. The thinking goes that if we know "why," we can predict what will happen next and act in ways that might improve our lives. One of the promises of Big Data, however, is that the "why" is no longer important, as long as we know "what."
Possibly one of the most famous outcomes of Big Data efforts has to do with Walmart, Pop-Tarts and hurricanes. A few years ago, the story goes, Walmart, one of the most technologically savvy retailers in the world, conducted what can only be defined as a Big Data study and, as a result, was able to determine what its customers buy when facing the threat of a hurricane or similar storm.
Walmart went into its own databases of past transactions to figure out what items moved fastest when storm notices went out. Of course, flashlights, bottled water and the like were hot sellers, but curiously, so were Pop-Tarts (and beer, which is often left out of this report). So now Walmart moves Pop-Tarts (and presumably beer) to the front of its stores whenever a storm threatens.
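For readers who want to see what such a lookup might involve, here is a minimal sketch in Python. It is not Walmart's actual method, which has not been published in detail. The records, the item names and the `storm_lift` function are all invented for illustration. The idea is simply to compare how much of each item sells per record while a storm warning is active against how much sells otherwise, and then rank items by that ratio.

```python
from collections import defaultdict

# Hypothetical records: (item, units_sold, storm_warning_active).
# These rows are made up for illustration; they are not Walmart data.
transactions = [
    ("pop_tarts", 21, True),
    ("pop_tarts", 2, False),
    ("pop_tarts", 4, False),
    ("flashlight", 12, True),
    ("flashlight", 3, False),
    ("milk", 10, True),
    ("milk", 10, False),
]


def storm_lift(rows):
    """Return {item: average units per storm record / average units per normal record}."""
    # For each item, keep [total_units, record_count] for storm and non-storm records.
    totals = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for item, units, storm in rows:
        bucket = totals[item][storm]
        bucket[0] += units
        bucket[1] += 1

    lift = {}
    for item, by_flag in totals.items():
        storm_units, storm_n = by_flag[True]
        normal_units, normal_n = by_flag[False]
        # Skip items that lack a usable baseline or any storm-period sales records.
        if storm_n == 0 or normal_n == 0 or normal_units == 0:
            continue
        lift[item] = (storm_units / storm_n) / (normal_units / normal_n)
    return lift


lift = storm_lift(transactions)
# Items with the largest jump in sales under a storm warning come first.
ranked = sorted(lift.items(), key=lambda pair: pair[1], reverse=True)
for item, ratio in ranked:
    print(f"{item}: {ratio:.1f}x")
```

Note that nothing in this sketch asks why an item sells; it only reports which items jump. A ratio near 1 means the storm warning made little difference for that item.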
The moral of this story, though, is that Walmart did not stop to think "why" Pop-Tarts are so desirable. Of course, that can easily be discerned in this example, but without using the Big Data approach, Walmart probably never would have figured out the importance of Pop-Tarts to begin with.
Some might pooh-pooh this story as just good business and not indicative of the Big Data sensation. But a few characteristics of this process have "Big Data" written all over them.
First, the quantity of data used to conduct this study was by any measure very large, in the hundreds of terabytes if not more. Walmart had compiled this data over years, without really knowing what would be done with it.
In addition to the items purchased and their cost, the company also tracked time of day, weather and many other seemingly insignificant points. In the old days no one kept anything but the most critical data, simply because there wasn’t enough disk space. With Big Data, storing what used to be incredible amounts of data is cheap and relatively easy.
Second, Walmart analyzed ALL of its data. It didn’t use the usual approach of taking a sample set and correcting for outliers. It ran its studies across its entire data set. In the old days, again, this simply was not feasible because of limits on both disk space and processing power. Nowadays, processors are incredibly powerful.
Finally, Walmart did not develop hypotheses and try to analyze the data to prove any theory. Instead of brainstorming a "why" and trying to prove that point, the company simply went after "what is." It doesn’t care why; it just knows that Pop-Tarts are hot sellers in stormy weather.
Correlation versus causation is a hotly contested point between promoters of Big Data and its naysayers. Clearly, "what" is not always good enough, and sometimes "why" needs to be defined. The Big Data approach is not a be-all, end-all solution. But in many, many situations it is more than sufficient.
———
John Agsalud is an IT expert with more than 20 years of information technology experience. Reach him at johnagsalud@yahoo.com.