"Big data" is the latest in a long line of new technologies to take the world by storm. While the promise of big data is as grand as its moniker, many wonder how to get started on such an endeavor.
Not surprisingly, from an IT perspective, implementing a big-data solution is no different from implementing any other software solution, and it is a process that hasn’t changed in decades.
First and foremost, a clear definition of what the system will, and will not, do must be established. With big data this is often a loose definition, but a definition nonetheless. Are we trying to better predict our own customers’ buying patterns? Are we identifying better ways to manage our inventory? What business issue are you trying to address by implementing a big-data solution? Answering that question is always the most critical step.
Second, one must identify available options. This includes deciding whether you will be buying prepackaged software and/or hardware or building your own system. With big data the tools and/or environment might be new, but the decision-making process doesn’t change.
The next step is design, from both a hardware and a software perspective. What are the sources of the data? How are you going to get this data into a format with which you can work? What hardware are you going to use to host your big data? Does software need to be developed? Do you need a specialized tool set to work with your data? Further, many big-data solutions are built upon specialized hardware, as opposed to the commodity servers common in many environments.
If this is a custom software solution, then development comes next. For a big-data solution, this includes building the interfaces between the data sources and the data repository.
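To make the idea of an interface between a data source and a data repository concrete, here is a minimal sketch in Python. It is an illustration only: the sample CSV data, the `purchases` table and the function names `read_source` and `load_repository` are all hypothetical, and an in-memory SQLite database stands in for whatever repository a real project would use.

```python
import csv
import io
import sqlite3

# Hypothetical export from a point-of-sale system (illustrative data only).
SAMPLE_CSV = """customer_id,item,amount
101,widget,19.99
102,gadget,5.00
101,gadget,5.00
"""


def read_source(text):
    """Parse CSV text into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "customer_id": int(row["customer_id"]),
            "item": row["item"].strip(),
            "amount": float(row["amount"]),
        })
    return rows


def load_repository(conn, rows):
    """Insert normalized rows into the repository and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer_id INTEGER, item TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (:customer_id, :item, :amount)", rows
    )
    conn.commit()
    return len(rows)


# An in-memory SQLite database stands in for the real data repository.
conn = sqlite3.connect(":memory:")
loaded = load_repository(conn, read_source(SAMPLE_CSV))
```

The point of the split is that each new data source would need only its own reader that produces the same record shape; the loading step would not have to change.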
Next comes implementation of the solution. This is where all of the components of the system are assembled and put into place.
After implementation and before actual production use of the system, testing must be performed to ensure the system functions as planned. A test plan must be developed at the outset of, or prior to, this phase.
Prior to actually going live, organizations typically will require some sort of training. Because big data includes many relatively new technologies, big-data solutions could require more training than traditional systems.
Finally, after the new system has been put into production, how will it be supported? Who will you call when you run into issues or problems? These questions obviously need to be answered before actually going live.
So what is different about big data, from a systems implementation standpoint? The actual database, for one. For years, virtually every system has been built on what’s known as a "relational" database. Many big-data solutions still use a relational database, while others depend on other types of databases, such as those in the NoSQL family.
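For readers who want to see the difference, the short Python sketch below contrasts the two approaches. It is a simplified illustration, not a recommendation: the `customers` table and the sample records are made up, SQLite stands in for a relational database, and a plain dictionary of JSON strings stands in for a document store, which is only one of several kinds of NoSQL database.

```python
import json
import sqlite3

# Relational: every row follows one schema that is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'Honolulu')")
relational_row = conn.execute(
    "SELECT name, city FROM customers WHERE id = 1"
).fetchone()

# Document-style: each record describes itself and may carry different fields.
# A dict of JSON strings is a stand-in for a real document store.
documents = {
    "1": json.dumps({"name": "Ann", "city": "Honolulu"}),
    "2": json.dumps({
        "name": "Bob",
        "tags": ["wholesale"],
        "last_order": {"item": "widget"},
    }),
}
doc = json.loads(documents["2"])
```

In the relational table, adding a new attribute means changing the schema for every row. In the document-style records, the second customer simply carries fields the first one lacks, and the application has to cope with that variation when it reads the data.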
———
John Agsalud is an IT expert with more than 20 years of information technology experience. Reach him at johnagsalud@yahoo.com.