Absolutely nobody disputes the potential value of big data. It provides an economic way to ask new analytics questions that we were never able to ask before. And that is possible because we are able to combine new, large, and widely disparate data sets in ways that were never economically possible before. The challenge people are now facing is that it is getting harder and harder to show business value.
Getting to Business Value
There is an amazing amount of innovation going on around big data technology. Practically every day some new technology or farm animal is announced. The challenge for all of us is that the rate of innovation can get in the way of delivering actual business results. Here are a couple of common examples:
- When MapReduce first came out, many people jumped on that opportunity and started writing great code with it. Then Spark came along. Spark was so cool and interesting that many organizations decided to drop what they had been doing with MapReduce and move to Spark. And that meant a total rewrite of all the code that had been written, with the loss of thousands of programmer-hours. It is highly likely that technology will keep changing at this pace for the foreseeable future. Who knows when we will see a Spark replacement?
- On another vector, imagine having to manage a big data stack. To keep a modestly sized big data environment functioning, you are probably looking at a minimum of 6 to 12 different technologies for storage, computing, data warehouses, and higher-level analytics, not to mention data discovery, data prep, data security, data quality and governance, and data visualization. An incredible amount of time is being spent keeping all of those technologies current and integrated with each other. No analytics organization I have ever spoken to wants to be in the system integration business. They want to be delivering actionable insights for their organizations.
- Even more interesting, a great deal of new big data and analytics innovation is starting to appear in the cloud. For example, I don’t think we will see Google Deep Learning offered on-premise any time soon. There are indisputable and well-documented advantages to using the cloud. But using cloud services alongside existing systems means a hybrid environment, and with it the added challenge of designing and managing a hybrid data management architecture that connects the data in the clouds (probably plural) with on-premise systems.
The example below shows just some of the common big data technologies. There is a general progression from older technologies to newer technologies as you move from left to right.
The Only Constant in Big Data Is Change
The question becomes: how do you leverage the best technologies available while still maximizing the return on technology investment in big data for your organization?
You will never get there if you spend most of your time on the big data technology change treadmill. What is required is a data management platform that will enable you to run the big data technology that best fits your business need but abstracts that from the process of data management development.
In the example below, a data management platform separates the Data Visualization and Analytics layer from the underlying big data technologies: Compute, Storage, Distributions, and Data Warehouses.
What to Look for in a Data Management Platform
In any organization with a data-centric strategy, hand coding just will not scale to enterprise-class problems or to larger groups of developers. In this environment, data must be a shared resource available to any system, process, or data self-service user. Thought-leading organizations are taking a different approach. They are using data management platforms that provide:
- An end-to-end solution: Full data management includes data discovery, data integration, data quality, data prep, master data management, data security, data governance, and more. These capabilities should be integrated with one another rather than stitched together by hand.
- Modularity: You shouldn’t have to buy the entire platform at once. You should be able to start where it makes sense for you and grow your data management capabilities at the pace that is comfortable for you.
- Abstraction: The platform development environment must provide a layer of abstraction between the development layers and the underlying big data technology. You should be able to code once and have the platform intelligently determine the best engine to run the code on. And it will help a lot if the platform supports the most current engines available.
- Hybrid: The platform must be able to manage data wherever it resides: in the cloud, on-premise, in big data systems, or somewhere completely different.
- Intelligence: In 2017, IT budgets are starting to grow after many years of flat budgets worldwide. But that will not be enough to scale to the needs of organizations that are looking to compete based on their use of data and analytics. The platform must accelerate productivity by providing intelligence to make recommendations and to automate tasks such as parsing and relating new data for greater understanding.
- Self-service: IT will play a role in delivering data that is ready for business use, but after a point, it makes sense to enable the subject matter experts, the business analysts, to do their own data prep and visualization.
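To make the Abstraction point above more concrete, here is a minimal sketch of what "code once, let the platform pick the engine" could look like. Everything in it is hypothetical and for illustration only: the `Pipeline` class, the two stand-in engines, and the row-count threshold in `choose_engine` are assumptions, not any vendor's actual API. A real platform would dispatch to engines such as Spark and would use far richer selection logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Protocol, Tuple

Row = dict
Step = Tuple[str, Callable]  # ("filter", predicate) or ("map", transform)


@dataclass
class Pipeline:
    """Engine-neutral description of a data flow (hypothetical API)."""
    steps: List[Step] = field(default_factory=list)

    def filter(self, predicate: Callable[[Row], bool]) -> "Pipeline":
        self.steps.append(("filter", predicate))
        return self

    def map(self, transform: Callable[[Row], Row]) -> "Pipeline":
        self.steps.append(("map", transform))
        return self


class Engine(Protocol):
    """Anything that can execute a Pipeline; new engines plug in here."""
    name: str

    def run(self, pipeline: Pipeline, rows: Iterable[Row]) -> List[Row]: ...


class InMemoryEngine:
    """Stand-in for a single-node engine: materializes every step as a list."""
    name = "in-memory"

    def run(self, pipeline: Pipeline, rows: Iterable[Row]) -> List[Row]:
        data = list(rows)
        for kind, fn in pipeline.steps:
            if kind == "filter":
                data = [r for r in data if fn(r)]
            else:
                data = [fn(r) for r in data]
        return data


class StreamingEngine:
    """Stand-in for a scale-out engine: chains lazy iterators over the rows."""
    name = "streaming"

    def run(self, pipeline: Pipeline, rows: Iterable[Row]) -> List[Row]:
        stream = iter(rows)
        for kind, fn in pipeline.steps:
            stream = filter(fn, stream) if kind == "filter" else map(fn, stream)
        return list(stream)


def choose_engine(estimated_rows: int, threshold: int = 1_000_000) -> Engine:
    """Toy selection rule (an assumption): small inputs run in memory."""
    return InMemoryEngine() if estimated_rows < threshold else StreamingEngine()


def run(pipeline: Pipeline, rows: Iterable[Row], estimated_rows: int):
    """Execute the pipeline on whichever engine the platform selects."""
    engine = choose_engine(estimated_rows)
    return engine.name, engine.run(pipeline, rows)


# The business logic is written once, with no reference to any engine.
orders = [
    {"id": 1, "amount": 40.0},
    {"id": 2, "amount": 250.0},
    {"id": 3, "amount": 900.0},
]
large_orders = (
    Pipeline()
    .filter(lambda r: r["amount"] >= 100)
    .map(lambda r: {**r, "tier": "large"})
)

small_result = run(large_orders, orders, estimated_rows=3)
big_result = run(large_orders, orders, estimated_rows=5_000_000)
```

The design choice worth noticing is that `large_orders` never changes when the engine does. If a newer engine comes along, only another class with a `run` method and an updated `choose_engine` are needed, which is the kind of insulation from the technology treadmill described earlier.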
Data management is being re-imagined to deliver greater, faster business value.