Over the last few years there has been a lot of industry buzz about the future of the enterprise data warehouse (EDW). Maybe we should give the classic EDW acronym a new meaning: Extended Data Warehouse. One reason is the interest in and use of Hadoop, the powerful distributed processing engine of Big Data. Hadoop is fast and cheap, and it handles unstructured data. From long-time database and data warehousing vendors to dozens of startups, you see a lot of chaos and upheaval as everyone analyzes the latest trends, ideas and improvements. Transformation. Confusion. Disruption. Extension. It’s everywhere, and it’s a good thing.
Hadoop exploded onto the scene quickly and was just confusing enough that in a 2012 Gartner survey, 11% of CIOs surveyed said Hadoop would completely replace the data warehouse. Just one year later, only about 5% of CIOs felt the same way.
So what happened in one year? I think it’s a good indication that some education is starting to take place, and that CIOs and other technical leaders better understand this new wave of data analytics and where each technology fits.
Enterprises are seeing the value of Hadoop. Industry spending analysis clearly indicates that organizations are starting to spend huge amounts of money on Big Data technologies. In my own research I am seeing global organizations publicly acknowledge significant projects and make big strategic investments in Hadoop as part of a core data framework for the future. Depending on the analyst, some estimates suggest Big Data/Hadoop spending will be three times larger than data warehousing expenditures in the next three to four years.
I recently read a quote from a CIO who said that Hadoop adds to the value of the EDW and does not subtract from it. That’s an important concept. The relationship between the classic data warehouse and Big Data is complementary. Clear-thinking CIOs are not about to rip out all their well-architected RDBMS or MPP systems. They will evaluate, select and use the right tools for the right projects at the right price.
Ralph Kimball, a popular data warehouse pioneer and frequent speaker during the early days of The Data Warehousing Institute, was asked at a recent webinar whether relational databases are now dead. He said the following:
“I think that there was a sense, three or four years ago, that maybe this was all a giant zero sum game between Hadoop and relational databases, and that has simply gone away. Everyone has now realized that there’s a huge legacy value in relational databases for the purposes they are used for. Not only transaction processing, but for all the very-focused, index-oriented queries on that kind of data, and that will continue in a very robust way forever. Hadoop, therefore, will present this alternative kind of environment for different types of analysis for different kinds of data, and the two of them will coexist. And they will call each other. There may be points at which the business user isn’t actually quite sure which one of them they are touching at any point in time.”
Ralph Kimball
Much of the buzz seems to center on the impact of Big Data on the extended data warehouse, but other interesting technology breakthroughs are also affecting the data warehouse market, including in-memory processing platforms and data virtualization.
While many companies and analysts continue to give names to these new ideas and strategies, the bottom line is that a simpler architecture where data is stored only once results in a lower total cost of ownership (TCO). Results become available faster and at lower cost. The extended data warehouse will bring about an entirely new generation of applications for BI, reporting, dashboards and mobile. It will be exciting to watch. Transformation. Confusion. Disruption. Extension.
Thanks to newer technologies like Hadoop, Mobile BI, the Cloud, NoSQL and multiple specialized BI disciplines, and to the amazing combination of technological and social forces, today’s business intelligence and data warehouse markets are in the middle of a profound transformation. Ask people who have been around it all for a while and everyone agrees that changes are coming. What they don’t agree on is what will change, how it will change and how soon.
It sounds like the database wars of 20 to 30 years ago. Is this new wave going to be complete chaos or another golden opportunity? I think the latter—there is nothing better than a good, old-fashioned data battle. Maybe the word “big” will be the first casualty. It’s all data.