The dimensional model is a thing of beauty when it is done right. It is designed independently of vendor and architecture. It is built with fact tables for every process, conformed dimensions, hierarchies, slowly changing dimensions, bridges, aggregates and so on. It is a work of art you print on a plotter and hang on your office wall for business users to gaze upon in total awe. It is the map you use to trace data lineage and figure out join paths. It is the reference guide for enhancement impact analysis.
But stop; I’m getting carried away because I’m a data geek (aka guru). If I were ever to get lazy with a dimensional model (it would never happen), I know the usability of the system would punish me: performance would be less than optimal, support would be difficult, auditability would be lost, and data quality would suffer.
Data modeling is a difficult thing to teach. There are several great books that should be part of every data modeler’s library, but true learning comes from being faced with a challenge, leveraging experience and best practices, and almost always making mistakes.
Monolithic hardware, a relational database engine, and direct-access queries make up the traditional data warehouse stack. Data architects, modelers, and DBAs developed comprehensive, pragmatic best practices for that architecture to ensure optimal performance and maintainability.
All of the above worked for many years, but data and user requirements have evolved at lightning speed over the last few years. Data now includes images, video, free text, social media inputs and machine data, to name just a few.
Ten years ago, technology vendors began releasing MPP database platforms. Indexes and partitions gave way to distribution and sorting, and performance was mind-blowing. During this time, as a consulting organization, we noticed that the quality of our customers’ dimensional models diminished. After all, architects could shortcut best practices and let the vendor technology overcome the performance issues. Views and federated access were leveraged to feed the BI tools still hungry for those dimensional models.
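To make that shift concrete, here is a rough sketch of how the same fact table might be tuned on each stack. The table and column names are made up for illustration, and the distribution and sort syntax shown is Amazon Redshift-style; other MPP platforms express the same concepts with their own keywords.

```sql
-- Traditional SMP warehouse: tune with indexes (and partitions; partition syntax varies by engine).
CREATE TABLE fact_sales (
    sale_date_key  INTEGER       NOT NULL,
    product_key    INTEGER       NOT NULL,
    store_key      INTEGER       NOT NULL,
    sales_amount   DECIMAL(12,2)
);
CREATE INDEX ix_fact_sales_date ON fact_sales (sale_date_key);

-- Same table on an MPP warehouse (Redshift-style syntax): tune with distribution and sort keys instead.
CREATE TABLE fact_sales (
    sale_date_key  INTEGER       NOT NULL,
    product_key    INTEGER       NOT NULL,
    store_key      INTEGER       NOT NULL,
    sales_amount   DECIMAL(12,2)
)
DISTKEY (product_key)     -- co-locate fact rows so joins to a like-distributed product dimension stay local
SORTKEY (sale_date_key);  -- physically order by date so range scans can skip blocks
```

The point is not the syntax; it is that the physical tuning levers changed while the logical dimensional design should not have.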
At about the same time, commodity labor was expanding from other technologies into Data Warehousing and Business Intelligence. We saw Java web developers becoming ETL architects overnight. CIOs demanded faster ROI from projects. Agile methodologies were implemented. Data models got even worse, but results were delivered.
Today, we see new architecture options: NoSQL on traditional or distributed hardware, Hadoop flavors, BI tools with non-relational connectors, and analytics appliances. Hadoop’s advancements make it possible for its vendors to position Hadoop as a platform for the data warehouse. It’s coming, and quickly.
Our technology platforms are changing, and commodity (inexperienced) labor is flooding the candidate pool. This presents a huge risk of technical debt in our future data warehouses. However, sit back and take a deep breath: the sky is not falling. There are simple practices you can put in place in your organization to mitigate the risk to data quality, data integrity, and sound project design and implementation.
- Challenge your data modeler candidates. Quiz them on the reasons for best practices, not on the book definition of a term. Ask them to describe their biggest data modeling mistake, the impact, and how it was resolved.
- For larger organizations, establish a data model review board. It should consist of ETL architects, BI architects, DBAs, and data modelers. Schedule recurring meetings to work through challenges and approve designs.
- Encourage proofs of concept (POCs) on new models and new technologies. Don’t always go for the low-hanging fruit; make sure your model(s) can handle the most complex use cases.
- Stick to your standards. Build them if you don’t have them. Focus on the following (a brief sketch follows the list):
  - Naming conventions and patterns
  - Conforming dimensions
  - Hierarchies and snowflakes
  - Multi-valued dimensions
  - Recursive hierarchies
  - Slowly changing techniques
  - Transactional fact tables
  - Snapshots, accumulations, factless fact tables
  - Aggregates
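As a deliberately simplified illustration of a few of these standards (naming conventions, surrogate keys, Type 2 slowly changing dimensions, transactional facts), here is a sketch of a conformed customer dimension and a fact table that references it. All names and conventions shown are hypothetical; the point is that whatever standards you choose get applied consistently.

```sql
-- Conformed customer dimension with Type 2 history:
-- a change to a tracked attribute closes the current row and inserts a new one.
CREATE TABLE dim_customer (
    customer_key      INTEGER      NOT NULL,   -- surrogate key
    customer_id       VARCHAR(20)  NOT NULL,   -- natural (business) key
    customer_name     VARCHAR(100),
    customer_segment  VARCHAR(30),             -- Type 2 tracked attribute
    effective_date    DATE         NOT NULL,
    expiration_date   DATE         NOT NULL,   -- e.g. 9999-12-31 on the current row
    current_flag      CHAR(1)      NOT NULL,   -- 'Y' on the current row
    PRIMARY KEY (customer_key)
);

-- Transactional fact table, one row per order line,
-- joined to its dimensions only through surrogate keys.
CREATE TABLE fact_order_line (
    order_date_key   INTEGER      NOT NULL,
    customer_key     INTEGER      NOT NULL REFERENCES dim_customer (customer_key),
    product_key      INTEGER      NOT NULL,
    order_number     VARCHAR(20)  NOT NULL,    -- degenerate dimension
    quantity         INTEGER,
    extended_amount  DECIMAL(12,2)
);
```

Consistent prefixes, surrogate keys, and explicit effective and expiration dates are exactly the kinds of conventions a data model review board can check quickly during design approval.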
Dimensional design maximizes the value of your data and your BI applications, but that value is lost without usability. With so much change on the horizon in our industry, the data model can make or break the success of your next project.