This paper presents a critique of current enterprise data warehouse (DWH) methodologies. In particular, it highlights the differences between line-of-business (LOB) specific data marts and an enterprise-wide DWH. It concludes that enterprise data warehousing needs an approach completely different from that needed for LOB-specific warehouses. The concept of using enterprise business-strategy-driven entity modeling as the foundation of a new DWH methodology is introduced, and an outline of this approach and its implementation methodology is presented.
It is generally assumed that the requirements for a data warehouse are known, and that satisfying them is a case of implementing an appropriate technology, where the choice of technology is the most important factor.
This assumes that the data warehousing process can be broken down into a number of discrete steps, namely:
- Extract data from operational systems
- Massage the data
- Store the data in a warehouse
- Extract data appropriate to a user's requirements into a data mart
- Exploit the data mart with end-user tools.
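The five steps listed above can be sketched as a small pipeline. This is a minimal illustration only, assuming in-memory lists in place of real operational systems and a real warehouse; the source data, field names and function names are all hypothetical and are not taken from any particular product or methodology.

```python
from collections import defaultdict

# Hypothetical rows from two operational systems (illustrative data only).
ORDERS_SYSTEM = [
    {"cust": " Acme ", "amount": "100.50", "lob": "retail"},
    {"cust": "Bolt", "amount": "20.00", "lob": "retail"},
]
CLAIMS_SYSTEM = [
    {"cust": "acme", "amount": "75.25", "lob": "insurance"},
]


def extract(*sources):
    """Step 1: copy raw rows out of each operational system."""
    return [dict(row) for source in sources for row in source]


def massage(rows):
    """Step 2: clean the rows (trim and lower-case names, parse amounts)."""
    return [
        {
            "customer": row["cust"].strip().lower(),
            "amount": float(row["amount"]),
            "lob": row["lob"],
        }
        for row in rows
    ]


def store(warehouse, rows):
    """Step 3: append the cleaned rows to the warehouse (a plain list here)."""
    warehouse.extend(rows)
    return warehouse


def build_mart(warehouse, lob):
    """Step 4: extract the subset one user needs: totals per customer for one LOB."""
    totals = defaultdict(float)
    for row in warehouse:
        if row["lob"] == lob:
            totals[row["customer"]] += row["amount"]
    return dict(totals)


def exploit(mart):
    """Step 5: a stand-in for an end-user tool: find the largest customer."""
    return max(mart, key=mart.get)


warehouse = store([], massage(extract(ORDERS_SYSTEM, CLAIMS_SYSTEM)))
retail_mart = build_mart(warehouse, "retail")
top_customer = exploit(retail_mart)
```

Note that every function in the sketch presupposes that the requirement (here, retail totals per customer) is already known, which is exactly the assumption this paper questions.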
The current belief is that by applying the "correct" database design and managing the data with metabase*-driven tools, data from any number of operational systems may be stored in a single warehouse from which a unified view of the data may be presented to any user.
Database designs to achieve this aim are usually either data-driven relational systems or designs based on the users' exploitation requirements, which could include star schemas, EIS, OLAP, HOLAP or ROLAP systems. Such designs can work when applied to the data from a single operational system, or even to data from a small number of related operational systems. Problems arise, however, when these design methodologies are applied to data from the myriad databases present in a large corporation. This type of design frequently fails even when the inter-system key links necessary for the operational systems to meet the data requirements are defined and supported, because these design methodologies cannot guarantee the creation of a warehouse that is both correct and usable.
The reason for this is that the various parts of a corporate database do not fit together into any self-evident schema. Rather, they can be forced to fit almost any schema, and the only sensible guide to designing the structure and content of a data warehouse is the currently known user exploitation requirement. While some degree of flexibility can be incorporated into the design process, instances will occur, as user requirements change, where the warehouse cannot support them. In time, complex new requirements will render the chosen warehouse design obsolete. This highlights the fact that current data warehouse design methodologies do not manage the risk of either inappropriateness or redundancy.
The data warehouse may contain all the source data available in a corporation and still be unable to support some exploitation requirements (in other words, it is inappropriate), or it may contain data structures and content that are rendered redundant by changing user requirements.
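The two risks can be made concrete with a deliberately simplified check. The sketch below assumes that a warehouse design can be reduced to a set of named data elements and that each user requirement can be reduced to the set of elements it needs; real designs also depend on structure and granularity, which this ignores. All element and requirement names are hypothetical.

```python
# Hypothetical data elements provided by the current warehouse design.
warehouse_elements = {"customer", "order_amount", "claim_amount", "branch_code"}

# Hypothetical user requirements, each listing the elements it needs.
requirements = {
    "retail_sales_report": {"customer", "order_amount"},
    "cross_lob_profitability": {
        "customer", "order_amount", "claim_amount", "cost_allocation",
    },
}


def unsupported_requirements(elements, reqs):
    """Inappropriateness: requirements the design cannot satisfy,
    mapped to the elements each one is missing."""
    return {
        name: needed - elements
        for name, needed in reqs.items()
        if needed - elements
    }


def redundant_elements(elements, reqs):
    """Redundancy: elements held in the warehouse that no requirement uses."""
    used = set().union(*reqs.values()) if reqs else set()
    return elements - used


gaps = unsupported_requirements(warehouse_elements, requirements)
unused = redundant_elements(warehouse_elements, requirements)
```

Under these assumptions `gaps` reports that the cross-LOB requirement lacks `cost_allocation`, and `unused` reports `branch_code`. Both results change whenever the requirements change, which is why a design fixed against today's requirements carries both risks.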
The reason for the inadequacy of the current design methodologies is that the problems they address through technology, viz. (i) the management of large quantities of data and (ii) the facilitation of end-user access to the database, do not help the warehouse builders with the primary questions of what should be in a corporate warehouse and how its content should be arranged. Current warehouse design philosophies are also not appropriate for corporate data warehousing, because they are largely based on two "currently self-evident truths" that are in fact fallacies.
The first of these is "the consolidation myth", which states that a corporation needs to be able to source data from all lines of business in any combination. To facilitate this, all the data from each line-of-business-specific operational system must be available in a single database at a low level of granularity, as this will facilitate consolidation into a corporate report. The second fallacy is "the extrapolation myth", which holds that a corporate warehouse is the same as a line-of-business-specific data mart, except that it contains the data for all or several marts from various lines of business.
"The Information Age...is dead and gone, replaced by the information economy...The Information Age was about the building of databases. It was about the rise of…