Dimensional modeling
Dimensional modeling (DM) is the name of a logical design technique often used for data warehouses. It is considered to be different from entity-relationship modeling (ER). Dimensional Modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as multidimensional database or even flat files. According to Dr. Kimball[1], DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability, contrary to database administration. According to him, although transaction-oriented ER is very useful for the transaction capture, it should be avoided for end-user delivery.
Dimensional modeling always uses the concepts of facts, AKA 'measures', and dimensions. Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas. The term "conformed dimensions" was originated by Ralph Kimball.
The dimensional model is build on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used;
1. Choose the business process
2. Declare the Grain
3. Identify the dimensions
4. Identify the Fact
CHOOSE THE BUSINESS PROCESS:
The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics in the design builds on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process in which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can use to do this in plain text or use basic BPMN or with other design guides like UML.
DECLARING THE GRAIN:
After describing the Business Process, the next step in the design it to declare the grain of the model. The grain of the model is the exact description of the what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore the grain(sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained on what your model is suppose to be able to deliver.
IDENTIFY THE DIMENSION:
The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and is where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.
IDENTIFY THE FACT:
After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit etc.
- ^ Kimball 1997.
- ^ R. Kimball - The Data warehouse lifecycle toolkit - Practical techniques for building data warehouse and business intelligence systems - Second Edition ISBN:978-0-470-14977-5
- ^ Matteo Golfarelli, Stefano Rizzi - Data Warehouse Design - Modern Principles and methodologies - ISBN: 978-0-161039-1
Dimension Normalization
Dimensional normalization or snowflaking removes redundant attributes, which are known in the normal flatten de-normalized dimensions. Dimensions are strictly joined together in sub dimensions.
Snowflaking has an influence on the data structure that differs from many philosophies of data warehouses.[1]
Developers often don't normalize dimensions due to several facts.
This is a an overview of some of the reasons[2]:
1.Normalization makes the data structure more complex
2.Performance can be slower, due to the many joins between tables
3.The space savings are minimal
4.The use of bitmap indexes can't be done
There are some good arguments, why normalization can be useful[1]:
1. It can be an advantage when part of hierarchies, is common to more the one dimension
When the same dimension can be reusable, for example a geographic dimension can be reusable because both the costumer and the supplier dimension both uses it.
2. With the possibility of having many characteristics of a product, snowflaking or normalizing is the possible way to handle such a problem
Literature
- Ralph Kimball (1997). "A Dimensional Modeling Manifesto". DBMS and Internet Systems. 10 (9).
- Margy Ross (Kimball Group) (2005). "Identifying Business Processes" (PDF). Kimball Group, Design Tips (69).
References
This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations. (November 2009) |