Welcome to Modeling | Modeling News | Modeling Videos | Latest Modeling Trends


Wednesday, December 05, 2007

An Introduction To Dimensional Modeling For Data Warehousing - Part 1

The design principles of the dimensional model, which is commonly used in data warehousing, are described in this article series. Dimensional models capture business performance measurements, which are used to support decision making. Dimensional model The descriptive simplicity and high performance in query execution, are characteristics which have contributed to the increased use of the dimensional model in data warehouse infrastructures. The symmetry and descriptive simplicity can be seen at the conceptual model (see resource link) which relates to retail sales monitoring (data warehousing technology has been introduced initially in retailing).

Relational data models are use to implement the above conceptual model (as depicted in the resource link).

This model is easily understood by Business analysts, in contrast with other operational systems models (‘normalized data models’ in relational modeling language). In a relational modeling representation, the model in its simple form, consists of a central ‘fact table’ and ‘dimension tables’ which are connected to the ‘fact table’ via reference keys (foreign keys in relational modeling language). This form is called ‘star schema’. Fact table The ‘fact table’ is the central table in a dimensional model, which stores the measurements (facts) on which analytical processing takes place. All measurements adhere to the same level of detail. The most useful measurements are usually additive or semi-additive, in order to allow analytical processing (numerical calculations which produce additional ‘derived’ facts). During analytical processing, thousands or millions of fact rows are retrieved and numerical processing is applied on facts or fact combinations. Facts which are stored in a fact table, should be captured at the most detailed (or most granular) level (also called ‘atomic level’, meaning something that cannot be divided). On the other hand, maintaining aggregate facts, limits the analysis (or drill down) capability on certain dimensions. The model in the figure (see resource), captures an event: the sale of a product at a given time and all (or most) related to the sales event dimensions. This fact table type, is called a transaction fact table. Dimension tables Dimension tables describe the dimensions of a measurement on a business process. The features of each dimension should be as rich and flexibly described as possible (with many descriptive fields on the dimension table). Attribute names of the dimension tables should be sufficiently descriptive, so as to be easily and unambiguously understood.

Codes which are used in operational systems, should be replaced with descriptive names of the characteristics. Numerical (quantitative) measurements should not be entered in dimension tables (given that these facts should be stored in fact tables). Non numerical measurements (e.g. measurements which can be described in text) which are derived from a list of discrete values, should be entered in a dimension table. The dimension tables usually maintain a limited number of records (the different descriptions that a dimensional entity may take) (the number of rows is known as the cardinality of the table). The attributes of dimension tables, play an important role in dimensional analytical processing, given that they form the base of all ‘restriction operations’ which are applied. (e.g. sales that took place on a specific branch and date are derived by restricting on the branch and the date dimension). Moreover, they form the headings in the reports produced. Therefore, dimensional attributes are the ‘entry points’ to the measurements which are captured in the fact table. The value of a dimensional model is directly proportional to the quality and depth of its dimension tables.