Abstract: |
The Entity-Relationship (ER) model was developed to tackle the issue of logical view of data (Chen, 1976) and to this day, ER modeling is used as a basic tool in database design due to its simplicity, intuitive appeal, and ability to capture useful semantics. As the model is used across different domains, researchers are encouraged to add and develop constructs to the ER model in a careful manner (Badia, 2004). According to Moody and Kortink, entities in the ER model can be classified into three categories with their respect to mapping into dimensional model: transaction, component, and classification entities. Transaction entities record details of business events, component entities are those related to transaction entity by one-to-many relationship and classification entities are related to a component entity by a chain of one-to-many relationships in standard dimensional modeling approach (Moody & Kortink, 2003). Conversely, entities labeled as component and classification entities contain fields that are mapped into the dimensions. In analytical systems we often encounter situations where many to many relationships exist between component/classification and transaction entities which exhibit different levels of cardinalities which are currently nevertheless modeled in the same fashion. Our proposed data modeling notation has impact on better data warehouse design with specific impact on quality of business intelligence extracted from data warehouse (DW) designed in such fashion. The current standard cardinality notations don’t specify the exact size of the range of instances of one entity as it relates to the instances of another, or they specify the min and max range as specific integer numbers. However, these notations do not list the probability distribution of these ranges which in turn miss the opportunity to investigate one of the key factors that determine how the resulting dimensional model of the data warehouse is designed and implemented. Our notation acknowledges existence of probability distributions and suggests different dimensional model strategies based on category of distributions represented in a particular relationship. We propose new notation identifying different cases leading to different mapping strategies as ER model is mapped into a dimensional model. To illustrate our approach, we provide different examples of scenarios the new notation could be applied to. Two broad categories of cases we present lead to different approach to the apportioning of KPI metrics of interest. We also propose additional types of classifications of KPI metrics characterized by the nature of the method of their apportioning. Our discussion shows how our suggested additions to the standard ER diagram can improve efficiency and accuracy of the modeling stage resulting in better database design solutions more suited for effective analytics.
Badia, Antonio. (2004). Entity-Relationship modeling revisited. SIGMOD Record. 33. 77-82. 10.1145/974121.974135.
Chen, P. (1976) The Entity-Relationship Model: Toward a Unified View of Data. ACM Transactions on Database Systems, 1, 9-36.
Moody, D. L., & Kortink, M. A. R. (2003). From ER models to dimensional models: bridging the gap between OLTP and OLAP design, Part I. Business Intelligence Journal, 8, 7-24.
Roy-Hubara, Noa & Sturm, Arnon. (2020). Design methods for the new database era: a systematic literature review. Software & Systems Modeling. 19. |