Abstract: |
One of the most important parts of a Data Warehousing System is the Extract-Transform-Load (ETL) component. It is responsible for extracting, transforming, conciliating, and loading data to support decision-making requirements. Because managing heterogeneous data is complex, this component usually consumes most of the resources required to implement a Data Warehousing System, making it a critical component whose shortcomings can compromise the adequacy of the whole system. Despite its importance, ETL development is essentially ad hoc and does not always follow or embody best practices. With the emergence of Big Data and its associated tools, script-based ETL has become an even more common approach. In recent years, BPMN (Business Process Model and Notation) has been proposed and used to support ETL conceptual models. However, because BPMN is an expressive language, it offers different ways of representing the same requirements. In this paper, we explore the use of BPMN for ETL conceptual modelling, analyze existing approaches, and propose a set of guidelines for using this notation more consistently. |