Keynote Lectures

Schema Evolution for Relational Databases
Panos Vassiliadis, University of Ioannina, Greece

Data Lakes: A Solution or a new Challenge for Big Data Integration?
Christoph Quix, Hochschule Niederrhein, University of Applied Sciences and Fraunhofer FIT, Germany

Schema Evolution for Relational Databases

Panos Vassiliadis
University of Ioannina, Greece

Short Bio
Prof. Panos Vassiliadis received his Diploma in Electrical Engineering and his PhD from the National Technical University of Athens in 1995 and 2000, respectively. He joined the Department of Computer Science & Engineering of the University of Ioannina in 2002. Prof. Vassiliadis has been involved with research in the area of data warehousing (metadata management, OLAP, and, quite emphatically, ETL) since the late 1990s. He has also worked on top-k and context-based querying and on web service management. Following a common thread in his work, he is currently investigating how the rigorous modeling of data, software and their interdependence can be exploited for the design, visualization and smooth evolution of data-intensive software ecosystems. Prof. Vassiliadis is the co-editor of the book “Fundamentals of Data Warehouses” (Springer). He has published in several international journals and conferences and actively serves the scientific community as a reviewer and PC chair.

Abstract
Like all software systems, databases are subject to evolution as time passes. The impact of this evolution is tremendous, as every change to the schema of a database affects the syntactic correctness and the semantic validity of all the surrounding applications and, in fact, necessitates their maintenance in order to remove errors from their source code. The talk will provide a walk-through of the current state of knowledge on the mechanics of schema evolution for relational databases. The main lessons learned from the existing case studies will be discussed, and recent findings on frequent patterns of change will be presented. The talk will conclude with open issues for further research.
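To make the mechanics of a schema change and its effect on applications more concrete, here is a minimal Python sketch. It is not the speaker's method or tooling: it assumes each schema version is available as a mapping from table names to column types (for example, exported from the database catalog), classifies the differences between two versions, and runs a deliberately naive impact check that flags application queries mentioning a removed table or column. The names `diff_schemas`, `affected_queries` and the change labels are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical representation of one schema version:
# table name -> {column name -> declared SQL type}
Schema = Dict[str, Dict[str, str]]
Change = Tuple[str, str, str]  # (kind, table, detail)


def diff_schemas(old: Schema, new: Schema) -> List[Change]:
    """Classify the differences between two schema versions (old -> new)."""
    changes: List[Change] = []
    for table in sorted(old.keys() - new.keys()):
        changes.append(("table_removed", table, ""))
    for table in sorted(new.keys() - old.keys()):
        changes.append(("table_added", table, ""))
    for table in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(old_cols.keys() - new_cols.keys()):
            changes.append(("column_removed", table, col))
        for col in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(("column_added", table, col))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                detail = f"{col}: {old_cols[col]} -> {new_cols[col]}"
                changes.append(("type_changed", table, detail))
    return changes


def affected_queries(changes: List[Change], queries: List[str]) -> List[str]:
    """Naive impact analysis: flag queries whose identifiers mention a removed table or column."""
    removed = {t.lower() for kind, t, _ in changes if kind == "table_removed"}
    removed |= {d.lower() for kind, _, d in changes if kind == "column_removed"}
    flagged = []
    for query in queries:
        identifiers = set(re.findall(r"[a-z_][a-z0-9_]*", query.lower()))
        if identifiers & removed:
            flagged.append(query)
    return flagged


if __name__ == "__main__":
    v1 = {"customer": {"id": "INT", "name": "VARCHAR(50)"},
          "orders": {"id": "INT", "total": "DECIMAL(10,2)"}}
    v2 = {"customer": {"id": "INT", "name": "VARCHAR(100)", "email": "VARCHAR(100)"},
          "invoice": {"id": "INT"}}
    changes = diff_schemas(v1, v2)
    for change in changes:
        print(change)
    print(affected_queries(changes, ["SELECT total FROM orders", "SELECT name FROM customer"]))
```

The token match only approximates syntactic breakage: it can over-report (a removed column name that also exists in another table) and cannot detect semantic changes, such as a column whose meaning shifts while its name and type stay the same.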

Data Lakes: A Solution or a new Challenge for Big Data Integration?

Christoph Quix
Hochschule Niederrhein, University of Applied Sciences and Fraunhofer FIT, Germany

Short Bio
Christoph Quix is a senior researcher in the Life Science Informatics group at the Fraunhofer Institute for Applied Information Technology (FIT) in St. Augustin, Germany, where he leads the department for High Content Analysis. Earlier, he was an assistant professor in the Information Systems Group (Informatik 5) of RWTH Aachen University, Germany, where he received his Ph.D. degree in computer science and completed his habilitation in early 2013. His research focuses on data integration, big data, management of heterogeneous data, metadata management, and semantic web technologies. He has about 80 publications in scientific journals and international conferences. He has been involved in several national and international research projects conducted in cooperation with research and industry partners. He was a PC chair of CAiSE 2014, a member of the PC of several major conferences on databases and data modeling (e.g., ER, ICDE, and ODBASE), and the organizing chair of several international workshops.

Abstract
“Data Lake” is a new concept that has been introduced in the Big Data field to address the problem of integrating heterogeneous information. Silos of isolated information should be avoided by loading the data into a coherent data repository. In contrast to the classical ETL processes of data warehouse systems, the transformation step is skipped and data is loaded in its original structure, which avoids upfront integration efforts and makes all source data available for later data analysis tasks. The transformation is done at a later phase, when the target application is clearer and a more powerful data processing framework (e.g., Hadoop) is available. Although the idea of a data lake seems to be an attractive solution to make Big Data integration more efficient, it does not resolve the original problems of data integration: data is still very heterogeneous in its structure, semantics, and quality. To prevent the data lake from turning into a data swamp, we propose a metadata-driven and quality-oriented approach for data lake management. Components for automatic metadata extraction and enrichment, semantic annotations, and quality monitoring are key elements of our architecture. The talk will give an overview of the state of the art and the state of practice in data lakes and point out the challenges for future research.
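The contrast with ETL (load data untransformed, transform later) together with metadata extraction and quality monitoring at ingestion can be illustrated with a small Python sketch. It is not the architecture proposed in the talk: the class and function names are hypothetical, semantic annotation and enrichment are omitted, JSON payloads are assumed to be arrays of flat objects, and "quality" is reduced to one assumed indicator, the share of missing values.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def parse(raw: str, fmt: str) -> List[Record]:
    """Interpret a raw payload only when it is needed (schema-on-read)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    if fmt == "json":
        return json.loads(raw)  # assumption: a JSON array of flat objects
    raise ValueError(f"unsupported format: {fmt}")


def extract_metadata(records: List[Record]) -> Dict[str, Any]:
    """Structural metadata plus one simple quality indicator (share of missing values)."""
    attributes = sorted({key for record in records for key in record})
    cells = len(records) * len(attributes)
    missing = sum(1 for record in records for attr in attributes
                  if record.get(attr) in (None, ""))
    return {
        "attributes": attributes,
        "record_count": len(records),
        "missing_ratio": missing / cells if cells else 0.0,
    }


class DataLake:
    """Hypothetical in-memory lake: raw payloads are kept untouched, metadata goes to a catalog."""

    def __init__(self) -> None:
        self.raw: Dict[str, Tuple[str, str]] = {}     # name -> (format, original payload)
        self.catalog: Dict[str, Dict[str, Any]] = {}  # name -> extracted metadata

    def ingest(self, name: str, raw: str, fmt: str) -> None:
        # No transformation at load time; only metadata is derived and stored.
        self.raw[name] = (fmt, raw)
        self.catalog[name] = extract_metadata(parse(raw, fmt))

    def find(self, attribute: str) -> List[str]:
        # The catalog, not the raw data, is what keeps the lake searchable.
        return sorted(n for n, meta in self.catalog.items() if attribute in meta["attributes"])

    def load(self, name: str, transform: Callable[[Record], Any]) -> List[Any]:
        # The transformation is chosen later, by the consuming application.
        fmt, raw = self.raw[name]
        return [transform(record) for record in parse(raw, fmt)]


if __name__ == "__main__":
    lake = DataLake()
    lake.ingest("sensors", "id,temp\n1,20.5\n2,\n", "csv")
    lake.ingest("patients", '[{"id": 1, "age": 54}, {"id": 2}]', "json")
    print(lake.catalog["sensors"])
    print(lake.find("id"))
    print(lake.load("sensors", lambda r: float(r["temp"]) if r["temp"] else None))
```

Keeping the catalog separate from the payloads is the point of the sketch: without extracted metadata and quality indicators, a repository of untransformed sources quickly becomes hard to search, which is the data-swamp risk described above.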