Documents

Keynote Lectures

Managing Big Multidimensional Data - A Journey From Acquisition to Prescriptive Analytics
Torben Bach Pedersen, Computer Science, Aalborg University, Denmark

On Information Propagation, Social Influence, and Communities
Francesco Bonchi, ISI Foundation, Italy

Data-Driven Genomic Computing: Making Sense of the Signals from the Genome
Stefano Ceri, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy

Advances and Future Challenges in Machine Learning and Knowledge Extraction
Andreas Holzinger, Medical Informatics, Medical University Graz, Austria

 

Managing Big Multidimensional Data - A Journey From Acquisition to Prescriptive Analytics

Torben Bach Pedersen
Computer Science, Aalborg University
Denmark
http://people.cs.aau.dk/~tbp/
 

Brief Bio
Torben Bach Pedersen is a Professor of Computer Science at Aalborg University, Denmark. His research interests include many aspects of Big Data analytics, with a focus on technologies for "Big Multidimensional Data" - the integration and analysis of large amounts of complex and highly dynamic multidimensional data in domains such as logistics (indoor/outdoor moving objects), smart grids (energy data management), transport (GPS data), and Linked Open Data. He is an ACM Distinguished Scientist, and a member of the Danish Academy of Technical Sciences, the SSTD Endowment, and the SSDBM Steering Committee. He has served as Area Editor for Information Systems and Springer EDBS, PC Chair for DaWaK, DOLAP, SSDBM, and DASFAA, and regularly serves on the PCs of the major database conferences.


Abstract
More and more data is being collected from a variety of new sources such as sensors, smart devices, social media, crowd-sourcing, and (Linked) Open Data. Such data is large, fast, and often complex. There is a universal wish to perform multidimensional OLAP-style analytics on such data, i.e., to turn it into "Big Multidimensional Data". The keynote will look at challenges and solutions in managing Big Multidimensional Data. This is a multi-stage journey from its initial acquisition, over cleansing and transformation, to (distributed) storage, indexing, and query processing, further on to building (predictive) models over it, and ultimately to performing prescriptive analytics, which couples analytics with optimization to suggest optimal actions. A number of case studies from advanced application domains such as Smart Energy, Smart Transport, and Smart Logistics will be used for illustration.
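To give a flavour of the OLAP-style analytics the abstract refers to, the following is a minimal sketch (not from the talk; the smart-meter data and column names are invented for illustration) of a multidimensional "cube" aggregation, here built with pandas:

```python
import pandas as pd

# Hypothetical smart-meter readings: one row per (city, hour, kWh) fact.
readings = pd.DataFrame({
    "city": ["Aalborg", "Aalborg", "Copenhagen", "Copenhagen"],
    "hour": [17, 18, 17, 18],
    "kwh":  [1.2, 2.4, 0.9, 1.8],
})

# A small OLAP cube: total consumption per (city, hour) cell, with
# margins providing the roll-ups along each dimension.
cube = readings.pivot_table(index="city", columns="hour", values="kwh",
                            aggfunc="sum", margins=True, margins_name="ALL")
print(cube)
```

Slicing, dicing, and roll-up then become lookups and aggregations over this cube; the talk's point is making the same operations scale to large, fast, and complex data.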



 

 

On Information Propagation, Social Influence, and Communities

Francesco Bonchi
ISI Foundation
Italy
http://www.francescobonchi.com/resume.html
 

Brief Bio
Francesco Bonchi is Research Leader at the ISI Foundation, Turin, Italy, where he leads the "Algorithmic Data Analytics" group. He is also (part-time) Principal Scientist for Data Mining at Eurecat (Technological Center of Catalunya), Barcelona. Before that, he was Director of Research at Yahoo Labs in Barcelona, Spain, where he led the Web Mining Research group. His recent research interests include mining query logs, social networks, and social media, as well as the privacy issues related to mining these kinds of sensitive data. In the past he has been interested in data mining query languages, constrained pattern mining, mining spatiotemporal and mobility data, and privacy-preserving data mining. He is a member of the ECML PKDD Steering Committee, Associate Editor of the newly created IEEE Transactions on Big Data (TBD), the IEEE Transactions on Knowledge and Data Engineering (TKDE), the ACM Transactions on Intelligent Systems and Technology (TIST), and Knowledge and Information Systems (KAIS), and a member of the Editorial Board of Data Mining and Knowledge Discovery (DMKD). He has been program co-chair of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010). Dr. Bonchi has also served as program co-chair of the 28th ACM Conference on Hypertext and Hypermedia (HT 2017), the 16th IEEE International Conference on Data Mining (ICDM 2016), the first and second ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD (PinKDD 2007 and 2008), the 1st IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2006), and the 4th International Workshop on Knowledge Discovery in Inductive Databases (KDID 2005). He is co-editor of the book "Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques" published by Chapman & Hall/CRC Press. He earned his Ph.D. in computer science from the University of Pisa in December 2003.


Abstract
With the success of online social networks and microblogging platforms such as Facebook, Tumblr, and Twitter, the phenomenon of influence-driven propagation has recently attracted the interest of computer scientists, sociologists, information technologists, and marketing specialists. In this talk we will take a data mining perspective, discussing what (and how) can be learned from a social network and a database of traces of past propagations over that network. Starting from one of the key problems in this area, i.e., the identification of influential users, we will provide a brief overview of our recent contributions. We will expose the connection between the phenomenon of information propagation and the existence of communities in social networks, and we will go deeper into this new research topic arising at the overlap of information propagation analysis and community detection.
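The "identification of influential users" mentioned above is commonly formalized as influence maximization. As an illustration only (not the speaker's own method; the toy graph and parameters are invented), here is the classic greedy heuristic with Monte Carlo spread estimation under the independent cascade model:

```python
import random

def simulate_ic(graph, seeds, p=0.1, rng=random):
    """One run of the independent cascade model: each newly activated node
    gets a single chance to activate each inactive neighbour with prob. p.
    Returns the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def greedy_seeds(graph, k, p=0.1, runs=300, seed=0):
    """Greedy influence maximization: repeatedly add the node with the
    largest estimated marginal gain in expected spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen = []
    for _ in range(k):
        def est(s):
            return sum(len(simulate_ic(graph, s, p, rng)) for _ in range(runs)) / runs
        best = max(nodes - set(chosen), key=lambda u: est(chosen + [u]))
        chosen.append(best)
    return chosen

# Tiny toy network (adjacency lists); node 'a' reaches the most nodes,
# so the greedy pass should tend to pick it early.
toy = {"a": ["b", "c", "d"], "b": ["c"], "c": ["d"], "e": ["a"]}
print(greedy_seeds(toy, 2, p=0.5))
```

The greedy algorithm is a natural baseline here because the expected spread is submodular, which gives the greedy selection a (1 − 1/e) approximation guarantee.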



 

 

Data-Driven Genomic Computing: Making Sense of the Signals from the Genome

Stefano Ceri
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Italy
 

Brief Bio
Stefano Ceri is professor of Database Systems at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) of Politecnico di Milano; he was visiting professor at the Computer Science Department of Stanford University (1983-1990). His research work covers four decades (1976-2016) and has been generally concerned with extending database technologies in order to incorporate new features: distribution, object orientation, rules, and streaming data; with the advent of the Web, his research has been targeted towards the engineering of Web-based applications and search systems. More recently he turned to crowd searching, social media analytics, and genomic computing. He is the recipient of two ERC Advanced Grants: "Search Computing (SeCo)" (2008-2013), focused upon the rank-aware integration of search engines in order to support multi-domain queries, and "Data-Driven Genomic Computing (GeCo)" (2016-2021), focused upon new abstractions for querying and integrating genomic datasets. He is the recipient of the ACM SIGMOD "Edgar F. Codd Innovation Award" (New York, June 26, 2013), an ACM Fellow, and a member of Academia Europaea.


Abstract
Genomic computing is a new science focused on understanding the functioning of the genome, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) allows the production of the entire human genome sequence at a cost of about 1,000 US dollars; many algorithms exist for the extraction of genome features, or "signals", including peaks (enriched regions), mutations, or gene expression (intensity of transcription activity). What is missing is a system supporting data integration and exploration, giving a "biological meaning" to all the available information; such a system can be used, e.g., for better understanding cancer or how the environment influences cancer development.

The GeCo Project (Data-Driven Genomic Computing, ERC Advanced Grant, 2016-2021) has the objective of revisiting genomic computing through the lens of basic data management, through models, languages, and instruments, focusing on genomic data integration. Starting from an abstract model, we developed a system that can be used to query processed data produced by several large genomic consortia, including ENCODE and TCGA; the system internally employs the Spark engine, and prototypes can already be accessed from Cineca or from PoliMi servers. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient. Among the objectives of the project is the creation of an "open source" repository of public data, available to biological and clinical research through queries, web services, and search interfaces.
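A core operation behind region-based genomic queries of the kind the abstract describes is the overlap join between sets of genomic regions. The following toy sketch (purely illustrative; it is not the GeCo system's language or API, and all coordinates and names are made up) shows the idea:

```python
# A genomic region is modelled as a (chromosome, start, end) triple
# with a half-open [start, end) interval.
def overlaps(a, b):
    """Two regions overlap if they lie on the same chromosome and
    their intervals intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

# Hypothetical inputs: ChIP-seq peaks and gene annotations.
peaks = [("chr1", 100, 200), ("chr2", 50, 80)]
genes = [("chr1", 150, 400, "GENE_A"), ("chr2", 300, 500, "GENE_B")]

# Overlap join: which peaks fall inside an annotated gene?
hits = [(p, g[3]) for p in peaks for g in genes if overlaps(p, g[:3])]
print(hits)  # -> [(('chr1', 100, 200), 'GENE_A')]
```

At genome scale this nested loop is replaced by sorted or partitioned joins, which is precisely where a distributed engine such as Spark comes in.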



 

 

Advances and Future Challenges in Machine Learning and Knowledge Extraction

Andreas Holzinger
Medical Informatics, Medical University Graz
Austria
http://www.hci4all.at
 

Brief Bio
Andreas Holzinger is head of the Holzinger Group HCI-KDD at the Institute for Medical Informatics & Statistics of the Medical University Graz, and Associate Professor of Applied Computer Science at the Faculty of Computer Science and Biomedical Engineering at Graz University of Technology. Currently, Andreas is Visiting Professor for Machine Learning in Health Informatics at the Faculty of Informatics at Vienna University of Technology. He serves as a consultant for the Canadian, US, UK, Swiss, French, Italian, and Dutch governments, for the German Excellence Initiative, and as a national expert in the European Commission. Andreas obtained a PhD in Cognitive Science from Graz University in 1998 and his Habilitation (second PhD) in Computer Science from Graz University of Technology in 2003. Andreas was Visiting Professor in Berlin, Innsbruck, London (twice), and Aachen. Andreas and his group work on extracting knowledge from data and foster a synergistic combination of methodologies from two areas that offer ideal conditions for unraveling problems with complex health data: Human-Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD), with the central goal of supporting human intelligence with machine learning to discover novel, previously unknown insights in data. To stimulate crazy ideas at the international level, without boundaries, Andreas founded the international Expert Network HCI-KDD. Andreas is Associate Editor of Knowledge and Information Systems (KAIS), Associate Editor of Springer Brain Informatics (BRIN), and Section Editor of BMC Medical Informatics and Decision Making (MIDM). He is a member of IFIP WG 12.9 Computational Intelligence, the ACM, IEEE, GI, and the Austrian Computer Society. Home: http://hci-kdd.org
Publications: https://scholar.google.com/citations?hl=en&user=BTBd5V4AAAAJ&view_op=list_works&sortby=pubdate


Abstract
Today the problem is heterogeneous, probabilistic, high-dimensional, and complex data sets. The challenge is to learn from such data in order to extract and discover knowledge, and to help make decisions under uncertainty. Great advances have been made in automatic machine learning (aML), for example in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches benefit greatly from "big data" with many training sets. However, sometimes we are confronted with small and complex data sets, where aML suffers from insufficient training samples. The application of such aML approaches in complex domains such as health informatics seems elusive in the near future; a good example is Gaussian processes, where aML (e.g., standard kernel machines) struggles with function extrapolation problems that are trivial for human learners. In such situations, interactive machine learning (iML) can be beneficial: a human-in-the-loop helps in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where the knowledge and experience of human experts can help to reduce an exponential search space through heuristic selection of samples. What would otherwise be an NP-hard problem can thus be greatly reduced in complexity through the input and assistance of a human agent involved in the learning phase. ML is a fast-growing and very practical field with many business applications and many open research challenges, particularly in multi-task learning, transfer learning, and hybrid multi-agent systems with humans-in-the-loop. Consequently, successful ML needs a concerted effort fostering integrative research between experts from diverse disciplines, from data science to visualization; tackling complex challenges requires both disciplinary excellence and a cross-disciplinary skill set, together with international joint work without any boundaries.
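The human-in-the-loop idea sketched in the abstract can be made concrete with a tiny, purely illustrative example (not the speaker's method; the 1-D data, threshold learner, and scripted "expert" oracle are all invented): the learner asks the expert to label only the points it is most uncertain about, so a few targeted queries replace a large labelled training set.

```python
# A minimal human-in-the-loop sketch: uncertainty sampling with a
# threshold classifier on 1-D data and a scripted oracle standing in
# for the human expert.
def most_uncertain(pool, threshold):
    # The learner is least certain about the point closest to its boundary.
    return min(pool, key=lambda x: abs(x - threshold))

def fit_threshold(labeled):
    # Decision boundary: midpoint between the largest known negative
    # and the smallest known positive example.
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2

oracle = lambda x: int(x >= 4.2)            # the expert's hidden knowledge
pool = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # unlabelled candidates
labeled = [(0.0, 0), (7.0, 1)]              # start from two labelled points

for _ in range(3):                          # three rounds of expert queries
    t = fit_threshold(labeled)
    q = most_uncertain(pool, t)
    pool.remove(q)
    labeled.append((q, oracle(q)))          # the human labels only what matters

print(fit_threshold(labeled))               # boundary estimate after 3 queries
```

With only three expert-labelled points the estimated boundary lands close to the true 4.2, which is the essence of using human input to shrink the search space.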


