Keynote Lecture
Mona Diab
Carnegie Mellon University
United States
Brief Bio
Mona Diab is the new Director of the Language Technologies Institute (LTI) at CMU. She also directs the R3LIT lab. Prior to joining CMU, she was a Lead Responsible AI Research Scientist at Meta. She was also a full Professor of Computer Science at the George Washington University (on leave), where she directed the CARE4Lang NLP Lab. Before joining Meta, she led the Lex Conversational AI project within Amazon AWS AI. Her current focus is on Responsible AI and how to operationalize it for NLP technologies and AI in general. Her interests span building robust technologies for low-resource scenarios, with a special interest in Arabic technologies, (mis)information propagation, computational socio-pragmatics, computational psycholinguistics, NLG evaluation metrics, language modeling, and resource creation. Mona has served the community in several capacities: she was elected President of SIGLEX and SIGSemitic, and she currently serves as the President of ACL SIGDAT, the board supporting the EMNLP conferences. She has delivered tutorials and organized numerous workshops and panels on Arabic processing, Responsible NLP, Code Switching, and other topics. In 2005 she cofounded CADIM (Consortium on Arabic Dialect Modeling, previously known as the Columbia University Arabic Dialects Modeling Group), which served as a world-renowned reference point for Arabic Language Technologies. Mona was recently awarded the King Salman Global Arabic Academy Prize for Arabic Language Technologies for 2023. Moreover, she helped establish two research trends in NLP, namely computational approaches to Code Switching and Semantic Textual Similarity. She is also a founding member of the *SEM conference, one of the top-tier conferences in NLP. Mona has published more than 250 peer-reviewed articles.
Scalable Vector Analytics: A Story of Twists and Turns
Themis Palpanas
University Paris Cite
France
Brief Bio
Themis Palpanas is an elected Senior Member of the French University
Institute (IUF), a distinction that recognizes excellence across all
academic disciplines, and Distinguished Professor of computer science
at the University Paris Cite (France), where he is director of the Data
Intelligence Institute of Paris (diiP), and director of the data
management group, diNo. He received the BS degree from the National
Technical University of Athens, Greece, and the MSc and PhD degrees from
the University of Toronto, Canada. He has previously held positions at
the University of California at Riverside, the University of Trento, and
the IBM T.J. Watson Research Center, and has visited Microsoft Research
and the IBM Almaden Research Center. His interests include problems related
to data science (big data analytics and machine learning applications).
He is the author of 14 patents. He is the recipient of 3 Best Paper
awards, and the IBM Shared University Research (SUR) Award. His service
includes the VLDB Endowment Board of Trustees (2018-2023),
Editor-in-Chief for PVLDB Journal (2024-2025) and BDR Journal (2016-
2021), PC Chair for IEEE BigData 2023 and ICDE 2023 Industry and
Applications Track, General Chair for VLDB 2013, Associate Editor for
the TKDE Journal (2014-2020), and Research PC Vice Chair for ICDE 2020.
Abstract
Similarity search in high-dimensional data spaces was a relevant and
challenging data management problem in the early 1970s, when the first
solutions to this problem were proposed. Today, fifty years later, we
can safely say that the exact same problem is more relevant (from Time
Series Management Systems to Vector Databases) and challenging than
ever. Very large amounts of high-dimensional data are now omnipresent
(ranging from traditional multidimensional data to time series and deep
embeddings), and the performance requirements (i.e., response time and
accuracy) of a variety of applications that need to process and analyze
these data have become very stringent. Over these past fifty years,
high-dimensional similarity search has been studied in its many flavors:
similarity search algorithms for exact and approximate, one-off and
progressive query answering; approximate algorithms with and without
(deterministic or probabilistic) quality guarantees; solutions for
on-disk and in-memory data, static and streaming data; and approaches
based on multidimensional space-partitioning and metric trees, random
projections and locality-sensitive hashing (LSH), product quantization
(PQ) and inverted files, k-nearest neighbor graphs, and optimized linear
scans. Surprisingly, work on data-series (or time-series) similarity
search has recently been shown to achieve state-of-the-art
performance for several variations of the problem, on both time-series
and general high-dimensional vector data. In this talk, we will touch
upon the different aspects of this interesting story, present some of
the state-of-the-art solutions, and discuss open research directions.
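The abstract lists optimized linear scans among the exact approaches to similarity search. As a minimal illustrative sketch (not the speaker's method), assuming in-memory vectors stored as Python lists and squared Euclidean distance, the code below performs an exact k-nearest-neighbor search in one pass and uses early abandoning, a pruning trick common in data-series search: a candidate's distance computation stops as soon as its partial sum exceeds the current k-th best distance. The function names `squared_dist_early_abandon` and `knn_linear_scan` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_dist_early_abandon(q: Sequence[float], x: Sequence[float],
                               threshold: float) -> float:
    """Squared Euclidean distance from q to x, with early abandoning.

    Stops as soon as the running sum exceeds `threshold`; in that case the
    returned value is only a partial sum, but it is still > threshold.
    """
    total = 0.0
    for a, b in zip(q, x):
        d = a - b
        total += d * d
        if total > threshold:
            return total  # x cannot beat the current k-th best candidate
    return total


def knn_linear_scan(query: Sequence[float], data: List[Sequence[float]],
                    k: int) -> List[Tuple[float, int]]:
    """Exact k-nearest-neighbor search with one pass over `data`.

    Returns (squared_distance, index) pairs sorted by increasing distance;
    ties are broken in favor of the smaller index.
    """
    if k <= 0:
        return []
    # Max-heap of the k best candidates so far, stored as (-dist, -index) so
    # that heap[0] is the current worst (largest distance, then largest index).
    heap: List[Tuple[float, int]] = []
    for i, x in enumerate(data):
        # Until k candidates have been seen, nothing can be pruned.
        threshold = -heap[0][0] if len(heap) == k else float("inf")
        d = squared_dist_early_abandon(query, x, threshold)
        if len(heap) < k:
            heapq.heappush(heap, (-d, -i))
        elif d < threshold:
            heapq.heapreplace(heap, (-d, -i))  # evict the current worst
    return sorted((-neg_d, -neg_i) for neg_d, neg_i in heap)
```

For example, `knn_linear_scan([0.0, 0.1], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]], k=2)` returns the entries for indices 0 and 3. Early abandoning does not change the answer: a candidate is skipped only when its partial distance already exceeds the current k-th best, so the scan stays exact while often reading fewer coordinates per vector.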