Cédric | Heterogeneous Data Management Revisited: from Mediators to Modern Polystores

Heterogeneous data is created at increasing volume and pace. Heterogeneity often results from multiple systems (and data models) being adopted and used within one or several organizations, producing data that needs to be processed together. The lack of a common abstract model, of a common syntactic model, of a unified schema, and possibly of a single query language for exploiting this heterogeneous data makes it hard, on the one hand, to design and model applications based on such data and, on the other hand, to efficiently execute the data management tasks they incur (typically, queries).

Our tutorial will provide a structured overview of the problem of heterogeneous data management and of the models and architectures used to handle such settings over the years. These range from mediator systems, to graph-based data integration (facilitated in particular by the advent of interoperable semantic data formats, such as RDF), to modern polystore systems, capable of handling very different kinds of processing on top of heterogeneous data management systems. We will outline the mechanisms for (a) building a conceptual model for a heterogeneous data management application, (b) interconnecting the data sources and the integration schema, and (c) translating queries posed to the heterogeneous data management system into queries to be evaluated by the individual sources.

The lab session will be illustrated with systems resulting from our research as well as off-the-shelf systems from the Apache Software Foundation.

Speakers

Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She leads the CEDAR Inria team, which focuses on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams.

She is a member of the PVLDB Endowment Board of Trustees, and has been an Associate Editor for PVLDB, a president of the ACM SIGMOD PhD Award Committee, and a chair of the IEEE ICDE conference. She has co-authored more than 150 articles in international journals and conferences and contributed to several books, in particular « Web Data Management » with S. Abiteboul, P. Rigaux, M.-C. Rousset and P. Senellart.

Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs, novel data models and languages for complex data management, data models and algorithms for fact-checking, and distributed architectures for complex large data. She also holds the ANR AI Chair of research and teaching in Artificial Intelligence « SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas » (2020-2023).

François Goasdoué is a full professor of computer science at Univ. Rennes 1. He leads the SHAMAN team of the IRISA lab, which focuses on AI for databases. He is also the head of the Lannion branch of the IRISA lab.

His research work relies on automated reasoning, databases and knowledge representation, in particular for data management, integration and summarization under deductive constraints.

Maxime Buron is a postdoc at Inria Sophia Antipolis in the GraphIK team. He will be an associate professor at the University Clermont Auvergne starting in September 2022.
He completed his PhD, supervised by François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier, at Inria Saclay in 2020, then worked for one year with Michael Benedikt at the University of Oxford as a research associate.

His research work relies on automated reasoning, databases and knowledge representation, in particular in the context of Ontology-Based Data Access.