Heterogeneous data is being created at increasing volume and pace. Heterogeneity often results from multiple systems (and data models) being adopted and used within one or several organizations, producing data that needs to be processed together. The lack of a common abstract model, of a common syntactic model, of a unified schema, and possibly of a single query language for exploiting this heterogeneous data makes it hard, on the one hand, to design and model applications based on such data and, on the other hand, to efficiently execute the data management tasks they incur (typically, queries).
Our tutorial will provide a structured overview of the problem of heterogeneous data management and of the models and architectures used over the years to handle such settings. These range from mediator systems, to graph-based data integration (facilitated in particular by the advent of interoperable semantic data formats such as RDF), to modern polystore systems, capable of handling very different kinds of processing on top of heterogeneous data management systems. We will outline the mechanisms for (a) building a conceptual model for a heterogeneous data management application, (b) interconnecting the data sources and the integration schema, and (c) translating queries posed to the heterogeneous data management system into queries to be evaluated by the individual sources.
The lab session will use systems developed in our research as well as off-the-shelf systems from the Apache Software Foundation.
Ioana Manolescu is a senior researcher at Inria Saclay and a part-time
professor at Ecole Polytechnique, France. She leads the CEDAR Inria team, which focuses on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams.
She is a member of the PVLDB Endowment Board of Trustees, and has been
an Associate Editor for PVLDB, a president of the ACM SIGMOD PhD
Award Committee, and a chair of the IEEE ICDE conference.
She has co-authored more than 150 articles in international journals and
conferences and contributed to several books, in particular "Web Data
Management" with S. Abiteboul, P. Rigaux, M.-C. Rousset
and P. Senellart.
Her main research interests are algebraic and storage optimizations for
semistructured data, in particular Semantic Web graphs, novel data models
and languages for complex data management, data models and algorithms
for fact-checking, and distributed architectures for complex large
data. She is also the recipient of an ANR AI Chair of research and teaching in Artificial Intelligence, "SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas" (2020-2023).