Data fusion
Open Access
- 15 January 2009
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 41 (1), 1-41
- https://doi.org/10.1145/1456650.1456651
Abstract
The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.
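To make the definition in the abstract concrete, the following is a minimal Python sketch of fusing several records for the same real-world object into one. It is not taken from the article. It assumes duplicates have already been identified and share a key, and it uses one simple, illustrative resolution strategy: ignore missing values (uncertainty) and pick the most frequent remaining value (conflict). The names `resolve` and `fuse` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def resolve(values):
    """Resolve one attribute across duplicate records (illustrative strategy).

    Missing values (None) are skipped; among the rest, the most frequent
    value wins. Ties go to the value seen first.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None  # no source knows this value
    return Counter(present).most_common(1)[0][0]


def fuse(records, key):
    """Fuse records that share the same `key` value into one record each."""
    # Group records by the shared identifier (assumed to come from an
    # earlier duplicate-detection step).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    fused = []
    for recs in groups.values():
        # Collect every attribute that appears in any record of the group,
        # in first-seen order, so the result is as complete as possible.
        attrs = []
        for rec in recs:
            for attr in rec:
                if attr not in attrs:
                    attrs.append(attr)
        fused.append({a: resolve([r.get(a) for r in recs]) for a in attrs})
    return fused


# Hypothetical sample data: three sources describe person 1, one describes person 2.
records = [
    {"id": 1, "name": "Ann Lee", "city": None, "age": 30},
    {"id": 1, "name": "Ann Lee", "city": "Berlin", "age": 31},
    {"id": 1, "name": "A. Lee", "city": "Berlin", "age": 30},
    {"id": 2, "name": "Bob Roy", "city": None},
]

result = fuse(records, "id")
```

Here the three records for `id` 1 collapse into a single record: the missing `city` is filled from the other sources, and the conflicting `name` and `age` values are settled by majority. Other resolution functions (most recent value, preferred source, average) could be swapped in for `resolve` without changing `fuse`.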
Funding Information
- Deutsche Forschungsgemeinschaft (NA 432)