A survey of data provenance in e-science
- 1 September 2005
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 34 (3), 31-36
- https://doi.org/10.1145/1084805.1084812
Abstract
Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.