Tracking and Sketching Distributed Data Provenance
- 1 December 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 190-197
- https://doi.org/10.1109/escience.2010.51
Abstract
Current provenance collection systems typically gather metadata on remote hosts and submit it to a central server. In contrast, several data-intensive scientific applications require a decentralized architecture in which each host maintains an authoritative local repository of the provenance metadata gathered on that host. The latter approach allows the system to handle the large amounts of metadata generated when auditing occurs at fine granularity, and allows users to retain control over their provenance records. The decentralized architecture, however, increases the complexity of auditing, tracking, and querying distributed provenance. We describe a system for capturing data provenance in distributed applications, and the use of provenance sketches to optimize subsequent data provenance queries. Experiments with data gathered from distributed workflow applications demonstrate the feasibility of a decentralized provenance management system and improvements in the efficiency of provenance queries.
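The abstract does not say how a provenance sketch is constructed. As an illustration only, the following minimal Python sketch assumes that each host summarizes the identifiers it holds provenance for in a Bloom filter, so that a query can skip hosts whose summary rules the identifier out. The class `BloomSketch`, the function `hosts_to_query`, the host names, the file names, and the filter parameters are all hypothetical and are not taken from the paper.

```python
import hashlib


class BloomSketch:
    """Fixed-size Bloom filter summarizing the item IDs a host has records for.

    Assumption: this stands in for a "provenance sketch"; the paper's
    actual construction may differ.
    """

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def hosts_to_query(sketches, item):
    """Return only the hosts whose sketch may hold provenance for item."""
    return [host for host, sketch in sketches.items()
            if sketch.might_contain(item)]


# Hypothetical hosts and file names, for illustration only.
sketches = {"hostA": BloomSketch(), "hostB": BloomSketch()}
sketches["hostA"].add("raw/input.dat")
sketches["hostB"].add("derived/output.dat")
print(hosts_to_query(sketches, "raw/input.dat"))
```

Under this assumption a sketch never produces a false negative, so a host that holds the relevant records is always contacted. A false positive costs only one unnecessary remote lookup, and each host's local repository remains the authoritative source.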