On the Anonymization of Workflow Provenance without Compromising the Transparency of Lineage
- 23 December 2021
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of Data and Information Quality
- Vol. 14 (1), 1-27
- https://doi.org/10.1145/3460207
Abstract
Workflows have been adopted in several scientific fields as a tool for the specification and execution of scientific experiments. In addition to automating the execution of experiments, workflow systems often include capabilities to record provenance information, which contains, among other things, data records used and generated by the workflow as a whole but also by its component modules. It is widely recognized that provenance information can be useful for the interpretation, verification, and re-use of workflow results, justifying its sharing and publication among scientists. However, workflow execution in some branches of science can manipulate sensitive datasets that contain information about individuals. To address this problem, we investigate, in this article, the problem of anonymizing the provenance of workflows. In doing so, we consider a popular class of workflows in which component modules use and generate collections of data records as a result of their invocation, as opposed to a single data record. The solution we propose offers guarantees of confidentiality without compromising lineage information, which provides transparency as to the relationships between the data records used and generated by the workflow modules. We provide algorithmic solutions that show how the provenance of a single module and an entire workflow can be anonymized and present the results of experiments that we conducted for their evaluation.