On the Anonymization of Workflow Provenance without Compromising the Transparency of Lineage

23 December 2021

journal article
research article
Published by Association for Computing Machinery (ACM) in Journal of Data and Information Quality

Vol. 14 (1), 1-27
https://doi.org/10.1145/3460207

Abstract

Workflows have been adopted in several scientific fields as a tool for the specification and execution of scientific experiments. In addition to automating the execution of experiments, workflow systems often include capabilities to record provenance information, which contains, among other things, the data records used and generated both by the workflow as a whole and by its component modules. It is widely recognized that provenance information can be useful for the interpretation, verification, and re-use of workflow results, justifying its sharing and publication among scientists. However, workflow execution in some branches of science can manipulate sensitive datasets that contain information about individuals. To address this issue, we investigate in this article the problem of anonymizing the provenance of workflows. In doing so, we consider a popular class of workflows in which component modules use and generate collections of data records as a result of their invocation, as opposed to a single data record. The solution we propose offers guarantees of confidentiality without compromising lineage information, which provides transparency as to the relationships between the data records used and generated by the workflow modules. We provide algorithmic solutions that show how the provenance of a single module and of an entire workflow can be anonymized, and we present the results of experiments that we conducted for their evaluation.
