Automated Annotations for AI Data and Model Transparency

11 December 2021

Journal article (research article)
Published by Association for Computing Machinery (ACM) in Journal of Data and Information Quality

Vol. 14, No. 1, pp. 1-9
https://doi.org/10.1145/3460000

Abstract

The data and Artificial Intelligence (AI) revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often openly available. Enterprises have more data than they can process, and governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI engineering is becoming increasingly democratized: open-source frameworks now enable even an individual developer to build sophisticated AI systems. But such ease of use also creates the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance, among others, to tackle this problem. Specifically, we advocate an approach based on automated annotations of both the data and the AI model, which has a number of appealing properties. Enterprises could use these annotations to gain visibility into potential issues, prepare data transparency reports, create and ensure compliance with policies, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate the key components that could meet these requirements. Finally, we describe a number of interesting challenges and opportunities.
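
The abstract does not specify how the annotations are computed. Purely as an illustration, the Python sketch below shows one way automated data annotations might be derived from a table and then checked against a simple readiness policy. The names `ColumnAnnotation`, `annotate`, `check_policy`, the `SENSITIVE_NAMES` set, and the 20% missing-value threshold are hypothetical assumptions made for this example; they are not the authors' proposed architecture.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical set of attribute names treated as sensitive. A real system would
# more likely rely on a governance catalog or a classifier than on name matching.
SENSITIVE_NAMES = {"age", "gender", "race", "religion", "zip"}


@dataclass
class ColumnAnnotation:
    """Hypothetical per-column annotation computed automatically from the data."""
    name: str
    missing_fraction: float
    distinct_values: int
    possibly_sensitive: bool


def annotate(rows: List[Dict[str, Any]]) -> List[ColumnAnnotation]:
    """Annotate every column of a table given as a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    annotations = []
    for col in columns:
        # A missing key and an explicit None are both counted as missing.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        annotations.append(ColumnAnnotation(
            name=col,
            missing_fraction=1.0 - len(present) / len(rows),
            distinct_values=len(set(present)),
            possibly_sensitive=col.lower() in SENSITIVE_NAMES,
        ))
    return annotations


def check_policy(annotations: List[ColumnAnnotation],
                 max_missing: float = 0.2,
                 allow_sensitive: bool = False) -> List[str]:
    """Return human-readable violations of a hypothetical data-readiness policy."""
    issues = []
    for a in annotations:
        if a.missing_fraction > max_missing:
            issues.append(f"{a.name}: {a.missing_fraction:.0%} missing "
                          f"exceeds {max_missing:.0%}")
        if a.possibly_sensitive and not allow_sensitive:
            issues.append(f"{a.name}: possibly sensitive attribute")
    return issues
```

Keeping the annotation step separate from the policy check mirrors the idea that the same annotations could serve several purposes (a transparency report, a compliance check, or a readiness assessment) without recomputing them.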
