Evolution of Privacy Loss in Wikipedia
- 8 February 2016
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining - WSDM '16
Abstract
The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g., Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
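The feature construction described in the abstract (edit counts per broad category, fed to a standard classifier) can be illustrated with a minimal sketch. This is not the authors' pipeline: the edit log, the user names, the trait labels "A" and "B", and the nearest-centroid classifier are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict

# Broad categories named in the abstract; the real category set is not given here.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def edit_count_features(edits, categories=CATEGORIES):
    """Map each user to a vector of edit counts, one entry per category."""
    counts = defaultdict(Counter)
    for user, category in edits:
        counts[user][category] += 1
    return {u: [c[cat] for cat in categories] for u, c in counts.items()}


def normalize(vec):
    """Turn raw counts into proportions so heavy and light editors are comparable."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def fit_centroids(features, labels):
    """Average the normalized vectors of all users who disclosed the same trait value."""
    grouped = defaultdict(list)
    for user, label in labels.items():
        grouped[label].append(normalize(features[user]))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the trait value whose centroid is closest (squared Euclidean distance)."""
    vec = normalize(vec)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical edit log: (user, category) pairs within one time period.
edits = [
    ("u1", "Mathematics"), ("u1", "Mathematics"), ("u1", "Nature"),
    ("u2", "Culture"), ("u2", "Culture"), ("u2", "Culture"),
    ("u3", "Mathematics"), ("u3", "Culture"), ("u3", "Culture"), ("u3", "Culture"),
]
# Hypothetical disclosed trait values; u3 has not disclosed anything.
labels = {"u1": "A", "u2": "B"}

features = edit_count_features(edits)
centroids = fit_centroids(features, labels)
prediction = predict(centroids, features["u3"])
```

In this toy setting, the undisclosed user `u3` is assigned the trait value of the labeled user with the most similar editing profile. It also hints at the effect the abstract reports: adding more labeled users changes the centroids, so the prediction for a user who has stopped editing can still change.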
Funding Information
- National ICT Australia
- Air Force Office of Scientific Research (FA2386-15-1-4018)