Noise Pollution: A Multi-Step Approach to Assessing the Consequences of (Not) Validating Search Terms on Automated Content Analyses

Abstract
Advances in analytical methodologies and an avalanche of digitized data have opened new avenues for (digital) journalism research, and with them new challenges. One of these challenges concerns the sampling and evaluation of data using (non-validated) search terms in combination with automated content analyses. This challenge has largely been neglected by research, which is surprising given that noise slipping in during data collection can raise serious methodological concerns. To address this gap, we first offer a systematic interdisciplinary literature review, revealing that the validation of search terms is far from acknowledged as a required standard procedure, both in and beyond journalism research. Second, we assess the consequences of validating search terms, using a multi-step approach to investigate common research topics from the field of (digital) journalism research. Our findings show that the careless application of non-validated search terms has pitfalls: while scattershot search terms can make sense for initial data exploration, final inferences based on insufficiently validated search terms are at a higher risk of being obscured by noise. Consequently, we provide step-by-step recommendations for developing and validating search terms.
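The abstract does not spell out the validation procedure itself, but a common way to validate a search term, consistent with the noise problem described above, is to check it against a small manually coded gold-standard sample and compute precision and recall. Below is a minimal Python sketch of that idea; the documents, labels, and the `search_term` pattern are illustrative assumptions, not material from the paper.

```python
import re

# Hypothetical mini gold standard: documents hand-coded as relevant (1) or not (0).
# In practice this would be a random sample of several hundred documents.
documents = [
    ("New climate policy sparks debate among journalists", 1),
    ("Climate of fear in the newsroom after layoffs", 0),
    ("Stock markets rally on strong tech earnings", 0),
    ("Journalists cover the climate summit in Glasgow", 1),
]

# Candidate search term (here: a simple regex) to be validated.
search_term = re.compile(r"\bclimate\b", re.IGNORECASE)

retrieved = [(text, label) for text, label in documents if search_term.search(text)]

true_positives = sum(label for _, label in retrieved)
relevant_total = sum(label for _, label in documents)

precision = true_positives / len(retrieved) if retrieved else 0.0
recall = true_positives / relevant_total if relevant_total else 0.0

print(f"Precision: {precision:.2f}")  # share of retrieved documents that are relevant
print(f"Recall:    {recall:.2f}")     # share of relevant documents that are retrieved
```

Low precision indicates noise slipping into the sample (irrelevant documents retrieved), while low recall indicates relevant material being missed; in practice, a researcher would iterate on the search term against the gold standard until both values are acceptably high before drawing final inferences.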
Funding Information
  • Deutsche Forschungsgemeinschaft
  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
