Noise Pollution: A Multi-Step Approach to Assessing the Consequences of (Not) Validating Search Terms on Automated Content Analyses
Open Access
- 23 September 2022
- journal article
- research article
- Published by Taylor & Francis Ltd in Digital Journalism
- Vol. 11 (2), 298-320
- https://doi.org/10.1080/21670811.2022.2114920
Abstract
Advances in analytical methodologies and an avalanche of digitized data have opened new avenues for (digital) journalism research, and with them new challenges. One of these challenges concerns the sampling and evaluation of data using (non-validated) search terms in combination with automated content analyses. Research has largely neglected this challenge, which is surprising given that noise introduced during data collection can raise serious methodological concerns. To address this gap, we first offer a systematic interdisciplinary literature review, revealing that the validation of search terms is far from being acknowledged as a required standard procedure, both in and beyond journalism research. Second, using a multi-step approach, we assess the consequences of (not) validating search terms for common research topics in (digital) journalism research. Our findings show that careless application of non-validated search terms has its pitfalls: while scattershot search terms can make sense in initial data exploration, final inferences based on insufficiently validated search terms are at higher risk of being obscured by noise. Based on these findings, we provide a step-by-step recommendation for developing and validating search terms.
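To make "validating a search term" concrete, here is a minimal sketch that assumes validation means comparing a term's matches against a small hand-coded sample of documents. It summarizes the result as precision (the share of retrieved texts that are relevant, which falls as noise rises) and recall (the share of relevant texts that are retrieved). The function name `validate_search_term`, the regular-expression search terms, and the five example texts are hypothetical illustrations. They are not the authors' procedure or data.

```python
import re

# Hypothetical hand-coded validation sample: (text, is_relevant_to_topic).
SAMPLE = [
    ("Global warming drives record heat", True),
    ("New climate policy announced at summit", True),
    ("The business climate in Europe is improving", False),  # off-topic use of "climate"
    ("Football results from the weekend", False),
    ("Climate change threatens coastal cities", True),
]


def validate_search_term(term_regex, coded_sample):
    """Compare regex matches with manual relevance codes (hypothetical helper).

    Returns confusion counts plus precision, recall, and F1.
    """
    pattern = re.compile(term_regex, re.IGNORECASE)
    tp = fp = fn = tn = 0
    for text, relevant in coded_sample:
        hit = pattern.search(text) is not None
        if hit and relevant:
            tp += 1  # correctly retrieved
        elif hit:
            fp += 1  # noise: retrieved but irrelevant
        elif relevant:
            fn += 1  # missed relevant text
        else:
            tn += 1  # correctly ignored
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    broad = validate_search_term(r"\bclimate\b", SAMPLE)
    narrow = validate_search_term(
        r"\bclimate (change|policy)\b|\bglobal warming\b", SAMPLE)
    for name, result in (("broad", broad), ("narrow", narrow)):
        print(name, round(result["precision"], 2), round(result["recall"], 2))
```

In this toy sample, the broad term `climate` picks up an off-topic business story (noise) and misses a relevant text that never uses the word. The narrower alternative retrieves exactly the relevant texts. A real validation would rest on a much larger, reliably coded sample.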
Funding Information
- Deutsche Forschungsgemeinschaft
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
This publication has 47 references indexed in Scilit.