Statistically Significant Detection of Linguistic Change
- 18 May 2015
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 625-635
- https://doi.org/10.1145/2736277.2741627
Abstract
We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts. We consider and analyze three approaches of increasing complexity to generate such linguistic property time series, the culmination of which uses distributional characteristics inferred from word co-occurrences. Using recently proposed deep neural language models, we first train vector representations of words for each time period. Second, we warp the vector spaces into one unified coordinate system. Finally, we construct a distance-based distributional time series for each word to track its linguistic displacement over time. We demonstrate that our approach is scalable by tracking linguistic change across years of micro-blogging using Twitter, a decade of product reviews using a corpus of movie reviews from Amazon, and a century of written books using the Google Book Ngrams. Our analysis reveals interesting patterns of language usage change commensurate with each medium.
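The pipeline described in the abstract (per-period embeddings, alignment into one coordinate system, a distance time series per word, then change point detection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the alignment uses an orthogonal Procrustes rotation, the distance is cosine distance to the word's position in the first period, and the change point test is a simple mean-shift statistic with a permutation-based p-value. The function names are hypothetical, and the per-period embedding matrices are assumed to be already trained and to share the same row order over the vocabulary.

```python
import numpy as np


def align_to_reference(source, reference):
    """Rotate `source` onto `reference` with an orthogonal Procrustes fit.

    Assumption: both matrices have one row per word, in the same order.
    """
    u, _, vt = np.linalg.svd(source.T @ reference)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """Cosine distance between two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def displacement_series(embeddings, word_index):
    """Distance of one word, per period, from its position in the first period.

    `embeddings` is a list of (vocab_size, dim) arrays, one per time period.
    """
    reference = embeddings[0]
    series = []
    for period_embedding in embeddings:
        aligned = align_to_reference(period_embedding, reference)
        series.append(cosine_distance(aligned[word_index], reference[word_index]))
    return np.array(series)


def mean_shift_change_point(series, n_permutations=1000, seed=0):
    """Return (index, p_value) for the most likely shift in the series mean.

    The statistic is the largest absolute difference between the mean after
    and the mean before a split; the p-value is estimated by shuffling the
    series and recomputing that statistic.
    """
    rng = np.random.default_rng(seed)

    def best_split(x):
        scores = [abs(x[t:].mean() - x[:t].mean()) for t in range(1, len(x))]
        best = int(np.argmax(scores))
        return best + 1, scores[best]

    index, observed = best_split(series)
    exceed = sum(
        best_split(rng.permutation(series))[1] >= observed
        for _ in range(n_permutations)
    )
    return index, (exceed + 1) / (n_permutations + 1)
```

Given a list of per-period embedding matrices, `displacement_series(embeddings, i)` yields the time series for word `i`, and `mean_shift_change_point` returns the first period after the estimated shift together with an approximate p-value. A rotation-only alignment is chosen here for brevity; it cannot capture non-rigid differences between embedding spaces.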
Funding Information
- NSF (DBI-1355990, IIS-1017181)
- Google Faculty Research Award
- Renaissance Technologies Fellowship
- Institute for Computational Science at Stony Brook University