Statistically Significant Detection of Linguistic Change

18 May 2015

conference paper
Published by Association for Computing Machinery (ACM)

p. 625-635
https://doi.org/10.1145/2736277.2741627

Abstract

We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts. We consider and analyze three approaches of increasing complexity to generate such linguistic property time series, the culmination of which uses distributional characteristics inferred from word co-occurrences. Using recently proposed deep neural language models, we first train vector representations of words for each time period. Second, we warp the vector spaces into one unified coordinate system. Finally, we construct a distance-based distributional time series for each word to track its linguistic displacement over time. We demonstrate that our approach is scalable by tracking linguistic change across years of micro-blogging using Twitter, a decade of product reviews using a corpus of movie reviews from Amazon, and a century of written books using the Google Book Ngrams. Our analysis reveals interesting patterns of language usage change commensurate with each medium.
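
The three steps named in the abstract (per-period word vectors, alignment into one coordinate system, a distance-based time series) followed by change point detection can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-period embedding matrices already exist, with one row per word in the same order for every period. The abstract does not say how the spaces are warped, so orthogonal Procrustes is swapped in as a simple alignment, and a permutation test on the mean-shift statistic is swapped in as a simple significance check. All function names (align, displacement_series, mean_shift, detect_change_point) are hypothetical.

```python
import numpy as np

def align(source, target):
    # Orthogonal Procrustes: the orthogonal R minimizing ||source @ R - target||.
    # Assumption: a stand-in for the "warp" step, which the abstract does not specify.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def displacement_series(embeddings, row):
    # Cosine distance between word `row` in each aligned period and in period 0.
    base = embeddings[0]
    ref = base[row]
    out = []
    for emb in embeddings:
        v = align(emb, base)[row]
        out.append(1.0 - v @ ref / (np.linalg.norm(v) * np.linalg.norm(ref)))
    return np.array(out)

def mean_shift(series):
    # For each split j: mean of series[j:] minus mean of series[:j].
    return np.array([series[j:].mean() - series[:j].mean()
                     for j in range(1, len(series))])

def detect_change_point(series, n_perm=1000, seed=0):
    # Permutation test (assumed, simplified): how often does a shuffled series
    # produce a maximum mean shift at least as large as the observed one?
    rng = np.random.default_rng(seed)
    observed = mean_shift(series)
    best = observed.max()
    hits = sum(mean_shift(rng.permutation(series)).max() >= best
               for _ in range(n_perm))
    # Returns the index of the first period after the shift and the p-value.
    return int(observed.argmax()) + 1, (hits + 1) / (n_perm + 1)
```

Called as detect_change_point(displacement_series(embeddings, row)), the sketch returns the index of the first period after the largest upward shift together with an estimated p-value. A word would be flagged only when that p-value falls below a chosen threshold.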

Funding Information

NSF (DBI-1355990, IIS-1017181)
Google Faculty Research Award
Renaissance Technologies Fellowship
Institute for Computational Science at Stony Brook University

Cited by 153 articles