Emergent linguistic structure in artificial neural networks trained by self-supervision
- Open Access
- Published: 3 June 2020
- Research article
- Published in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 117, no. 48, pp. 30046–30054
- https://doi.org/10.1073/pnas.1907367117
Abstract
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
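The self-supervision objective the abstract refers to is masked-word prediction: the model sees a sentence with one word hidden and must recover it from the surrounding context. As a minimal illustrative sketch (not code from the paper), the example below uses the Hugging Face `transformers` library and the `bert-base-cased` checkpoint; both are assumptions here, since the paper analyzes models of this family rather than prescribing any particular API.

```python
# Masked-word prediction sketch. Assumes the Hugging Face `transformers`
# library and the `bert-base-cased` checkpoint; neither is specified by
# the paper, which studies models trained with this kind of objective.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# The model must recover the hidden word from its bidirectional context.
for candidate in fill_mask("The chef who ran to the store [MASK] out of food."):
    print(f"{candidate['token_str']!r}  (score {candidate['score']:.3f})")
```

The "linear transformation of learned embeddings" that captures parse tree distances is the paper's structural probe: a matrix B is trained so that the squared distance ||B(h_i - h_j)||^2 between transformed word vectors h_i and h_j approximates the distance between words i and j in the sentence's parse tree. The toy sketch below illustrates that idea under stated assumptions: the random embeddings, stand-in gold distances, loss, and hyperparameters are illustrative, not the authors' released code.

```python
# Structural-probe sketch: learn a linear map B so that squared L2
# distances between transformed word vectors approximate parse-tree
# distances. Data and training details here are illustrative stand-ins.
import torch

def probe_distances(B: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """Pairwise squared distances ||B(h_i - h_j)||^2 for word vectors H of shape (n, d)."""
    T = H @ B.T                                 # (n, rank) transformed vectors
    diff = T.unsqueeze(0) - T.unsqueeze(1)      # (n, n, rank) pairwise differences
    return (diff ** 2).sum(dim=-1)              # (n, n) squared distances

n, d, rank = 5, 768, 64                         # 5 words, 768-dim vectors, rank-64 probe
H = torch.randn(n, d)                           # stand-in for one layer's contextual embeddings
gold = torch.randint(1, 6, (n, n)).float()      # stand-in for gold parse-tree distances
gold = (gold + gold.T) / 2
gold.fill_diagonal_(0)

B = torch.randn(rank, d, requires_grad=True)
optimizer = torch.optim.Adam([B], lr=1e-3)
for _ in range(200):
    loss = (probe_distances(B, H) - gold).abs().mean()  # L1 loss on distance error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A minimum spanning tree over the learned pairwise distances then yields an approximate, unlabeled parse of the sentence, which is how the paper evaluates how much tree structure the embeddings encode.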
Funding Information
- Tencent Corp. (gift)
- Google (fellowship)