From names to entities using thematic context distance
- 24 October 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 857-866
- https://doi.org/10.1145/2063576.2063700
Abstract
Name ambiguity arises from the polysemy of names and causes uncertainty about the true identity of entities referenced in unstructured text. This is a major problem in areas such as information retrieval and knowledge management, for example when searching for a specific entity or updating an existing knowledge base. We approach this problem of named entity disambiguation (NED) using thematic information derived from Latent Dirichlet Allocation (LDA) to compare the context of an entity mention with candidate entities in Wikipedia, each represented by its article. We evaluate various distances over topic distributions in a supervised classification setting to find the best-suited candidate entity, which is either covered in Wikipedia or unknown. We compare our approach to a state-of-the-art method and show that it achieves significantly better predictive performance, both for entities covered in Wikipedia and for uncovered entities. We show that our approach is in general language-independent, as we obtain equally good results for named entity disambiguation using the English, German, and French Wikipedia.
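To make the idea of a distance over topic distributions concrete, here is a minimal sketch in Python. It is not the authors' method. The abstract does not say which distances were evaluated, so the Jensen-Shannon divergence is an assumed choice. The paper's supervised classifier is replaced here by a plain nearest-candidate rule with a fixed threshold. The function names, the threshold `max_distance`, and the three-topic distributions are all hypothetical and serve only as illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing. The caller must ensure
    q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_candidates(context_topics, candidates):
    """Return (name, distance) pairs sorted by increasing distance."""
    scored = [
        (name, js_divergence(context_topics, topics))
        for name, topics in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


def disambiguate(context_topics, candidates, max_distance=0.2):
    """Pick the closest candidate, or None if all are too far away.

    ASSUMPTION: a fixed threshold stands in for the supervised
    classifier described in the abstract. None models an entity
    that is not covered by the knowledge base.
    """
    ranked = rank_candidates(context_topics, candidates)
    if not ranked or ranked[0][1] > max_distance:
        return None
    return ranked[0][0]


# Hypothetical topic distributions over three topics (made-up numbers).
candidates = {
    "Jaguar (animal)": [0.8, 0.1, 0.1],
    "Jaguar Cars": [0.1, 0.1, 0.8],
}
wildlife_context = [0.7, 0.2, 0.1]   # mention in a text about wildlife
unrelated_context = [0.1, 0.8, 0.1]  # mention matching neither article

print(disambiguate(wildlife_context, candidates))
print(disambiguate(unrelated_context, candidates))
```

In a real pipeline the distributions would come from an LDA model inferred on the mention context and on each candidate's Wikipedia article, and the distances would more plausibly serve as features for a trained classifier than as a direct decision rule.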