Mitigating linked data quality issues in knowledge-intense information extraction methods
- 19 June 2017
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics
Abstract
Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata, which encode facts in a standardized, structured format, are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance, and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates both the successful deployment of knowledge-intensive information extraction in real-world applications and the constraints introduced by data quality concerns.