Online Unsupervised Coreference Resolution for Semi-structured Heterogeneous Data
- 1 January 2012
- book chapter
- conference paper
- Published by Springer Science and Business Media LLC in Lecture Notes in Computer Science
Abstract
Two RDF instances are said to corefer when they are intended to denote the same thing in the world, for example, when two nodes of type foaf:Person describe the same individual. Resolving coreference is central to integrating and inter-linking semi-structured datasets. We are developing an online, unsupervised coreference resolution framework for heterogeneous, semi-structured data. The online aspect requires us to process new instances as they appear rather than as a batch. The instances are heterogeneous in that they may contain terms from different ontologies whose alignments are not known in advance. Our framework encompasses a two-phased clustering algorithm that is both flexible and distributable, a probabilistic multidimensional attribute model that will support robust schema mappings, and a consolidation algorithm that will merge coreferent instances in order to improve accuracy over time by addressing data sparseness.
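To make the online, two-phase idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify the similarity measure or the phases, so this example assumes a cheap cluster-level filter followed by a stricter instance-level comparison, both using Jaccard overlap of attribute-value tokens. The class `OnlineCoref`, the helpers `tokens` and `jaccard`, and the thresholds `loose` and `tight` are hypothetical names chosen for illustration. Property names are ignored on purpose, as a crude stand-in for the case where ontology alignments are unknown.

```python
from dataclasses import dataclass, field


def tokens(instance: dict) -> set:
    """Lower-cased word tokens from all attribute values (property names ignored)."""
    out = set()
    for value in instance.values():
        out.update(str(value).lower().split())
    return out


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class OnlineCoref:
    loose: float = 0.2  # assumed phase-1 threshold (cheap candidate selection)
    tight: float = 0.6  # assumed phase-2 threshold (coreference decision)
    clusters: list = field(default_factory=list)  # each cluster is a list of token sets

    def add(self, instance: dict) -> int:
        """Process one instance as it arrives and return the index of its cluster."""
        toks = tokens(instance)
        # Phase 1: compare against the pooled tokens of each cluster to pick candidates.
        candidates = [
            i for i, members in enumerate(self.clusters)
            if jaccard(toks, set().union(*members)) >= self.loose
        ]
        # Phase 2: compare against individual members of the candidate clusters only.
        best, best_score = None, self.tight
        for i in candidates:
            score = max(jaccard(toks, member) for member in self.clusters[i])
            if score >= best_score:
                best, best_score = i, score
        if best is None:
            # No sufficiently similar cluster: start a new one.
            self.clusters.append([toks])
            return len(self.clusters) - 1
        self.clusters[best].append(toks)
        return best
```

Calling `add` once per incoming instance keeps the clustering incremental: no earlier instance is revisited, which is the property the online setting asks for. A real system would replace the token overlap with a richer attribute model and would index clusters rather than scan them all in phase 1.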