Semi-supervised truth discovery
- 28 March 2011
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 217-226
- https://doi.org/10.1145/1963405.1963439
Abstract
Accessing online information from various data sources has become a necessary part of everyday life. Unfortunately, such information is not always trustworthy: different sources vary widely in quality and often provide inaccurate and conflicting information. Existing approaches attack this problem with unsupervised learning methods, inferring the confidence of each data value and the trustworthiness of each source from one another under the assumption that values provided by more sources are more accurate. However, because false values can spread widely through copying among sources, and out-of-date data often overwhelm up-to-date data, such bootstrapping methods are often ineffective. In this paper we propose a semi-supervised approach that finds true values with the help of ground truth data. Such ground truth data, even in very small amounts, can greatly help identify trustworthy data sources. Unlike existing studies that only provide iterative algorithms, we derive the optimal solution to our problem and provide an iterative algorithm that converges to it. Experiments show that our method achieves higher accuracy than existing approaches and that it can be applied to very large data sets when implemented with MapReduce.
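The idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or its optimal solution; it is a simple trust-weighted voting loop, written under the assumption that labeled objects are clamped to their known values on every iteration. The function name `discover_truth`, the claim tuples, and the sample data are all hypothetical.

```python
from collections import defaultdict


def discover_truth(claims, ground_truth, iterations=20):
    """Toy trust-weighted voting (illustrative only, not the paper's method).

    claims: list of (source, object, value) tuples.
    ground_truth: dict mapping a few objects to their known true value.
    Returns (chosen value per object, trust score per source).
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 0.5 for s in sources}  # assumed uniform starting trust
    confidence = {}

    for _ in range(iterations):
        # Score each (object, value) by the summed trust of its supporters.
        score = defaultdict(float)
        total = defaultdict(float)
        for s, obj, val in claims:
            score[(obj, val)] += trust[s]
            total[obj] += trust[s]

        confidence = {}
        for (obj, val), sc in score.items():
            if obj in ground_truth:
                # Clamp labeled objects to the known answer.
                confidence[(obj, val)] = 1.0 if val == ground_truth[obj] else 0.0
            else:
                confidence[(obj, val)] = sc / total[obj] if total[obj] > 0 else 0.0

        # A source's trust is the mean confidence of the values it claims.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, obj, val in claims:
            sums[s] += confidence[(obj, val)]
            counts[s] += 1
        trust = {s: sums[s] / counts[s] for s in sources}

    # Pick the highest-confidence value for each object.
    best = {}
    for (obj, val), c in confidence.items():
        if obj not in best or c > best[obj][1]:
            best[obj] = (val, c)
    truths = {obj: val for obj, (val, _) in best.items()}
    truths.update(ground_truth)  # labeled objects keep their known value
    return truths, trust


# Hypothetical data: A, B and C copy the same false value; D is correct.
claims = [
    (src, obj, "right" if src == "D" else "wrong")
    for obj in ("o1", "o2", "o3")
    for src in ("A", "B", "C", "D")
]

# Unsupervised baseline: the copied majority value wins on o3.
baseline, _ = discover_truth(claims, ground_truth={})

# Two labeled objects are enough to expose the copiers and flip o3.
truths, trust = discover_truth(claims, ground_truth={"o1": "right", "o2": "right"})
```

In this toy setting the unlabeled object `o3` follows the copied majority when no labels are given, while two labeled objects shift trust toward source `D` and change the outcome. The update rule here is only one of many possible choices and carries no convergence guarantee of the kind the abstract claims for the paper's method.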