Learning to Rerank Schema Matches
- 27 December 2019
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 33 (8), 3104-3116
- https://doi.org/10.1109/tkde.2019.2962124
Abstract
Schema matching is at the heart of integrating structured and semi-structured data, with applications in data warehousing, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process, and a common method for overcoming this uncertainty presents a human expert with a ranked list of possible schema matches to choose from, known as top-K matching. In this work we propose a learning algorithm that uses an innovative set of features to rerank a list of schema matches and improve the ranking of the best match. We provide a bound on the size of an initial match list, tying the number of matches to a desired level of confidence in finding the best match. We also propose the use of matching predictors as features in a learning task, and tailor nine new matching predictors for this purpose. The proposed algorithm assists the matching process by presenting a quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in the matching process altogether. A large-scale empirical evaluation with a real-world benchmark shows the effectiveness of the proposed algorithmic solution.
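To make the idea of reranking a top-K list concrete, the following is a minimal Python sketch and not the algorithm from the article. Everything in it is an assumption made for illustration: the `CandidateMatch` type, the four toy features (initial matcher score, mean and minimum pair similarity, source coverage), the example similarity values, and the hand-picked `weights`, which stand in for whatever a learning-to-rank model would learn from training data. The article's nine matching predictors are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source attribute, target attribute)


@dataclass
class CandidateMatch:
    """One entry of a top-K list: a set of attribute correspondences."""
    pairs: List[Pair]
    matcher_score: float  # score assigned by the initial (hypothetical) matcher


def features(match: CandidateMatch,
             similarity: Dict[Pair, float],
             n_source: int) -> List[float]:
    """Toy stand-ins for matching predictors, computed per candidate match."""
    sims = [similarity.get(p, 0.0) for p in match.pairs]
    mean_sim = sum(sims) / len(sims) if sims else 0.0
    min_sim = min(sims) if sims else 0.0
    # Fraction of source attributes that the candidate maps to something.
    coverage = len({src for src, _ in match.pairs}) / n_source
    return [match.matcher_score, mean_sim, min_sim, coverage]


def rerank(candidates: List[CandidateMatch],
           similarity: Dict[Pair, float],
           n_source: int,
           weights: List[float]) -> List[CandidateMatch]:
    """Order candidates by a linear score over their features, best first."""
    def score(match: CandidateMatch) -> float:
        feats = features(match, similarity, n_source)
        return sum(w * f for w, f in zip(weights, feats))
    return sorted(candidates, key=score, reverse=True)


# Illustrative pairwise similarities between two small schemas (made up).
similarity = {
    ("name", "full_name"): 0.9,
    ("zip", "postal_code"): 0.8,
    ("phone", "tel"): 0.7,
    ("name", "postal_code"): 0.2,
    ("zip", "full_name"): 0.1,
}

# Initial top-3 list, ordered by the initial matcher's own score.
top_k = [
    CandidateMatch([("name", "postal_code"), ("zip", "full_name"),
                    ("phone", "tel")], matcher_score=0.60),
    CandidateMatch([("name", "full_name"), ("zip", "postal_code"),
                    ("phone", "tel")], matcher_score=0.55),
    CandidateMatch([("name", "full_name")], matcher_score=0.50),
]

# Hand-picked weights; a real system would learn these from labeled matches.
weights = [0.2, 0.4, 0.1, 0.3]

reranked = rerank(top_k, similarity, n_source=3, weights=weights)
for position, match in enumerate(reranked, start=1):
    print(position, match.matcher_score, match.pairs)
```

With these made-up numbers, the candidate that the initial matcher placed second (the one pairing `name` with `full_name` and `zip` with `postal_code`) moves to the top of the list. This is the kind of improvement in the best match's rank that the abstract describes, shown here only on toy data.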
Funding Information
- JPMorgan Chase and Company