Towards geospatial semantic search: exploiting latent semantic relations in geospatial data

Abstract
This paper reports our efforts to address the grand challenge of the Digital Earth vision in terms of intelligent data discovery from vast quantities of geo-referenced data. We propose an algorithm combining LSA and a Two-Tier Ranking (LSATTR) algorithm based on revised cosine similarity to build a more efficient search engine – Semantic Indexing and Ranking (SIR) – for a semantic-enabled, more effective data discovery. In addition to its ability to handle subject-based search, we propose a mechanism to combine geospatial taxonomy and Yahoo! GeoPlanet for automatic identification of location information from a spatial query and automatic filtering of datasets that are not spatially related. The metadata set, in the format of ISO19115, from NASA's SEDAC (Socio-Economic Data Application Center) is used as the corpus of SIR. Results show that our semantic search engine SIR built on LSATTR methods outperforms existing keyword-matching techniques, such as Lucene, in terms of both recall and precision. Moreover, the semantic associations among all existing words in the corpus are discovered. These associations provide substantial support for automating the population of spatial ontologies. We expect this work to support the operationalization of the Digital Earth vision by advancing the semantic-based geospatial data discovery.