Database tomography for information retrieval

Abstract
Database tomography is an information extraction and analysis system which operates on textual databases. Its primary use to date has been to identify pervasive technical thrusts and themes, and the interrelationships among these themes and sub-themes, which are intrinsic to large textual databases. Its two main algorithmic components are multiword phrase frequency analysis and phrase proximity analysis. This paper shows how database tomography can be used to enhance information retrieval from large textual databases through the newly developed process of simulated nucleation. The principles of simulated nucleation are presented, and the advantages for information retrieval are delineated. An application is described of developing, from Science Citation Index and Engineering Compendex, a database of journal articles focused on near-Earth space science and technology.

This publication has 18 references indexed in Scilit: