Computer-Assisted Generation of a Protein-Interaction Database for Nuclear Receptors

Abstract
With the increasing amount of biological data available, automated methods for information retrieval have become necessary. We employed computer-assisted text mining to systematically retrieve all protein-protein interactions for nuclear receptors (NRs) from MEDLINE. A dictionary of protein names and of terms denoting interactions was generated, and trioccurrences, that is, co-occurrences of two protein names and one interaction term within a single sentence, were retrieved. Abstracts containing at least one such trioccurrence were manually checked by biologists to select the relevant interactions from the automatically extracted data. In total, 4360 abstracts containing data on protein interactions for nuclear receptors were retrieved. The resulting database contains all reported protein interactions involving nuclear receptors from 1966 to September 2001. Remarkably, the annual increase in the number of reported interactors for nuclear receptors followed an exponential growth curve from 1991 to 2001. The data set reveals the high complexity of protein interactions for nuclear receptors. The number of interactions correlates with the number of published papers for a given receptor, suggesting that the number of reported interactors reflects the intensity of research dedicated to that receptor. Indeed, comparison of the retrieved data with a systematic yeast two-hybrid-based interaction analysis suggests that most NRs are similar with respect to the number of interacting proteins. The data set serves as a source of information on NR interactions and as a reference data set for the improvement of advanced text-mining methods.
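
To illustrate the trioccurrence criterion described above, the following is a minimal sketch and not the pipeline used in this work. The dictionaries `PROTEIN_NAMES` and `INTERACTION_TERMS`, the helper names, the naive sentence splitter, and the case-sensitive exact-token matching are all assumptions made for illustration; a real dictionary would also need synonyms, multi-word names, and inflected interaction terms.

```python
import re
from itertools import combinations

# Hypothetical toy dictionaries (assumption); the dictionaries built for
# the actual study are not reproduced here.
PROTEIN_NAMES = {"RXR", "RAR", "SRC-1", "NCoR"}
INTERACTION_TERMS = {"binds", "interacts", "associates", "recruits"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_trioccurrences(text):
    """Return (protein_a, term, protein_b, sentence) for every sentence
    that contains two distinct protein names and one interaction term."""
    hits = []
    for sentence in split_sentences(text):
        # Tokens are runs of letters, digits, and hyphens (keeps "SRC-1" whole).
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        proteins = sorted(tokens & PROTEIN_NAMES)
        terms = sorted(tokens & INTERACTION_TERMS)
        # Every unordered protein pair combined with every interaction term.
        for protein_a, protein_b in combinations(proteins, 2):
            for term in terms:
                hits.append((protein_a, term, protein_b, sentence))
    return hits


if __name__ == "__main__":
    abstract = (
        "RXR binds RAR in vitro. "
        "SRC-1 was expressed in liver. "
        "NCoR interacts with RAR."
    )
    for protein_a, term, protein_b, _ in find_trioccurrences(abstract):
        print(protein_a, term, protein_b)
```

In this sketch an abstract would be passed on for manual checking if `find_trioccurrences` returns at least one hit. The middle sentence of the example yields nothing, because it contains a protein name but neither a second protein nor an interaction term.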