Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information

Abstract
Motivation: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. Results: We propose the use of a method to scan ordered lists of genes. The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted. Availability: The software used for the association of significant Gene Ontology (GO) terms to sets of genes is available at http://www.fatigo.org and http://www.babelomics.org. Software for ranking genes according to phenotypes is available in GEPAS (http://www.gepas.org). The multtest program from the bioconductor package is available at http://www.bioconductor.org/repository/devel/package/html/multtest.html. Contact:jdopazo@ochoa.fib.es