Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations

Abstract
Motivation: Functionally related genes involved in the same molecular-genetic, biochemical or physiological process are often regulated coordinately. Such regulation is achieved through the precisely organized binding of multiple transcription factors (TFs) to their target sites (cis-elements) in the regulatory regions of genes. Combinations of cis-elements provide a structural basis for generating unique patterns of gene expression.

Results: Here we present a new approach for defining promoter models based on the composition of TF binding sites and their pairs. We use a multicomponent fitness function to select the promoter model that best fits the observed gene expression profile. We demonstrate successful applications of the fitness function, in combination with a genetic algorithm, to the analysis of functionally related or co-expressed genes, and we test it on simulated and permuted data.

Availability: The CMA program is freely available for non-commercial users. URL: Author Webpage. It is also part of the commercial system ExPlain™ (Author Webpage), designed for causal analysis of gene expression data.

Contact: alexander.kel@biobase-international.com
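
As a rough illustration of the idea summarized under Results, the following Python sketch evolves a toy promoter model, here simply a subset of motifs, with a small genetic algorithm. Everything in it is an assumption made for illustration: the synthetic site scores, the two-component fitness (class separation minus a model-size penalty) and the names `make_toy_data`, `fitness` and `evolve` are hypothetical. It is not the CMA fitness function or its implementation, and TF site pairs are left out for brevity.

```python
import random
from statistics import mean, pvariance


def make_toy_data(n_seqs=40, n_motifs=8, informative=(0, 3), seed=1):
    """Synthetic per-motif site scores (assumption: not real promoter data).

    Foreground sequences get a +1.0 boost on the informative motifs.
    """
    rng = random.Random(seed)

    def seq(enriched):
        return [rng.random() + (1.0 if enriched and j in informative else 0.0)
                for j in range(n_motifs)]

    foreground = [seq(True) for _ in range(n_seqs)]
    background = [seq(False) for _ in range(n_seqs)]
    return foreground, background


def model_score(site_scores, model):
    """Sum the site scores of the motifs that the model includes."""
    return sum(s for s, used in zip(site_scores, model) if used)


def fitness(model, foreground, background, size_penalty=0.05):
    """Toy fitness: foreground/background separation minus a size penalty."""
    if not any(model):
        return float("-inf")  # an empty model cannot discriminate anything
    fg = [model_score(s, model) for s in foreground]
    bg = [model_score(s, model) for s in background]
    # Pooled within-class standard deviation; guard against a zero spread.
    spread = ((pvariance(fg) + pvariance(bg)) / 2) ** 0.5 or 1e-9
    separation = (mean(fg) - mean(bg)) / spread
    return separation - size_penalty * sum(model)


def evolve(foreground, background, n_motifs, pop_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm over motif subsets encoded as bool tuples."""
    rng = random.Random(seed)

    def score(model):
        return fitness(model, foreground, background)

    population = [tuple(rng.random() < 0.5 for _ in range(n_motifs))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:2]  # elitism: carry over the two best models
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(population, 3), key=score)
            b = max(rng.sample(population, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            child = tuple((not g) if rng.random() < mutation_rate else g
                          for g in child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


if __name__ == "__main__":
    fg, bg = make_toy_data()
    best = evolve(fg, bg, n_motifs=8)
    print("selected motifs:", [j for j, used in enumerate(best) if used])
    print("fitness:", round(fitness(best, fg, bg), 3))
```

In this sketch the size penalty plays the role of a second fitness component that discourages models padded with uninformative motifs; a real multicomponent fitness would combine further terms, which the abstract does not specify.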