Robust Sequence Selection Method Used To Develop the FluChip Diagnostic Microarray for Influenza Virus

Abstract
DNA microarrays have proven to be powerful tools for gene expression analysis and are becoming increasingly attractive for diagnostic applications, e.g., virus identification and subtyping. Selecting appropriate sequences for use on a microarray is challenging, particularly for highly mutable organisms such as influenza viruses, human immunodeficiency viruses, and hepatitis C viruses. The goal of this work was to develop an efficient method for mining large sequence databases to identify regions of conservation in the influenza virus genome. From these conserved regions, capture and label sequences capable of discriminating between different viral types and subtypes were selected. The salient features of the method were the use of phylogenetic trees for data reduction and the selection of a relatively small number of capture and label sequences capable of identifying a broad spectrum of influenza viruses. A detailed experimental evaluation of the selected sequences is described in a companion paper. The software is freely available under the General Public License at http://www.colorado.edu/chemistry/RGHP/software/ .
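To make the idea of a "region of conservation" concrete, the following Python sketch scans a multiple sequence alignment for windows in which every column is highly conserved. It is a generic illustration under stated assumptions, not the FluChip selection procedure: it assumes the input sequences are already aligned to equal length, scores each column by the fraction of sequences sharing its most common character, treats gap-dominated columns ("-") as unconserved, and uses a hypothetical window length `k` and cutoff `threshold`. It does not attempt the phylogenetic-tree data reduction mentioned above.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each alignment column, the fraction of sequences sharing its most common character."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        base, count = counts.most_common(1)[0]
        # Assumption: a column dominated by alignment gaps is not a usable probe region.
        scores.append(0.0 if base == "-" else count / len(alignment))
    return scores


def conserved_windows(alignment, k, threshold):
    """Return start positions of length-k windows in which every column meets the threshold."""
    scores = column_conservation(alignment)
    return [
        start
        for start in range(len(scores) - k + 1)
        if min(scores[start:start + k]) >= threshold
    ]


if __name__ == "__main__":
    # Toy alignment of four hypothetical sequences (not real influenza data).
    alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "ACCTACGT"]
    print(column_conservation(alignment))  # [1.0, 1.0, 0.75, 1.0, 0.75, 1.0, 1.0, 0.75]
    print(conserved_windows(alignment, k=2, threshold=1.0))  # [0, 5]
```

Taking the minimum column score in each window, rather than the mean, is a deliberately strict choice: a single variable position disqualifies the whole window.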