Sybil: Methods and Software for Multiple Genome Comparison and Visualization

1 January 2007

book chapter
research article
Published by Springer Science and Business Media LLC in Methods in molecular biology (Clifton, N.J.)

Vol. 408, 93-108
https://doi.org/10.1007/978-1-59745-547-3_6

Abstract

With the successful completion of genome sequencing projects for a variety of model organisms, the selection of candidate organisms for future sequencing efforts has been guided increasingly by a desire to enable comparative genomics. This trend has both depended on and encouraged the development of software tools that can elucidate and capitalize on the similarities and differences between genomes. “Sybil,” one such tool, is a largely web-based software package whose primary goal is to facilitate the analysis and visualization of comparative genome data, with a particular emphasis on protein and gene cluster data. Herein, two methods are described: a two-phase protein clustering algorithm, used to generate protein clusters suitable for analysis through Sybil, and a method for creating graphical displays of protein or gene clusters that span multiple genomes. When combined, these two relatively simple techniques provide the user of the Sybil software (The Institute for Genomic Research [TIGR] Bioinformatics Department) with a browsable graphical display of his or her “input” genomes, showing which genes are conserved based on the parameters supplied to the protein clustering algorithm. For any given protein cluster, the graphical display consists of a local alignment of the genomes in which the clustered genes are located. The genomes are arranged in a vertical stack, as in a multiple alignment, and shaded areas are used to connect genes in the same cluster, thus displaying conservation at the protein level in the context of the underlying genomic sequences. The authors have found this display, and slight variants thereof, useful for a variety of annotation and comparison tasks, ranging from identifying “missed” gene models or single-exon discrepancies between orthologous genes, to finding large or small regions of conserved gene synteny, to investigating the properties of the breakpoints between such regions.
