The structural genomics experimental pipeline: Insights from global target lists

7 May 2004

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 56 (2), 201-210
https://doi.org/10.1002/prot.20060

Abstract

Structural genomics (SG) initiatives are currently attempting to achieve the high-throughput determination of protein structures on a genome-wide scale. Here we analyze the SG target data that have been publicly released over a period of 16 months to assess the potential of the SG initiatives. We use statistical techniques most commonly applied in epidemiology to describe the dynamics of targets through the experimental SG pipeline. There is no clear bottleneck among the key stages of cloning, expression, purification and crystallization. An SG target will progress through each of these steps with a probability of approximately 45%. Around 80% of targets with diffraction data will yield a crystal structure, and 20% of targets with HSQC spectra will yield an NMR structure. We also measure the overlap among SG targets: 61% of SG protein sequences share at least 30% sequence identity with one or more other SG targets. There is no significant difference in average structure quality between SG structures and other PDB structures determined by "traditional" methods, but on average SG structures are deposited to the PDB twice as quickly after X-ray data collection. Proteins 2004.
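Because no single stage dominates, the losses compound. If one assumes, purely for illustration, that cloning, expression, purification and crystallization are independent and each passes about 45% of the targets that enter it, then the fraction clearing all four stages is 0.45^4 ≈ 0.041, or roughly 4%. The sketch below makes that arithmetic explicit. It is not the authors' epidemiological method: the names `STAGES`, `P_STAGE`, `cumulative_success` and `survival_curve` are hypothetical, and a constant, independent per-stage rate is an assumption rather than a result from the paper.

```python
from math import prod

# Four key pipeline stages named in the abstract.
STAGES = ["cloning", "expression", "purification", "crystallization"]
# Approximate per-stage pass rate quoted in the abstract (~45%).
# ASSUMPTION (illustrative only): every stage has this same rate and
# stages are statistically independent.
P_STAGE = 0.45


def cumulative_success(p_per_stage: float, n_stages: int) -> float:
    """Probability that a target clears all n_stages in sequence."""
    return prod([p_per_stage] * n_stages)


def survival_curve(p_per_stage: float, stages: list[str]) -> list[tuple[str, float]]:
    """Fraction of the starting targets still in the pipeline after each stage."""
    curve = []
    remaining = 1.0
    for stage in stages:
        remaining *= p_per_stage  # only p_per_stage of the survivors pass this stage
        curve.append((stage, remaining))
    return curve


if __name__ == "__main__":
    for stage, frac in survival_curve(P_STAGE, STAGES):
        print(f"after {stage:<16} {frac:6.1%}")
    print(f"overall: {cumulative_success(P_STAGE, len(STAGES)):.3f}")
```

A naive per-stage ratio like this treats targets that are still in progress as failures. Handling such incomplete follow-up (censoring) is the kind of problem that survival-analysis techniques from epidemiology are designed to address.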

Cited by 32 articles