Characterization, Distribution, and Expression of Novel Genes among Eight Clinical Isolates of Streptococcus pneumoniae

Abstract
Eight low-passage-number Streptococcus pneumoniae clinical isolates, each of a different serotype and a different multilocus sequence type, were obtained from pediatric participants in a pneumococcal vaccine trial. Comparative genomic analyses were performed with these strains and two S. pneumoniae reference strains. Individual genomic libraries were constructed for each of the eight clinical isolates, with an average insert size of ∼1 kb. A total of 73,728 clones were picked for arraying, providing more than fourfold genomic coverage per strain. A subset of 4,793 clones was sequenced; homology searches revealed that 750 (15.6%) of the sequences were unique with respect to the TIGR4 reference genome and 263 (5.5%) were unrelated to any available streptococcal sequence. Hypothetical translations of the open reading frames identified within these novel sequences showed homologies to a variety of proteins, including bacterial virulence factors not previously identified in S. pneumoniae. The distribution and expression patterns of 58 of these novel sequences among the eight clinical isolates were analyzed by PCR- and reverse transcriptase PCR-based analyses, respectively. These unique sequences were nonuniformly distributed among the eight isolates, and transcription of these genes in planktonic cultures was detected in 81% (172/212) of their genic occurrences. All 58 novel sequences were transcribed in one or more of the clinical strains, suggesting that they all correspond to functional genes. Sixty-five percent (38/58) of these sequences were found in 50% or fewer of the clinical strains, indicating a significant degree of genomic plasticity among natural isolates.
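
The coverage and percentage figures quoted above can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the 73,728 arrayed clones were split evenly among the eight libraries, and that each genome is roughly 2.16 Mb (approximately the size of the TIGR4 reference genome). The names `GENOME_KB` and `pct` are hypothetical helpers introduced for this example.

```python
# Rough consistency check of the figures quoted in the abstract.
CLONES_TOTAL = 73_728   # clones picked for arraying (from the abstract)
STRAINS = 8             # clinical isolates (from the abstract)
INSERT_KB = 1.0         # approximate average insert size (from the abstract)
GENOME_KB = 2_160.0     # ASSUMPTION: ~2.16 Mb, roughly the TIGR4 genome size

# ASSUMPTION: clones were divided evenly among the eight libraries.
clones_per_strain = CLONES_TOTAL // STRAINS

# Physical coverage = total cloned DNA per strain / genome size.
coverage = clones_per_strain * INSERT_KB / GENOME_KB


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    print(f"clones per strain: {clones_per_strain}")
    print(f"estimated coverage: {coverage:.1f}x")
    print(f"unique vs. TIGR4: {pct(750, 4_793)}%")
    print(f"unrelated to streptococcal sequences: {pct(263, 4_793)}%")
    print(f"transcribed genic occurrences: {pct(172, 212)}%")
    print(f"sequences in half or fewer of the strains: {pct(38, 58)}%")
```

Under these assumptions the estimate comes out slightly above fourfold coverage per strain, in line with the stated "more than four times" figure, and the recomputed percentages agree with those reported.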