Rapid construction of metabolic models for a family of Cyanobacteria using a multiple source annotation workflow

Abstract
Cyanobacteria are photoautotrophic prokaryotes that grow robustly under diverse environmental conditions with minimal nutritional requirements. They can use solar energy to convert CO2 and reduced carbon sources into biofuels and chemical products. The genus Cyanothece includes unicellular nitrogen-fixing cyanobacteria that have been shown to offer high levels of hydrogen production and nitrogen fixation. However, the reconstruction of quality genome-scale metabolic models for organisms with limited annotation resources remains a challenging task. Here we reconstruct, analyze, and compare the metabolism of five Cyanothece strains, namely Cyanothece sp. PCC 7424, 7425, 7822, 8801, and 8802, as the genome-scale metabolic reconstructions iCyc792, iCyn731, iCyj826, iCyp752, and iCyh755, respectively. We compare these phylogenetically related Cyanothece strains to assess their bio-production potential. A systematic workflow is introduced for integrating and prioritizing annotation information from the Universal Protein Resource (UniProt), NCBI Protein Clusters, and the Rapid Annotations using Subsystems Technology (RAST) method. The genome-scale metabolic models include fully traced photosynthesis reactions and respiratory chains, as well as balanced reactions and gene-protein-reaction (GPR) associations. Metabolic differences between the organisms are highlighted, such as the non-fermentative pathway for alcohol production, which is found only in Cyanothece 7424, 8801, and 8802. Our development workflow provides a path for constructing models using information from curated models of related organisms and reviewed gene annotations. This effort lays the foundation for the expedient construction of curated metabolic models for organisms that, while not the target of comprehensive research, have a sequenced genome and are related to an organism with a curated metabolic model.
Organism-specific models, such as the five presented in this paper, can be used to identify optimal genetic manipulations for targeted metabolite overproduction as well as to investigate the biology of diverse organisms.
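
The idea of integrating and prioritizing annotations from several sources can be illustrated with a minimal sketch. It is not the workflow's actual implementation: the priority ranking in SOURCE_PRIORITY, the Annotation record, the prioritize function, and the gene identifiers are all hypothetical and serve only to show how one annotation per gene might be selected when sources disagree.

```python
from dataclasses import dataclass

# Hypothetical ranking (lower value wins). The actual ordering used by the
# workflow is not stated in the abstract; this is an assumption.
SOURCE_PRIORITY = {
    "uniprot_reviewed": 0,
    "ncbi_protein_clusters": 1,
    "rast": 2,
}


@dataclass(frozen=True)
class Annotation:
    gene: str      # gene identifier (placeholder values below)
    source: str    # key into SOURCE_PRIORITY
    function: str  # free-text functional assignment


def prioritize(annotations):
    """Keep, for each gene, the annotation from the highest-priority source."""
    best = {}
    for ann in annotations:
        current = best.get(ann.gene)
        if current is None or (
            SOURCE_PRIORITY[ann.source] < SOURCE_PRIORITY[current.source]
        ):
            best[ann.gene] = ann
    return best


# Usage with made-up records
records = [
    Annotation("gene_A", "rast", "hypothetical protein"),
    Annotation("gene_A", "uniprot_reviewed", "nitrogenase iron protein"),
    Annotation("gene_B", "rast", "bidirectional hydrogenase subunit"),
]
selected = prioritize(records)
for gene, ann in sorted(selected.items()):
    print(gene, ann.source, ann.function)
```

In this sketch a gene annotated by only one source keeps that annotation, while conflicting assignments are resolved in favor of the source ranked highest. A real workflow would likely also need to reconcile identifiers across databases and flag conflicts for manual curation.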