HostSeq: A Canadian Whole Genome Sequencing and Clinical Data Resource
Preprint
- 10 May 2022
- research article
- Published by Cold Spring Harbor Laboratory
Abstract
HostSeq was launched in April 2020 as a national initiative to integrate whole genome sequencing data from 10,000 Canadians infected with SARS-CoV-2 with clinical information related to their disease experience. The mandate of HostSeq is to support the Canadian and international research communities in their efforts to understand the risk factors for disease and associated health outcomes, and to support the development of interventions such as vaccines and therapeutics. HostSeq is a collaboration among 13 independent epidemiological studies of SARS-CoV-2 across five provinces in Canada. Aggregated data collected by HostSeq are made available to the public through two data portals: a phenotype portal showing summaries of major variables and their distributions, and a variant search portal enabling queries in a genomic region. Individual-level data are available to the global research community for health research through a Data Access Agreement and Data Access Compliance Office approval. Here we provide an overview of the collective project design along with summary-level information for HostSeq. We highlight several statistical considerations for researchers using the HostSeq platform regarding data aggregation, sampling mechanism, covariate adjustment, and X chromosome analysis. Beyond serving as a rich data source, HostSeq offers unique opportunities for the research community through the diversity of study designs, sample sizes, and research objectives among its participating studies.