World‐wide tracking of major SARS‐CoV‐2 genome haplotypes in sequences of June 1 to November 15, 2020 and discovery of rapid expansion of a new haplotype

Abstract
In an earlier study, 13 haplotype groups defined by SARS‐CoV‐2 genome sequence variations were identified among 2,790 sequences available in March 2020. In addition, 23403A>G, which causes p.Asp614Gly in the spike protein and is one of the defining variations of haplotype group H1, was becoming increasingly prevalent. As a follow‐up, 74,922 SARS‐CoV‐2 sequences retrieved from individuals infected from June 1 through November 15, 2020 were analyzed. Consistent with the reports on 23403A>G, H1 haplotype frequency increased world‐wide; among sequences from August to November, only 0.3% were associated with non‐H1 haplotypes. This finding prompted an assessment of H1 sub‐haplotypes among sequences from the later stage of the COVID‐19 pandemic. The distribution of the sub‐haplotypes varied among regions, but 98.4% of the sequences were associated with five H1 sub‐haplotypes. One of these had not been observed previously and had emerged in Europe by June 2020. The most important finding of the present study is the identification of this new sub‐haplotype (H1r), together with evidence suggesting that it may have a high potential for expansion. By the end of September, its frequency had reached 10% to 90% in various countries and territories of Europe. The new sub‐haplotype is defined by seven sequence variations, one of which causes p.Ala222Val in the spike protein.
Funding Information
  • National Institute for Genetic Engineering and Biotechnology (7559171)