Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. PLoS ONE 5, e10434 (2010). The fact that these estimates lie between the rates for MERS-CoV and HCoV-OC43 is consistent with the intermediate sampling time range of about 18years (Fig. Intragenomic rearrangements involving 5-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, Crystal structure of the CoV-Y domain of SARS-CoV-2 nonstructural protein 3, Association of underlying comorbidities and progression of COVID-19 infection amongst 2586 patients hospitalised in the National Capital Region of India: a retrospective cohort study, Molecular characterization of horse nettle virus A, a new member of subgroup B of the genus Nepovirus, Molecular phylogeny of coronaviruses and host receptors among domestic and close-contact animals reveals subgenome-level conservation, crossover, and divergence. SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination. Hu, B. et al. Annu Rev. RegionsAC had similar phylogenetic relationships among the southern China bat viruses (Yunnan, Guangxi and Guizhou provinces), the Hong Kong viruses, northern Chinese viruses (Jilin, Shanxi, Hebei and Henan provinces, including Shaanxi), pangolin viruses and the SARS-CoV-2 lineage. Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. A new coronavirus associated with human respiratory disease in China. Mol. Using the most conservative approach (NRR1), the divergence time estimate for SARS-CoV-2 and RaTG13 is 1969 (95% HPD: 19302000), while that between SARS-CoV and its most closely related bat sequence is 1962 (95% HPD: 19321988); see Fig. There is a 90% DNA match between SARS CoV 2 and a coronavirus in pangolins. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? We thank all authors who have kindly deposited and shared genome data on GISAID. Effect of closure of live poultry markets on poultry-to-person transmission of avian influenza A H7N9 virus: an ecological study. All authors contributed to analyses and interpretations. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. 4). The latter was reconstructed using IQTREE66 v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. Published. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). J. Med. Su, S. et al. The Artic Network receives funding from the Wellcome Trust through project no. This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. Virus Evol. 16, e1008421 (2020). Lam, T. T. et al. We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. Intraspecies diversity of SARS-like coronaviruses in Rhinolophus sinicus and its implications for the origin of SARS coronaviruses in humans. Python 379 102 pangoLEARN Public Store of the trained model for pangolin to access. We focused on these three non-recombining regions/alignments for divergence time estimation; this avoids inappropriate modelling of evolutionary processes with recombination on strictly bifurcating trees, which can result in different artefacts such as homoplasies that inflate branch lengths and lead to apparently longer evolutionary divergence times. We say that this approach is conservative because sequences and subregions generating recombination signals have been removed, and BFRs were concatenated only when no PI signals could be detected between them. To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13. Since experts have suggested that pangolins may be the reservoir species for COVID-19, the scaly anteater has been catapulted into headlines, news reports, and conversationsand some are calling COVID-19 "the revenge of the . R. Soc. and JavaScript. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. J. Infect. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. # File containing the ID of the samples, the Sequence of the haplotype, the Continent, the country, the Region, the Data, the Lineage of Pangolin and Nextstrain clade, and the haplotype number # In this order # Could be obtained from the database 874850). Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63sequences. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history40. We demonstrate that the sarbecoviruses circulating in horseshoe bats have complex recombination histories as reported by others15,20,21,22,23,24,25,26. Concatenated region ABC is NRR1. B 281, 20140732 (2014). 1a-c ), has the third-highest number of confirmed COVID-19 cases in the state of So. A tag already exists with the provided branch name. Chernomor, O. et al. In case of DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. Now, the two researchers used genomic sequencing to compare the DNA of the new coronavirus in humans with that in animals and found a 99% match with pangolins. the development of viral diversity. Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Lu, R. et al. 26 March 2020. J. Virol. is funded by The National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. CAS Google Scholar. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. Share . 2, vew007 (2016). Zhang, Y.-Z. Biazzo et al. All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. and D.L.R. Schierup, M. H. & Hein, J. Recombination and the molecular clock. The sizes of the black internal node circles are proportional to the posterior node support. Curr. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. 3 Priors and posteriors for evolutionary rate of SARS-CoV-2. Biol. 2). In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. Sign up for the Nature Briefing newsletter what matters in science, free to your inbox daily. J. Virol. stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. 31922087). Natl Acad. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length. Abstract. One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. Article It is available as a command line tool and a web application. Early detection via genomics was not possible during Southeast Asias initial outbreaks of avian influenza H5N1 (1997 and 20032004) or the first SARS outbreak (20022003). Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. Nature 538, 193200 (2016). We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. Means and 95% HPD intervals are 0.080 [0.0580.101] and 0.530 [0.3040.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.1090.180] and 0.154 [0.0930.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). Lond. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. When the first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on 10January 2020 (GMT) on Virological.org by a consortium led by Zhang6, it enabled immediate analyses of its ancestry. Correspondence to SARS-CoV-2 is an appropriate name for the new coronavirus. M.F.B. Nature 583, 286289 (2020). 6, e14 (2017). We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). 2 Lack of root-to-tip temporal signal in SARS-CoV-2. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68sequences are available in Supplementary Table 4. The shaded region corresponds to the Sprotein. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD. The 2009 influenza pandemic and subsequent outbreaks of MERS-CoV (2012), H7N9 avian influenza (2013), Ebola virus (2014) and Zika virus (2015) were met with rapid sequencing and genomic characterization. Biol. Using these breakpoints, the longest putative non-recombining segment (nt1,88521,753) is 9.9kb long, and we call this region NRR2. Publishers note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. ISSN 2058-5276 (online). "This is an extremely interesting . Results and discussion Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real-time ( 4 ). However, inconsistency in the nomenclature limits uniformity in its epidemiological understanding. Eden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. & White, P. A. Recombination within the pandemic norovirus GII.4 lineage. Scientists defined the pangolin lineage of this variant to be B.1.1.523 and it was originally recognized as a variant under monitoring on July 14, 2021. All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. https://doi.org/10.1093/molbev/msaa163 (2020). This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV . Sliding window analysis of changes in the patterns of sequence similarity between human SARS-CoV-2, and pangolin and bat coronaviruses as described further in Fig. The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. Suchard, M. A. et al. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. Sequences are colour-coded by province according to the map. https://doi.org/10.1038/s41564-020-0771-4, DOI: https://doi.org/10.1038/s41564-020-0771-4. Nat Microbiol 5, 14081417 (2020). Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown22,25, have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. The virus then. It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. Sibling lineages to RaTG13/SARS-CoV-2 include a pangolin sequence sampled in Guangdong Province in March 2019 and a clade of pangolin sequences from Guangxi Province sampled in 2017. Current sampling of pangolins does not implicate them as an intermediate host. M.F.B., P.L. From this perspective, it may be useful to perform surveillance for more closely related viruses to SARS-CoV-2 along the gradient from Yunnan to Hubei. from the European Research Council under the European Unions Horizon 2020 research and innovation programme (grant agreement no. Posterior distributions were approximated through Markov chain Monte Carlo sampling, which were run sufficiently long to ensure effective sampling sizes >100. 4, vey016 (2018). 6, eabb9153 (2020). Liu, P. et al. Developed by the Centre for Genomic Pathogen Surveillance. In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV, and SARS-CoV virus genomes. Katoh, K., Asimenos, G. & Toh, H. in Bioinformatics for DNA Sequence Analysis (ed. We thank A. Chan and A. Irving for helpful comments on the manuscript. PubMed Region A has been shortened to A (5,017nt) based on potential recombination signals within the region. 36, 7597 (2002). EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. A pneumonia outbreak associated with a new coronavirus of probable bat origin. We thank originating laboratories at South China Agricultural University (Y. Shen, L. Xiao and W. Chen; no. SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Genet. Bioinformatics 22, 26882690 (2006). An initial genomic sequence analysis found that the reemergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 from the (now ancestral) lineage B.1.1.1 of the pangolin nomenclature ( 17 ). Download a free copy. Mol. Dis. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. 26, 450452 (2020). The Sichuan (SC2018) virus appears to be a recombinant of northern/central and southern viruses, while the two Zhejiang viruses (CoVZXC21 and CoVZC45) appear to carry a recombinant region from southern or central China. According to GISAID . performed recombination analysis for non-recombining regions1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. Mol. Preprint at https://doi.org/10.1101/2020.02.10.942748 (2020). However, formal testing using marginal likelihood estimation41 does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. 36)gives a putative recombination-free alignment that we call non-recombinant alignment3 (NRA3) (see Methods). Get the most important science stories of the day, free in your inbox. Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. Press, H.) 3964 (Springer, 2009). The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. First, we took an approach that relies on identification of mosaic regions (via 3SEQ14 v.1.7) that are also supported by PI signals19. The web application was developed by the Centre for Genomic Pathogen Surveillance. We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new . Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. Nature 583, 282285 (2020). In December 2019, a cluster of pneumonia cases epidemiologically linked to an open-air live animal market in the city of Wuhan (Hubei Province), China1,2 led local health officials to issue an epidemiological alert to the Chinese Center for Disease Control and Prevention and the World Health Organizations (WHO) China Country Office. Virus Evol. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. RegionsB and C span nt3,6259,150 and 9,26111,795, respectively. Using a third consensus-based approach for identifying recombinant regions in individual sequenceswith six different recombination detection methods in RDP5 (ref. Anderson, K. G. nCoV-2019 codon usage and reservoir (not snakes v2). Another similarity between SARS-CoV and SARS-CoV-2 is their divergence time (4070years ago) from currently known extant bat virus lineages (Fig. Alexandre Hassanin, Vuong Tan Tu, Gabor Csorba, Nicola F. Mller, Kathryn E. Kistler & Trevor Bedford, Jack M. Crook, Ivana Murphy, Diana Bell, Simon Pollett, Matthew A. Conte, Irina Maljkovic Berry, Yatish Turakhia, Bryan Thornlow, Russell Corbett-Detig, Nature Microbiology Proc. Forni, D., Cagliani, R., Clerici, M. & Sironi, M. Molecular evolution of human coronavirus genomes. Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n=27 sequences spread over 4 years (MERS-CoV) and n=27 sequences spread over 49 years (HCoV-OC43). He, B. et al. Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Subsequently a bat sarbecovirusRaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Provincewas reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity2. P.L. This dataset comprises an updated version of that used in Hon et al.15 and includes a cluster of genomes sampled in late 2003 and early 2004, but the evolutionary rate estimate without this cluster (0.00175 substitutions per siteyr1 (0.00117,0.00229)) is consistent with the complete dataset (0.00169 substitutions per siteyr1, (0.00131,0.00205)). Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST42 v.1.10.4.