Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) belongs to the realm Riboviria, order Nidovirales, suborder Cornidovirineae, family Coronaviridae, subfamily Orthocoronavirinae, genus Betacoronavirus (lineage B), subgenus Sarbecovirus, and the species severe acute respiratory syndrome-related coronavirus. SARS-CoV-2 is a positive-sense, single-stranded RNA virus with a genome of ~29,903 nucleotides.
1. Introduction
Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) emerged in late 2019 in the city of Wuhan, China. The virus spread quickly across the globe, causing a pandemic of the acute respiratory disease known as “coronavirus disease 2019” (COVID-19)
[1]. As of 21 August 2022, the ongoing pandemic has registered more than 600 million confirmed cases and 6.4 million deaths reported by the WHO
[2]. The five countries with the highest numbers of reported cases are the United States of America (85,007,630), India (43,309,473), Brazil (31,611,769), France (29,164,805), and Germany (27,211,896). The five countries with the highest numbers of reported deaths are the United States of America (1,002,946), Brazil (668,693), India (524,873), the Russian Federation (380,517), and Mexico (325,271) (
https://covid19.who.int/, accessed on 21 June 2022).
Although mutations are a characteristic feature of all viruses, the rate of mutations in RNA viruses is much higher than that of DNA viruses
[3][4]. One study showed that the mutability of RNA viruses is five-fold that of DNA viruses
[5]. Although SARS-CoV-2 has low sequence diversity compared to many other RNA viruses, genetic recombination due to 3′-to-5′ exoribonuclease (nsp14) activity has produced numerous variants of SARS-CoV-2
[6][7]. As viruses evolve, they accumulate mutations, and new variants of the original virus are expected to arise over time. These variants can differ from one another by one or more mutations
[8]. Some new variants appear and disappear at random, and only a few persist for a longer time
[9]. Natural selection determines the fate of new variants
[7][10]. Some mutations can negatively affect viral replication, transmission, or immune escape, which consequently reduces the abundance of the virus.
Mutations in the virus genome have an important influence on virulence, transmissibility, pathogenesis, diagnosis, treatment, and vaccine development
[11][12]. Since the spike (S) protein of SARS-CoV-2 is one of the primary targets in vaccine design, mutations in the S region can reduce the efficacy of vaccines against the virus. Hence, mutations arising in emerging variants should be monitored in both vaccinated and non-vaccinated positive cases. Several SARS-CoV-2 variants have already been reported and documented worldwide during the COVID-19 pandemic
[13]. To improve coordination among the Centers for Disease Control and Prevention (CDC), National Institutes of Health (NIH), Biomedical Advanced Research and Development Authority (BARDA), Food and Drug Administration (FDA), and Department of Defense (DoD), the US Department of Health and Human Services (HHS) established a SARS-CoV-2 Interagency Group (SIG). The main aim of this group is to rapidly characterize emerging variants and actively monitor their potential impact on critical SARS-CoV-2 countermeasures, including vaccines, therapeutics, and diagnostics
[14]. The CDC, in association with the SIG, established a classification system for SARS-CoV-2 variants based on the threat level they pose to public health. It groups variants into four major classes: Variants Being Monitored (VBM), Variants of Concern (VOC), Variants of Interest (VOI), and Variants of High Consequence (VOHC)
[14]. The nomenclature systems established by GISAID, Nextstrain, and Pango for naming and tracking SARS-CoV-2 lineages are currently in use in the scientific community
[15]. The Nextstrain nomenclature system is based on patterns of SARS-CoV-2 diversity and labels clades that persist for at least several months and have a significant geographic spread
[15]. The Pango lineage nomenclature is focused on the epidemiological event(s) such as the introduction of a virus into a distinct geographic area with evidence of onward spread
[16]. The WHO, along with a group of scientists (the WHO Virus Evolution Working Group and the WHO COVID-19 reference laboratory network), representatives (GISAID, Nextstrain, Pango), and experts (virology, microbial nomenclature) from several countries and agencies, developed easy-to-pronounce and non-stigmatizing labels for variants to assist public discussion
[15]. The WHO expert group recommended using letters of the Greek alphabet, i.e., Alpha, Beta, Gamma, and Delta (classified as previous VOCs), Omicron (classified as the current VOC), and Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, and Mu (classified as previous VOIs), which are easier and more practical for the non-scientific community to discuss
[15]. According to GISAID, as of 16 May 2022, 187 countries had shared 3,648,731 Omicron genome sequences and 203 countries had shared 4,425,174 Delta genome sequences [
https://www.gisaid.org/hcov19-variants/] (accessed on 16 May 2022).
The increase in the spread of VOCs has affected the efficacy of currently available vaccines. The phase III clinical trial results of the Oxford–AstraZeneca (ChAdOx1), Johnson & Johnson (Ad26.COV2.S), and Novavax (NVX-CoV2373) vaccines in South Africa showed decreased vaccine efficacy (VE) against the locally circulating Beta variant
[17][18][19]. For example, the protective efficacy of the Oxford–AstraZeneca (ChAdOx1) vaccine against mild to moderate illness caused by the Beta variant is only 10%
[17]. Data from a Gulf country showed that the Pfizer/BioNTech (BNT162b2) vaccine confers 87% and 72% protection against the Alpha and Beta variants, respectively
[20]. In Israel, with the emergence of the Delta variant, the overall efficacy of vaccines against infection fell from 95% to 39%
[21]. In South Africa, the effectiveness of the BNT162b2 vaccine (two doses) against hospitalization for COVID-19 infection was significantly reduced from 93% in the non-Omicron period to 70% in the Omicron period
[22]. This decrease in vaccine effectiveness against the emerging VOCs has not only raised concerns about the efficacy of vaccines but also pointed to the likelihood of reinfection. Even though approved COVID-19 vaccines are less effective against the currently circulating VOCs, they remain highly effective in averting severe illness and death
[23].
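To make the before/after figures quoted above easier to compare, the following minimal Python sketch computes the absolute (percentage-point) and relative drop in protection for the two pairs reported in the text (Israel, 95% to 39%; South Africa, 93% to 70%). The function name and the dictionary layout are illustrative assumptions and are not taken from the cited studies. The two pairs measure different endpoints (infection versus hospitalization), so the results should not be compared directly with each other.

```python
def effectiveness_drop(before_pct, after_pct):
    """Return (absolute drop in percentage points, relative drop in %)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


# Before/after values exactly as quoted in the text above.
reported = {
    "Israel, infection (before vs. after Delta emergence)": (95, 39),
    "South Africa, hospitalization (non-Omicron vs. Omicron period)": (93, 70),
}

for label, (before, after) in reported.items():
    absolute, relative = effectiveness_drop(before, after)
    print(f"{label}: {absolute} points lower ({relative:.1f}% relative drop)")
```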
2. Structure and Genome Organization of SARS-CoV-2
SARS-CoV-2 virus was first reported
[24] on 31 December 2019
[25]. Taxonomically, SARS-CoV-2 belongs to the realm Riboviria, order Nidovirales, suborder Cornidovirineae, family Coronaviridae, subfamily Orthocoronavirinae, genus Betacoronavirus (lineage B), subgenus Sarbecovirus, and the species severe acute respiratory syndrome-related coronavirus
[26]. So far, seven coronaviruses are known to infect humans. Among these, four common human coronaviruses (229E, NL63, OC43, and HKU1) cause mild infections
[27]. However, individuals infected with any of the other three coronaviruses, namely severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2, can develop severe respiratory distress and viral pneumonia and may ultimately succumb to the disease
[28][29][30]. Jiang and Wang stated that although bats are the most probable reservoir animal for SARS-CoV-2
[31], zoonotic spillovers likely involving an intermediate animal and multiple transmissions from wildlife at a market in Wuhan probably led to SARS-CoV-2 emergence
[32].
SARS-CoV-2 (structure shown in
Figure 1A) is a positive-sense, single-stranded RNA virus whose genome is ~29,903 nucleotides long and is organized in the following order from 5′ to 3′: open reading frame (ORF) 1ab (replicase), spike glycoprotein (S), ORF3a protein, envelope protein (E), membrane glycoprotein (M), ORF6 protein, ORF7a protein, ORF7b protein, ORF8 protein, nucleocapsid phosphoprotein (N), and ORF10 protein (
Figure 1B). The SARS-CoV-2 genomic RNA contains two major open reading frames (ORFs), ORF1ab and ORF1a, which occupy about two-thirds of the genome (21,291 nucleotides, or 1 to 21 kb) at the 5′ end and are translated into polyprotein 1ab (pp1ab) and pp1a. The virus genome encodes two proteases, a papain-like protease (PLpro), or nsp3, and a 3C-like protease (3CLpro), or nsp5, which cleave the pp1a and pp1ab polypeptides into 16 nonstructural proteins: leader protein, nsp2, nsp3, nsp4, 3C-like proteinase, nsp6, nsp7, nsp8, nsp9, nsp10, RNA-dependent RNA polymerase (RdRp), helicase, 3′–5′ exonuclease, endoRNase, 2′-O-ribose methyltransferase, and nsp11
[32][33]. The remaining one-third of the genome (21 to 29 kb) at the 3′ end has overlapping ORFs encoding at least four structural proteins, namely spike glycoprotein (S), envelope protein (E), membrane glycoprotein (M), and nucleocapsid protein (N), and six accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8, ORF10)
[33][34][35]. The genes and proteins expressed by SARS-CoV-2 along with their nucleotide location and number of amino acids are shown in
Table 1.
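As a rough check on the proportions described above, the sketch below uses only the two figures quoted in the text (a genome of about 29,903 nucleotides and an ORF1ab region of 21,291 nucleotides) to compute the share of the genome taken up by the replicase region and by everything else. The constant and function names are illustrative assumptions; exact gene coordinates belong to Table 1 and are not reproduced here.

```python
GENOME_LENGTH_NT = 29_903  # approximate genome size quoted in the text
ORF1AB_LENGTH_NT = 21_291  # length of the ORF1ab region quoted in the text


def genome_fraction(region_nt, genome_nt=GENOME_LENGTH_NT):
    """Fraction of the genome covered by a region of the given length."""
    if not 0 <= region_nt <= genome_nt:
        raise ValueError("region length must lie within the genome length")
    return region_nt / genome_nt


replicase_share = genome_fraction(ORF1AB_LENGTH_NT)
# The remainder covers the 3'-end structural and accessory genes
# plus any untranslated sequence.
remainder_share = 1 - replicase_share

print(f"ORF1ab region: {replicase_share:.1%} of the genome")
print(f"Remainder:     {remainder_share:.1%} of the genome")
```

With these inputs the replicase region comes out slightly above the two-thirds approximation used in the text.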
Figure 1. Genomic organization and structural features of the surface glycoprotein of SARS-CoV-2. (A) Structure of SARS-CoV-2: the schematic depicts the structural features of the SARS-CoV-2 virus particle, which contains a single-stranded RNA, the S glycoprotein, and other structural proteins, namely the envelope (E), membrane (M), and nucleocapsid (N) proteins. (B) Schematic representation of SARS-CoV-2 proteins: the genome of SARS-CoV-2 consists of approximately 29,903 nucleotides, with ORF1a and ORF1b, which are translated into polyprotein 1a (pp1a) and 1ab (pp1ab), respectively. Four genes encode the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N). The accessory proteins (ORF3a, 6, 7a, 7b, 8, and 10) are interspersed among the structural proteins. (C) Structure of the SARS-CoV-2 surface glycoprotein (S): the surface glycoprotein (S protein) is made up of 1273 amino acids, including the N-terminal signal peptide (SP), S1 subunit, and S2 subunit. The S1 subunit contains an N-terminal domain (NTD) and a receptor binding domain (RBD), while the S2 subunit is composed of the fusion peptide (FP), heptapeptide repeat sequence 1 (HR1), HR2, the transmembrane (TM) domain, and the cytoplasmic domain (C).
Table 1. Genes and proteins expressed by SARS-CoV-2.
The surface glycoprotein (S protein) is composed of 1273 amino acids, including the N-terminal signal peptide (SP, residues 1–13), the S1 subunit (residues 14–685), and the S2 subunit (residues 686–1273) (
Figure 1C). The S1 subunit is made up of an N-terminal domain (NTD, residues 14–305) and a receptor binding domain (RBD, residues 319–541), while the S2 subunit is made up of the fusion peptide (FP, residues 788–806), heptapeptide repeat sequence 1 (HR1, residues 912–984), HR2 (residues 1163–1213), the transmembrane (TM) domain (residues 1213–1237), and the cytoplasmic domain (residues 1237–1273)
[36]. Both the S1 and S2 subunits are crucial for the assembly and surface projection of the S protein, which interacts with angiotensin-converting enzyme 2 (ACE2) receptors expressed on pneumocytes of the host's lower respiratory tract
[30][33]. Host transmembrane serine protease 2 (TMPRSS2) cleaves the S protein at the furin cleavage site (residues 682–689) into the S1 and S2 subunits, enabling viral fusion and entry
[37][38] (
Figure 2). After entering the host cell, SARS-CoV-2 takes over the host cell machinery to rapidly synthesize viral proteins, assemble, and release virus progeny
[39][40].
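Because mutations in the S protein are usually described by their residue position, it can be handy to map a position onto the regions listed in this section. The following is a minimal sketch that assumes the residue ranges quoted above. The names `SUBUNITS`, `DOMAINS`, and `locate_residue` are hypothetical and introduced only for illustration. Boundaries that the text gives as shared between neighbouring domains (residues 1213 and 1237) are kept as quoted, so a position on such a boundary is reported in both domains.

```python
# Residue ranges (1-based, inclusive) as quoted in the text above.
S_PROTEIN_LENGTH = 1273

SUBUNITS = {
    "SP": (1, 13),      # N-terminal signal peptide
    "S1": (14, 685),
    "S2": (686, 1273),
}

DOMAINS = {
    "NTD": (14, 305),
    "RBD": (319, 541),
    "FP": (788, 806),
    "HR1": (912, 984),
    "HR2": (1163, 1213),
    "TM": (1213, 1237),
    "CT": (1237, 1273),  # cytoplasmic domain (label "CT" is an assumption)
}


def locate_residue(position):
    """Return (subunit, [domains]) for a 1-based S protein residue position."""
    if not 1 <= position <= S_PROTEIN_LENGTH:
        raise ValueError(f"position must be between 1 and {S_PROTEIN_LENGTH}")
    subunit = next(
        name for name, (start, end) in SUBUNITS.items() if start <= position <= end
    )
    domains = [
        name for name, (start, end) in DOMAINS.items() if start <= position <= end
    ]
    return subunit, domains


# Arbitrary example query: a position inside the receptor binding domain.
print(locate_residue(501))
```

Positions that fall between annotated domains (for example, in the linker regions of S1) return an empty domain list rather than an error.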
Figure 2. Complete life cycle of SARS-CoV-2. The SARS-CoV-2 life cycle begins with binding of the spike protein to its specific receptor (ACE2). Host cell entry depends on several steps: (i) initial cleavage at the S1/S2 site by the surface transmembrane protease serine 2 (TMPRSS2) and furin, (ii) followed by virus–cell membrane fusion and endocytosis. (iii) After endocytosis, the positive-sense RNA genome is released into the cytosol and translated into the polyproteins. The polyproteins (pp1a and pp1ab) are cleaved by viral-encoded proteases (VEP) into several nonstructural proteins (nsps), including the RNA-dependent RNA polymerase (RdRp), which form the replication–transcription complex (RTC) in the cell. (iv) Viral replication begins in virus-induced double-membrane vesicles derived mainly from the endoplasmic reticulum (ER). (v) The positive strand of the genome serves as the template for full-length negative-strand RNA and subgenomic (sg) RNA; translation of sgRNA yields the structural and accessory proteins, which are inserted into the ER–Golgi intermediate compartment (ERGIC) for virion assembly. (vi) RNA genomes bound to nucleocapsid proteins are incorporated into newly synthesized virions, which are secreted by exocytosis. DMV: double-membrane vesicle.