SARS-CoV-2 nucleocapsid protein (NCoV2) plays a key role in various processes of the viral replication cycle, such as packaging of the RNA genome and interaction with other viral proteins. It has thus been a target for new drug development.
1. Introduction
A new coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a cataclysmic pandemic of coronavirus disease 2019 (COVID-19) pneumonia. SARS-CoV-2 is a member of the β-coronavirus family, which includes the well-known SARS-CoV and MERS-CoV [1]. It is an enveloped, single-stranded, positive-sense RNA virus whose genome of 30 kb is packaged into a virion of ~90 nm in diameter, a size similar to that of other coronaviruses [2]. SARS-CoV-2 encodes 29 proteins in total: 16 non-structural proteins (nsp1–nsp16), 9 accessory proteins (Orf3a, Orf3b, Orf6, Orf7a, Orf7b, Orf8, Orf9b, Orf9c and Orf10) [3], and 4 major structural proteins, namely the spike (S) protein, the envelope (E) protein, the membrane (M) protein and the nucleocapsid (N) protein. SARS-CoV-2 infection is initiated through binding of the spike protein to the angiotensin-converting enzyme 2 (ACE2) receptor in lung cells [4]. After infection, the envelope protein serves as a cation-selective channel across the endoplasmic reticulum–Golgi intermediate compartment (ERGIC) membrane [5]. The membrane protein is important for determining the shape of the virus envelope [6]. It interacts with all the other structural proteins and promotes viral assembly by stabilizing the nucleocapsid-RNA complex (ribonucleocapsid or ribonucleoprotein; RNP). The membrane and envelope proteins are thus essential for virion assembly and budding. Among the proteins expressed by this virus, the nucleocapsid protein is one of the most abundant and has multiple functions, including packaging the viral genome into a helical RNP. It is essential for transcription of the viral genome and virion replication. Therefore, much attention has been paid to the molecular mechanism of the biological functions of the N protein during the infection cycle. There are already excellent reviews on the coronavirus nucleocapsid proteins [7][8][9]. Although these reviews cover the biochemical processes inside infected cells in which the nucleocapsid protein is involved, as well as high-resolution structural analysis by crystallography and cryo-EM, its structural properties and dynamical behavior in aqueous solution have not been addressed so far despite their importance. In the following, the focus is on both the structure and the molecular flexibility of the SARS-CoV-2 nucleocapsid protein (NCoV2).
A schematic illustration of NCoV2, which is composed of 419 residues in total, is shown in Figure 1a. NCoV2 consists of two domains, the N-terminal domain (NTD; 49–174) and the C-terminal domain (CTD; 247–364). The NTD is responsible for RNA-binding and the CTD for dimerization and oligomerization (note that in the case of the nucleocapsid protein of SARS-CoV, both domains have the ability to bind RNA molecules [10]). These domains are connected by a flexible linker, which contains a serine/arginine-rich region (SR-R; 176–206) and a leucine/glutamine-rich region (LQ-R; 210–246). Serine and arginine residues account for 45.2% and 19.4% of the residues of the SR-R, respectively, and this region can be phosphorylated at multiple sites. In the LQ-R, leucine and glutamine account for 14.9% and 12.8%, respectively. In another nucleocapsid protein, VP39 of baculovirus, amino acid substitution at glycine 276 is known to affect nucleocapsid assembly, and this residue is completely conserved among various baculovirus families [11], suggesting that certain glycine residues are essential for proper function of the nucleocapsid protein. Even though NCoV2 contains 43 glycines in total, which account for 10.3% of its 419 residues, there have been no systematic studies on the role of glycine, and thus its possible roles in NCoV2 function remain unclear. In addition to the SR-R and LQ-R, the NTD and CTD are flanked by two disordered regions, i.e., the N-tail (1–48) and the C-tail (365–419), respectively. The atomic structures of the NTD have recently been solved by X-ray crystallography at 2.7 Å resolution [12] (Figure 1b, left) and by nuclear magnetic resonance (NMR) [13]. The NTD features a “(loops)-(β-sheet core)-(loops)” fold, which is conserved among the coronavirus NTDs. As in other coronaviruses such as infectious bronchitis virus (IBV) [14], the protruding, positively charged β-hairpin structure called the “basic finger” (bottom of the left panel of Figure 1b) is essential for RNA-binding [13]. Upon binding of single-stranded RNA, the basic finger changes its conformation such that the NTD grabs the RNA molecule (Figure 1b, right).
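The residue percentages quoted above are simple composition counts over a numbered region. The following is a minimal sketch of that bookkeeping, assuming a one-letter amino acid string and 1-based inclusive residue numbers. The function name `composition` and the toy sequence are hypothetical and are not the NCoV2 sequence.

```python
from collections import Counter

def composition(sequence, start, end):
    """Percent composition of the 1-based, inclusive region start..end."""
    region = sequence[start - 1:end]
    counts = Counter(region)
    return {aa: 100.0 * n / len(region) for aa, n in counts.items()}

# Hypothetical toy sequence (NOT the NCoV2 sequence), used only to
# exercise the function. Residues 2..7 are "SSRGSR".
toy = "MSSRGSRLQLQG"
toy_comp = composition(toy, 2, 7)
print({aa: round(p, 1) for aa, p in sorted(toy_comp.items())})

# Arithmetic check of a figure quoted in the text:
# 43 glycines out of 419 residues.
glycine_percent = round(100.0 * 43 / 419, 1)
print(glycine_percent)

# The SR-R spans 176..206, i.e. 31 residues. The quoted 45.2% and 19.4%
# are what 14 and 6 residues out of 31 would give (an inference from the
# percentages, not a count taken from the sequence).
sr_length = 206 - 176 + 1
print(sr_length, round(100.0 * 14 / sr_length, 1), round(100.0 * 6 / sr_length, 1))
```

Applied to the actual NCoV2 sequence with the region boundaries given above, the same function would return the percentages for the SR-R and LQ-R directly.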
Figure 1. Structural features of the nucleocapsid protein of SARS-CoV-2 (NCoV2). (a) Schematic of NCoV2, which consists of two domains, i.e., the N-terminal domain (NTD) and the C-terminal domain (CTD); (b) atomic structures of the NTD in the RNA-unbound form (left; PDB ID: 6M3M [12]) and the single-stranded RNA-bound form (right; PDB ID: 7ACT [13]), shown in light sea green and magenta, respectively. In the right panel of (b), RNA is shown in orange and the RNA-unbound form of the NTD is superimposed for structural comparison. The direction of movement of the basic finger upon RNA-binding is indicated by a black arrow; (c) CTD dimer interface. The CTD monomers (PDB ID: 6WZO [15]) are shown in orange and marine blue, respectively. In (b,c), the residue numbers at both termini of the regions visible in the crystal structures are indicated. All the models are depicted using UCSF Chimera [16].
The crystal structures of the CTD have been solved at 1.4 Å [15][17] and 2.0 Å [18] resolutions. The dimer interface has been shown to consist of two β-strands and one α-helix (Figure 1c), which is similar to that of the nucleocapsid protein of SARS-CoV [19]. In solution, NCoV2 molecules have been shown by small-angle X-ray scattering to exist as dimers [20]. Note, however, that NCoV2 can also form tetramers [15], suggesting that its solution structure is sensitive to the solvent conditions. Whereas the NCoV2 dimers are formed through binding of two CTDs, the architecture of the tetramers remains unclear.
To identify similarities and differences among the nucleocapsid proteins of different coronaviruses, the NTD and CTD structures of NCoV2 were compared with those of the nucleocapsid proteins of the representative coronaviruses SARS-CoV (NSARS) and MERS-CoV (NMERS), as shown in Figure 2. Even though the overall architecture is similar among them, superposition of the NTDs of NCoV2 and NSARS (Figure 2a, left) shows that the β-hairpin structure of the NSARS NTD is shifted toward the N-terminus compared with that of the NCoV2 NTD. On the other hand, comparison with the NMERS NTD (Figure 2a, right) shows that the β-hairpin structure is more extended in the NMERS NTD. These comparisons suggest that the major differences in the NTD structure among the three coronaviruses reside in the β-hairpin structure, which is necessary for RNA-binding. These differences also imply that each coronavirus might have adapted its NTD structure to match its specific RNA conformation. Regarding the CTD, the structural features are similar between the NCoV2 CTD and the NSARS CTD. However, in the NMERS CTD, the loop between the two β-sheets (indicated by a grey arrow in Figure 2b) is longer than in the other CTDs. According to hydrogen/deuterium exchange measurements on dimers of the NCoV2 CTD [15], this region is exposed to the solvent, suggesting that it is not involved in dimer formation. So far, it is unclear what role this long insertion observed in the NMERS CTD plays in the function of the protein. The fact that the other regions of the CTD share common structural features among the three coronaviruses compared here suggests that dimer formation of the nucleocapsid protein through its CTD is essential for formation of RNP complexes and hence for RNA genome packaging, which is the fundamental function of the nucleocapsid protein.
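Structural comparisons of this kind rest on superposing equivalent atoms of two structures. As a generic illustration, the sketch below computes the root-mean-square deviation after optimal rigid-body superposition with the Kabsch algorithm. It is not the procedure used by UCSF Chimera, and it assumes that the two coordinate arrays already list matched atoms (for example, Cα atoms of aligned residues) in the same order. The coordinates are hypothetical random points, not taken from any PDB entry.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

# Hypothetical test: Q is P rotated about the z-axis and translated,
# so the RMSD after superposition should be numerically zero.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
rmsd_same = kabsch_rmsd(P, Q)
print(rmsd_same)
```

A global RMSD averages over all matched atoms, so a local difference such as a displaced β-hairpin is easier to see in per-residue deviations after superposition than in the single overall value.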
Figure 2. Comparison of the NTD and CTD structures of the nucleocapsid proteins of different coronaviruses. (a) (left) Comparison of the NTD structures of the nucleocapsid proteins of SARS-CoV-2 (NCoV2; PDB ID: 6M3M [12]) and SARS-CoV (NSARS; PDB ID: 2OFZ [21]), which are shown in marine blue and magenta, respectively. (right) Comparison of the NCoV2 NTD and the NTD of the nucleocapsid protein of MERS-CoV (NMERS; PDB ID: 4UD1 [22]) shown in orange. Note that the NCoV2 NTD is displayed at slightly different angles in the left and right panels to highlight the structural differences from its NSARS and NMERS counterparts; (b) (left) Comparison of the CTD structures of NCoV2 (PDB ID: 7C22 [18]) and NSARS (PDB ID: 2CJR [23]), which are shown in marine blue and magenta, respectively. (right) Comparison of the NCoV2 CTD and the CTD of NMERS (PDB ID: 6G13 [24]) shown in orange. In both (a,b), the major structural differences are indicated by grey arrows. All the models are depicted using UCSF Chimera [16].
2. Perspective–Toward Structural and Dynamical Characterization of NCoV2 Droplets
The recent findings that NCoV2, RNA, other viral proteins and a stress granule protein are concentrated into a phase distinct from their surroundings [25][26][27][28][29][30][31] suggest that the NCoV2 droplet formed through liquid-liquid phase separation (LLPS) is the main arena where molecular events critical for virus replication take place. It is thus indispensable to reveal the internal organization of NCoV2 droplets, especially the structure of NCoV2 in droplets formed with various binding partners, both to develop new intervention techniques targeting NCoV2 and to understand the molecular mechanism of the biological functions of NCoV2 in detail. Furthermore, the molecules within the droplets are not frozen in place but fluctuate dynamically around their average conformation under the influence of thermal energy. The nature of these structural fluctuations is closely related to the interactions, stability and function of the proteins. Thus, unraveling the structure and structural fluctuations of NCoV2 in droplets formed under various conditions is essential, and research in this direction would occupy an important position in this field.
A powerful method to characterize the molecular structure of NCoV2 in droplets would be small-angle neutron scattering (SANS). This technique has been used to characterize the structure of viral proteins in the isolated state [32][33] and in complex with other viral components [34][35][36]. However, to the author’s knowledge, structural and dynamical analysis of droplets formed by virus-related proteins has not been carried out so far. Even though there are excellent reviews and books on neutron scattering [37][38][39][40], some basic information on neutron scattering is given below in order to describe a possible method to reveal the structure and dynamics of NCoV2 droplets. In neutron scattering, the strength of scattering (coherent scattering length) depends on the type of atom struck by the neutrons. The major feature of coherent neutron scattering is that the scattering lengths of hydrogen and its isotope deuterium differ greatly, and thus deuteration plays a significant role in the structural analysis of complex systems that consist of different kinds of components [41][42][43]. In the case of solution samples such as protein solutions, the scattering signal in SANS measurements derives from the difference (contrast) in the scattering length density (SLD: the total scattering length of a molecule divided by its volume) between the solutes and the solvent. Figure 3a shows the variation of the SLD of major biomolecules (proteins, lipids, DNA and RNA) as a function of the fraction of D2O in the solvent. As the D2O content in the solvent increases, the labile hydrogen atoms in the solutes are exchanged for deuterium atoms, which modifies the SLD of the solutes. The SLD of perdeuterated proteins (denoted as “D-protein” in Figure 3a) is much larger than that of their hydrogenated counterparts because all the hydrogen atoms, including those in methyl and ethyl groups, are replaced with deuterium atoms. As seen in Figure 3a, the SLD of the solvent containing 40% D2O is equal to that of the hydrogenated protein (H-protein), meaning that the scattering contrast of the H-protein is zero. Under this condition, the contribution of the H-protein to the scattering signal is effectively zero and hence it is “invisible” to neutrons. Even though techniques to perdeuterate or partially deuterate RNA molecules by chemical synthesis, in vitro transcription and in vivo production have been established for NMR studies [44][45], production of a sufficient amount of perdeuterated RNA for neutron experiments is more expensive and/or more difficult than production of perdeuterated protein, and thus there have been only a few SANS studies using perdeuterated RNA [38][46]. Therefore, in the following, only RNA in the hydrogenated state (H-RNA) is considered.
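The contrast-matching idea described above can be made concrete with a small calculation. The sketch below assumes that both the solvent SLD and the solute SLD vary linearly with the D2O fraction and solves for the fraction at which they are equal. The solvent end-point values are commonly quoted approximations, and the protein end-point values are illustrative placeholders rather than the values plotted in Figure 3a; all function and constant names are hypothetical.

```python
# SLDs are in units of 1e-6 per square angstrom.
# Commonly quoted approximate solvent values; treat them as assumptions.
SLD_H2O = -0.56
SLD_D2O = 6.36

def solvent_sld(x):
    """SLD of an H2O/D2O mixture with D2O volume fraction x (0..1)."""
    return SLD_H2O + x * (SLD_D2O - SLD_H2O)

def solute_sld(x, sld_at_0, sld_at_1):
    """Solute SLD, assumed linear in x because labile H atoms exchange with D."""
    return sld_at_0 + x * (sld_at_1 - sld_at_0)

def match_point(sld_at_0, sld_at_1):
    """D2O fraction at which solute and solvent SLDs coincide (zero contrast)."""
    slope_difference = (SLD_D2O - SLD_H2O) - (sld_at_1 - sld_at_0)
    return (sld_at_0 - SLD_H2O) / slope_difference

# Illustrative (hypothetical) end-point SLDs for a hydrogenated protein
# in 0% and 100% D2O.
H_PROTEIN = (1.8, 3.1)

x_match = match_point(*H_PROTEIN)
contrast_at_match = solute_sld(x_match, *H_PROTEIN) - solvent_sld(x_match)
print(f"match point: {100 * x_match:.0f}% D2O")
print(f"contrast at match point: {contrast_at_match:.2e}")
```

Because the scattered intensity scales with the square of the contrast, a component measured at its own match point contributes essentially nothing to the signal, which is what makes it “invisible”. With end-point values read from Figure 3a, the same function would give the match points of each component shown there.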
Figure 3c illustrates a promising method to characterize the NCoV2 structure in droplets. In the case where droplets are formed by NCoV2, RNA and another protein (Figure 3c), three kinds of samples would be required to obtain the scattering signal from each protein: (1) droplets formed by H-NCoV2, H-RNA and the hydrogenated form of the other protein (H-AP), (2) those formed by H-NCoV2, H-RNA and the perdeuterated form of the other protein (D-AP) and (3) those formed by perdeuterated NCoV2 (D-NCoV2), H-RNA and D-AP. SANS measurements on these samples in ~65% D2O solvent, the SLD of which matches that of H-RNA, provide three kinds of scattering data in which the contribution of each protein differs. From these data, the scattering arising from each protein can be extracted, leading to the elucidation of its conformation together with how each protein is arranged inside the droplets.
The structural fluctuations of the molecules in the droplets can be obtained by incoherent neutron scattering (iNS). This technique measures the intensity of neutrons scattered by the sample as a function of the momentum and energy changes of the neutrons, from which the amplitudes and frequencies of atomic motions, and their distributions, on the pico- to nanosecond timescale and ångström length scale are estimated. The values of the incoherent scattering cross-section of atoms found in biomolecules are shown in Figure 3b. The scattering cross-section of hydrogen is much larger than that of any other atom, including deuterium, meaning that iNS provides information on the motions of hydrogen atoms. Furthermore, measurements in 100% D2O buffer minimize the solvent contribution to the iNS spectra. Since hydrogen atoms are quasi-uniformly distributed throughout the molecules in biological systems, dynamical information averaged over the whole molecule is obtained. In the case of proteins, the observed motions reflect the fluctuations of amino acid side chains and backbones because hydrogen atoms are bound to these chemical groups. A schematic illustration of the dynamical analysis of the NCoV2 droplets is depicted in Figure 3d. For droplets formed by NCoV2, RNA and another protein, two types of samples are required: (1) droplets formed by H-NCoV2, H-RNA and D-AP and (2) those formed by D-NCoV2, H-RNA and D-AP. By subtracting the iNS spectra of the latter from those of the former, both measured in 100% D2O, the remaining spectra arise from the structural fluctuations of the NCoV2 molecules, thereby providing dynamical information on NCoV2 inside the droplets. The dynamics of the other protein in the droplets can be extracted in a similar manner.
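The subtraction step described above can be sketched as a point-by-point difference of two spectra measured on the same energy-transfer grid, with the uncertainties combined in quadrature. This is a minimal illustration only: the function name, the optional `scale_b` normalization factor and all numbers are hypothetical, and the sketch assumes that the RNA and D-AP contributions are the same in both samples.

```python
import math

def difference_spectrum(s_a, err_a, s_b, err_b, scale_b=1.0):
    """Return s_a - scale_b * s_b and its propagated uncertainty."""
    if not (len(s_a) == len(err_a) == len(s_b) == len(err_b)):
        raise ValueError("all inputs must share the same energy grid")
    diff = [a - scale_b * b for a, b in zip(s_a, s_b)]
    # Independent errors add in quadrature.
    err = [math.sqrt(ea ** 2 + (scale_b * eb) ** 2)
           for ea, eb in zip(err_a, err_b)]
    return diff, err

# Hypothetical intensities (arbitrary units) on a common grid.
s_h = [10.0, 6.0, 3.0]   # sample (1): H-NCoV2 + H-RNA + D-AP
e_h = [0.3, 0.4, 0.3]
s_d = [4.0, 2.5, 1.0]    # sample (2): D-NCoV2 + H-RNA + D-AP
e_d = [0.4, 0.3, 0.4]

diff, err = difference_spectrum(s_h, e_h, s_d, e_d)
print(diff)
print(err)
```

In practice the two spectra would first need to be normalized to comparable sample amounts, which is the role the `scale_b` argument is meant to stand in for here.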
Figure 3. A promising method for physical characterization of the NCoV2 droplets using neutron scattering. (a) Variation of the neutron scattering length density (SLD) of biomacromolecules as a function of the heavy water concentration in the solvent. H-protein and D-protein denote the hydrogenated and perdeuterated proteins, respectively. The SLD values of H-protein, D-protein, lipids and solvent are taken from [37], and those of DNA and RNA are taken from [39]; (b) values of the incoherent neutron scattering cross-section of atoms found in biomolecules and of deuterium, an isotope of hydrogen. Note that 1 barn = 10⁻²⁴ cm². The values are taken from [47]; (c) schematic illustration of the structural analysis of the NCoV2 droplets using contrast-matching small-angle neutron scattering (SANS). The components that are “invisible” to neutrons are shown in the same color as the solvent. The prefixes H- and D- denote “hydrogenated” and “perdeuterated”, respectively, which are shown in different colors. AP denotes another protein; (d) schematic illustration of the molecular dynamics analysis of NCoV2 droplets using incoherent neutron scattering (iNS) combined with deuteration. For detailed explanations of (c,d), refer to the main text.
3. Conclusions
In conclusion, the molecular flexibility as well as the conformations of NCoV2 play a critical role in drug-binding, RNA-binding and droplet formation. Knowledge of the structure and dynamics of the NCoV2 molecules in isolated states in solution and in a variety of droplets would advance our understanding of the architecture and the physical properties of the droplets, thereby shedding light on the molecular mechanism of the viral infection and replication cycle. Furthermore, all this information would eventually benefit the development of new drug molecules targeting NCoV2, improved diagnostic techniques, new treatments for COVID-19 pneumonia, and new strategies for combating future pandemics.
This entry is adapted from the peer-reviewed paper 10.3390/biology10060454