Retroviruses selectively package two copies of their RNA genomes from a cellular milieu that includes a substantial excess of host and non-genomic viral RNAs. Present understanding of the structural determinants and mechanism of retroviral genome packaging has been derived from combinations of genetic experiments, phylogenetic analyses, nucleotide accessibility mapping, in silico RNA structure predictions, and biophysical studies. Genetic experiments provided early clues regarding the protein and RNA elements required for packaging, and nucleotide accessibility mapping experiments provided insights into the secondary structures of functionally important elements in the genome. Three-dimensional structural determinants of packaging were primarily derived by nuclear magnetic resonance (NMR) spectroscopy. A key advantage of NMR, relative to other methods for determining biomolecular structure (such as X-ray crystallography), is that it is well suited for studies of conformationally dynamic and heterogeneous systems—a hallmark of the retrovirus packaging machinery.
During virus assembly, all retroviruses package two copies of their 5ʹ-capped, unspliced RNA genomes, a requirement for strand transfer-mediated recombination during reverse transcription. Insights into the molecular structures and mechanisms responsible for genome packaging have been obtained by combinations of virological, molecular biological, and biophysical methodologies. A method that has contributed considerably to our understanding of the structural determinants of packaging is solution-state nuclear magnetic resonance (NMR). A particularly attractive feature of NMR is its ability to probe the structures and dynamics of conformationally distinct biomolecules within heterogeneous mixtures of equilibrating species. Disadvantages of the NMR approach include low proton density, poor chemical shift dispersion, and relatively low signal sensitivity and resolution, which render studies of larger RNAs (> ~50 nucleotides) problematic. The development of new isotopic labeling methods and the interplay of hybrid approaches have opened the door to surpassing the bounds of previously determined NMR structures. Here we summarize insights into the structures and mechanism of retroviral genome packaging that have been obtained by NMR.
Retroviral genome packaging and virus particle production are facilitated by Gag polyproteins, which consist of three structural domains (matrix, MA; capsid, CA; and nucleocapsid, NC) that are functionally important for retroviral replication: MA directs Gag to the plasma membrane for particle assembly, CA orchestrates the protein-protein interactions required for Gag multimerization, and NC functions as an RNA binding domain for selective genome recognition. Each of these domains has been investigated using NMR, either independently or in complex with its cognate lipid, protein, or nucleic acid partners. The following are some of the contributions that NMR has made to our understanding of retroviral proteins.
NMR has been used in many instances to characterize the MA domain in the context of cytosolic trafficking, MA-RNA interactions, and MA-plasma membrane interactions. Structural elucidation of the MA domain consistently revealed a globular fold of five α-helices, a short 3₁₀ helical stretch, and a three-stranded mixed β-sheet. MA-plasma membrane NMR studies of the native, co-translationally myristoylated protein revealed that it adopts both myristoyl-exposed and myristoyl-sequestered conformations that do not perturb the tertiary fold. Although NMR studies with water-soluble PI(4,5)P2 lipids containing truncated fatty acid chains showed that one of the acyl chains can bind to a hydrophobic cleft on MA (the so-called “extended lipid” binding mode), later NMR studies with native PI(4,5)P2 molecules embedded in bicelles and liposomes supported models in which both fatty acid chains of PI(4,5)P2 remain embedded in the liposome upon MA binding. NMR was used to show that tRNALys3 interacts with residues in the basic patch region of MA, and that tRNALys3 binding inhibits MA interactions with PI(4,5)P2-enriched liposomes.
The CA domain uses both its N- and C-terminal domains to participate in intermolecular contacts that enable CA multimerization and additional Gag-Gag interactions. NMR studies of these contact interfaces were conducted with a construct spanning CACTD through NC (CACTD-SP1-NC). The data revealed that SP1 (spacer region 1) is conformationally labile and exists in unstructured (predominant) and helical (minor) states. Larger protein constructs spanning from the MA domain through NC (unmyristoylated MA-CA-SP1-NC, ~100 kDa) have also been studied; 1H-15N chemical shift perturbation mapping revealed structural changes that occur upon nucleic acid binding.
NMR structures have been reported for isolated NC proteins from Human immunodeficiency virus (HIV-1/2), Mason-Pfizer monkey virus (MPMV), Moloney murine leukemia virus (MoMuLV), Mouse mammary tumor virus (MMTV), and Simian immunodeficiency virus (SIV). The proteins contain one or two copies of a CCHC-type zinc knuckle domain, and most studies indicate that they adopt structures that behave like “beads on a string”. There are also several NMR structures of NC bound to viral RNA from different viruses, including HIV-1/2, MoMuLV, and Rous sarcoma virus (RSV). Each of these structures illustrates specific recognition of RNA ψ packaging elements, facilitated by interactions with hydrophobic pockets of the NC zinc knuckles. In addition to its role in genome packaging, the NC domain of Gag (or the mature NC protein) functions as an RNA chaperone that catalyzes conformational rearrangements. NMR studies showed that NC performs its chaperone activity by lowering the energy barrier required to break base pairs or by facilitating the formation of new base pairs.
Retroviruses contain two copies of their RNA genomes, which function in a number of replication processes, including dimerization, packaging, transcriptional activation, splicing, and initiation of reverse transcription. Each of these processes is promoted by elements located within the 5′-leader of the viral genome, which is among the most conserved regions of the ~9 kb genome (http://www.hiv.lanl.gov/). A combination of studies has identified minimal packaging regions within the 5′-leaders of HIV-1, MoMuLV, and Rous sarcoma virus (RSV) that are independently capable of directing heterologous RNAs into assembling virus-like particles (VLPs). These regions have been termed “core encapsidation signals” (ψCES) and are typically localized near residues that promote RNA dimerization. Significant efforts have been aimed at elucidating the three-dimensional structures of the ψCES regions and other functional elements within the 5ʹ-leader in order to better understand their mechanisms of action in genome packaging. Summarized below are a few of the NMR studies that have been instrumental in advancing not only the field of retrovirology but also RNA structural biology.
The trans-activation region (TAR) is a 59-nt sequence located at the 5′-end of the 5ʹ-leader and is essential for Tat-mediated transcriptional activation. The first NMR structures of TAR and TAR-Tat complexes revealed the structural elements that are critical for Tat binding, specifically a bulge residue (U23), two base pairs immediately downstream of the bulge (G26-C39 and A27-U38), and three phosphate groups. Continued NMR studies revealed additional unique characteristics of TAR, including a base triple that forms between U23 and A27-U38 and stabilizes the interactions of a Tat arginine residue with G26 and with phosphates in the major groove of the RNA.
Structural studies of ψCES were facilitated by introducing mutations that reduced the size of the RNA without altering the remaining base pairing or function. At 155 nt, this construct was roughly five times larger than the average RNA whose structure had been solved by NMR at the time. Traditional nucleotide-specific 2H-labeling did not fully resolve the signal overlap for an RNA of this size. However, a novel approach involving the non-covalent annealing of differentially 2H-labeled RNA fragments provided the resolution needed to identify a unique tandem three-way junction in the RNA.
It was surprising that the dimeric form of the intact 5′-leader of the NL4-3 strain of HIV-1 (molecular weight of ~230 kDa) produced interpretable NMR data. The quality of the spectra revealed that the RNA is composed of independently folded subdomains. This property, in combination with 2H-labeling, enabled direct detection of predicted secondary structural elements. Other studies of the intact leader used 2H-edited NMR to develop a new diagnostic tool, called long-range Adenosine Interaction Detection (lrAID), which takes advantage of an [AAU]:[AUU] base pairing sequence that produces resolved adenosine signals even in larger RNA constructs.
The promoter of the HIV-1 proviral DNA contains three sequential guanosines that can function as the start site for transcription. In cells, RNAs are transcribed by RNA Polymerase II and are co-transcriptionally capped with 5ʹ-5ʹ triphosphate-linked 7-methylguanosine (7MeG). In vitro, capped RNAs containing a single 5ʹ-guanosine preferentially form dimers, whereas those that begin with two or three 5ʹ-guanosines preferentially form monomers. NMR studies revealed that the capped 1G leader adopts a structure in which the cap is sandwiched between the TAR and polyA helices, which are co-linearly stacked in an “end-to-end” manner. 2H-edited NMR studies of the capped 2G/3G leader RNAs revealed extensive structural remodeling compared to the capped 1G RNA and showed that the cap residue is exposed and disordered. These studies provided a structural explanation for how transcriptional start site heterogeneity modulates the structure, function, and fate of HIV-1 RNA.
The above studies illustrate the versatility of NMR and its utility for developing a detailed understanding of retroviral structural biology. Future studies of genome packaging will likely focus on even larger RNAs and protein-RNA complexes. Some steps have been made to enable NMR studies of larger RNAs, including the development of site-specific isotopic labeling and the expanded use of 1H-15N correlated NMR methods. As questions shift to larger systems, new approaches involving hybrid methodologies, advancements in labeling techniques, and solid-state NMR will likely make important contributions.