2. The Mitochondrial DNA of P. polycephalum
P. polycephalum is a myxomycete, or acellular slime mold, a eukaryote belonging to a clade whose ancestor diverged early from those of plants, animals, and fungi. The most common mtDNA structure in strains of P. polycephalum is circular and contains 72 genes (Figure 1). The mtDNA varies in size from 56 to 62 kb, depending on the presence of small deletions in the mtDNA of some strains [8], reviewed in [9]. This mtDNA is unique in that it is composed of two different types of gene. The first type includes 46 genes that are homologous to genes found on the mtDNA of R. americana and together account for 38.5 kb of the mtDNA (red arrows in Figure 1), indicating that they are derived from the common eubacterial ancestor. The second type of potential gene includes 26 significant open reading frames located on 24.3 kb of the mtDNA (green arrows in Figure 1). These 26 unassigned reading frames (URFs) are interspersed with the classical, ancestral genes at four positions (group 1, URFs A, B, C; group 2, URFs D–I; group 3, URFs J–Y; group 4, URF Z; Figure 1). Only one of them (URF Z) is transcribed; the other 25 URFs are not [10,11].
Figure 1. Genetic map of P. polycephalum’s mtDNA showing its 72 genes. Gene symbols are as defined in Table 1. Green arrows mark unassigned reading frames and show their direction of potential transcription. Pink arrows mark classic mitochondrial genes and show their direction of transcription.
2.1. Mitochondrial Insertional Cotranscriptional RNA Editing in the Myxomycetes (MICOTREM)
Probably the most distinctive derived characteristic of the P. polycephalum mtDNA is the RNA editing needed to express 43 of the 47 transcribed genes (recently reviewed in [9] and [12]). This type of RNA editing was first identified and characterized by Mahendran et al. [13] in the cryptogene for the α subunit of ATP synthase (cryptogenes are genes that require RNA editing to supply genetic information necessary for their expression) in the mtDNA of P. polycephalum and was later extended to additional cryptogenes in the P. polycephalum mtDNA [14,15,16,17,18,19]. It has been designated MICOTREM (Mitochondrial Insertional, Cotranscriptional RNA Editing in Myxomycetes). Currently, MICOTREM has been found only in the mtDNAs of the myxomycetes [20]. It supplies genetic information lacking in the cryptogenes by inserting nucleotides in RNA relative to the template DNA, creating open reading frames in mRNAs and functional RNA structure in tRNAs and rRNAs. These non-templated insertions are most commonly single cytidines but can also be single uridines or one of a subset of the possible dinucleotides (CU or UC, AA, GC or CG, UU, UA). In mRNAs, the insertions are separated by an average of about 25 nucleotides, so that about 4% of the mRNA nucleotides are non-templated. Although the distribution of editing sites appears essentially random, no two insertion sites have been observed closer than nine nucleotides (Figure 2). Overall, 1324 RNA editing sites have been identified in the RNAs produced from the classical, ancestral genes of P. polycephalum: 1301 single-nucleotide sites and 23 dinucleotide sites, for a total of 1347 non-templated nucleotides added to mitochondrial RNAs [11].
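The editing totals quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and the 4% figure is treated as the simple reciprocal of the average spacing, which is an approximation.

```python
# Counts reported in the text for P. polycephalum mitochondrial RNAs.
single_sites = 1301       # sites that receive one non-templated nucleotide
dinucleotide_sites = 23   # sites that receive two non-templated nucleotides

total_sites = single_sites + dinucleotide_sites
total_inserted = single_sites * 1 + dinucleotide_sites * 2

# An average spacing of about 25 nucleotides between insertions in mRNAs
# implies roughly one non-templated nucleotide per 25 mRNA nucleotides.
mean_spacing = 25
fraction_non_templated = 1 / mean_spacing

print(total_sites)                       # 1324
print(total_inserted)                    # 1347
print(f"{fraction_non_templated:.0%}")   # 4%
```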
Figure 2. A portion of the atp9 gene on the mtDNA of P. polycephalum (Pp-mtDNA) and Lycogala epidendrum (Le-mtDNA) aligned with P. polycephalum cDNA (Pp-cDNA) and the inferred protein products using the classic genetic code. The alignment shows the editing site distribution and variation in editing site locations in different myxomycetes. Red letters in Pp-cDNA are experimentally determined RNA editing sites in P. polycephalum. Red letters in Le-mtDNA are RNA editing sites inferred by alignment. Green letters in Le-mtDNA and Le-Protein show differences between the P. polycephalum and L. epidendrum sequences. * indicates termination codon.
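The kind of comparison shown in Figure 2 can be sketched in code: given a template DNA sequence and a cDNA that differs from it only by insertions, the inserted positions can be recovered and the spacing between sites measured. This is a minimal sketch under stated assumptions. The sequences are invented for illustration (they are not the atp9 sequences), the function names are hypothetical, and "spacing" is taken here to mean the number of templated nucleotides between neighbouring sites. When an inserted base is identical to a neighbouring templated base, its exact position is ambiguous; this greedy scan reports the rightmost placement.

```python
def infer_insertions(template: str, cdna: str) -> list[int]:
    """Return 0-based cDNA positions of nucleotides absent from the template.

    Assumes the cDNA equals the template plus insertions only.
    """
    inserted = []
    i = 0  # index of the next unmatched template nucleotide
    for j, base in enumerate(cdna):
        if i < len(template) and base == template[i]:
            i += 1                # templated nucleotide
        else:
            inserted.append(j)    # non-templated nucleotide
    if i != len(template):
        raise ValueError("cDNA is not the template plus insertions")
    return inserted


def group_sites(positions: list[int]) -> list[tuple[int, int]]:
    """Merge adjacent inserted positions into (start, length) editing sites."""
    sites: list[tuple[int, int]] = []
    for p in positions:
        if sites and p == sites[-1][0] + sites[-1][1]:
            start, length = sites[-1]
            sites[-1] = (start, length + 1)   # extend a dinucleotide site
        else:
            sites.append((p, 1))
    return sites


def min_templated_gap(sites: list[tuple[int, int]]):
    """Smallest number of templated nucleotides between neighbouring sites."""
    gaps = [b[0] - (a[0] + a[1]) for a, b in zip(sites, sites[1:])]
    return min(gaps) if gaps else None


# Invented example: two single-C insertions into a made-up template.
template = "ATGGTAGATTTAGGTGCAGGTATTGGT"
cdna = template[:6] + "C" + template[6:20] + "C" + template[20:]

positions = infer_insertions(template, cdna)
print(positions)                                   # [6, 21]
print(min_templated_gap(group_sites(positions)))   # 14
```

A gap below nine in real data would contradict the spacing constraint described in the text, so a check of this kind could serve as a sanity test on inferred editing sites.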
The mitochondrial RNA polymerase of P. polycephalum resembles the mitochondrial RNA polymerases of organisms that lack MICOTREM RNA editing in that it consists of a single polypeptide with significant homology to several bacteriophage RNA polymerases [21]. How the mitochondrial RNA polymerase identifies the location of an editing site, and how the specific nucleotide or dinucleotide is specified, have not been determined. The complexity of this RNA editing argues that genetic information must be required to specify nucleotide identity and location. Rigorous searches for antisense sequences carrying editing site information (gRNAs) have not revealed any candidates. The only repository of the complete mitochondrial genetic information is the fully edited mRNA itself, but it is unclear how this genetic information could be used to specify nucleotide location as a nascent RNA is being transcribed and edited. Although no consensus sequence in the mtDNA around editing sites has been found, Rhee et al. [22] have shown that sequences just upstream and downstream of editing sites are critical for correct editing. Miller, Padmanaban, and Sarcar [9] have proposed that RNA editing sites are identified by duplex formation between the fully edited RNA and its antisense DNA template. Bases inserted in the RNA relative to the DNA template by editing would create a nucleotide bulge in the DNA-RNA duplex, in which the unpaired RNA base could be flipped out of the duplex without significantly disrupting it. This flipped-out base could serve as the marker for RNA editing site location. Retention of the edited RNA, Watson–Crick base-paired with the DNA template, together with a displaced non-template DNA strand plectonemically associated with the major groove of the duplex and stabilized via Hoogsteen base pairing, would produce a DNA-RNA-DNA triplex. In vivo and in vitro tests of this model are in progress.
A second major unanswered question about MICOTREM editing is how non-templated nucleotides are specified at RNA editing sites. This specificity is clearly necessary, since any of the RNA nucleotides can be inserted at RNA editing sites, yet the same nucleotide or dinucleotide is always inserted at a given site. Miller and Miller [23] have shown that the P. polycephalum mitochondrial RNA polymerase is able to add random nucleotides to the 3′ end of RNAs in vitro in the presence of complex RNA sequences. Likewise, Sarcar and Miller [24] have shown that T7 RNA polymerase is able to add random ribonucleotides to the 3′ ends of RNAs and DNAs in the presence of complex DNA or RNA sequences. In addition, they demonstrated that T7 RNA polymerase (a member of the superfamily of single-subunit polymerases that includes the mitochondrial polymerases) can add specific nucleotides to the 3′ ends of DNAs or RNAs in vitro when intramolecular or intermolecular base pairing creates recessed 3′ ends, which can be extended by one or a few nucleotides using the extended 5′ end as a template (limited primer extension). This templated activity can occur in the absence of transcription when only a single nucleotide is provided. Whether the P. polycephalum mitochondrial RNA polymerase can also add specific nucleotides to the 3′ end of RNAs in vitro by limited primer extension has not been determined.
An alternative way in which the edited nucleotide could be specified is through a binding factor that recognizes the flipped-out nucleotide in an edited RNA/template DNA duplex (see above) and either delivers an identical nucleoside triphosphate to the active site of the RNA polymerase, where it would be added without a template, or directly adds the identical nucleotide to the 3′ end of the nascent RNA through a terminal transferase activity. This model would require a specific binding factor for each of the mono- or dinucleotides that can be inserted. Binding of the factor at the editing site might cause the RNA polymerase to pause in transcription, allowing the non-templated nucleotide to be inserted by either mechanism.
Evolution of RNA Editing Site Location
Comparison of RNA editing sites within the same genes of different myxomycetes shows that, while they all display MICOTREM editing, the location of RNA editing sites varies relative to conserved regions within analogous genes (Figure 2). In contrast to the conservation of editing sites in the mtDNA of individual myxomycetes, this observation implies an unanticipated dynamic in the location of editing sites over evolutionary time and provides insight into the constraints on editing site location and distribution, as well as the mechanisms of editing site fixation and elimination. Krishnan et al. [20] compared editing site location in a 452-nucleotide region of the small subunit (SSU) rRNA among six myxomycetes. Each myxomycete had a similar number of editing sites (eight to ten), which were distributed such that no two editing sites were closer than nine nucleotides and which, in each case, restored the conserved sequence of the SSU rRNA. However, these editing sites were distributed in different patterns in the six RNAs and were located at 29 different sites relative to the conserved sequence of the RNA. In general, the more closely related the myxomycetes, the more editing sites they share at the same location. These variations indicate that editing sites can be created and/or removed over evolutionary time. Analysis of these editing patterns in relation to established phylogenetic trees confirms that editing sites have been both created and deleted during the evolution of the mtDNA to produce the editing patterns observed in contemporary organisms.
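The statement that closely related myxomycetes share more editing sites can be made concrete with a simple set comparison. The sketch below is illustrative only. The species labels and site positions are invented, not the data of Krishnan et al. [20], and the Jaccard index is one possible similarity measure chosen here for simplicity, not the method used in that study.

```python
from itertools import combinations

# Hypothetical editing-site positions, given as coordinates in a shared
# alignment of a conserved rRNA region. These values are invented.
sites = {
    "species_A": {12, 40, 77, 120, 160, 201, 260, 310},
    "species_B": {12, 40, 77, 131, 160, 215, 260, 330, 400},
    "species_C": {25, 58, 90, 131, 175, 230, 290, 345},
}


def jaccard(a: set[int], b: set[int]) -> float:
    """Fraction of sites shared between two species (intersection / union)."""
    return len(a & b) / len(a | b)


def pairwise_sharing(site_map: dict[str, set[int]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every pair of species, keyed by sorted name pair."""
    return {
        (x, y): jaccard(site_map[x], site_map[y])
        for x, y in combinations(sorted(site_map), 2)
    }


# Number of distinct alignment positions edited in at least one species.
distinct_sites = set().union(*sites.values())

sharing = pairwise_sharing(sites)
for pair, score in sharing.items():
    print(pair, round(score, 3))
print(len(distinct_sites))
```

In this toy data set, species_A and species_B share most of their sites while species_C shares almost none, which is the pattern expected if A and B are the more closely related pair.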
Editing site patterns may also be altered by the removal of RNA editing sites. Landweber [25] and Simpson and colleagues [26,27] have proposed retrotranscription as a mechanism of eliminating insertional editing sites. Integration of cDNAs produced from reverse transcription of edited RNAs would remove the deletions in the mtDNA and eliminate the need for a compensating insertion of nucleotides in the RNA.
2.2. Unidentified, Untranscribed but Significant Open Reading Frames in the mtDNA of P. polycephalum
A second unique feature of P. polycephalum mtDNA is the presence of 26 open reading frames that do not correspond to any of the genes classically observed on mitochondrial DNAs [8,9]. Most of these reading frames are significantly long (greater than 100 codons), so they are not likely to have been generated by chance, but with one exception they are not transcribed [10,11]. The fact that these significant unassigned reading frames (SURFs) remain intact in the absence of the transcription that would provide the selection to maintain open reading frames implies that they may be recently acquired.
The 26 SURFs are interspersed within the classical genes of the mtDNA in four groups. Group 1 SURFs are designated A, B, and C and would be transcribed counterclockwise in Figure 1 in the order CBA. SURFs A and B would code for proteins of 238 and 411 amino acids, respectively. These proteins have transmembrane characteristics consistent with being membrane proteins. However, they do not have significant homology with any protein in GenBank. SURF C, a smaller open reading frame, also does not have significant homology to any gene in GenBank.
The Group 2 untranscribed region has six SURFs: two that would be transcribed clockwise in Figure 1 (G and H) and four that would be transcribed counterclockwise in Figure 1 (I, F, E, D). SURFs D, E, H, and I have transmembrane characteristics, but none of the Group 2 SURFs has significant similarity to proteins in GenBank.
The largest group of SURFs is Group 3, which includes SURFs J to Y and covers 18,022 base pairs of the mtDNA. These 16 SURFs would all be transcribed clockwise in Figure 1 and have very little noncoding space between reading frames. They are predicted to code for proteins ranging in size from 112 amino acids (SURF V) to 724 amino acids (SURF Y). SURFs J, K, L, N, O, Q, S, T, and U would code for proteins predicted to have transmembrane features and could be membrane proteins. Most of these SURFs would code for proteins that do not have similarity to any proteins in GenBank; however, several of the predicted proteins have recently been matched with other sequences. SURF N (400 amino acids) and SURF Q (389 amino acids) have similarity to each other and to a hypothetical protein from Flavobacteriales bacterium (328 amino acids, GenBank sequence ID: NQX98395.1) recently identified during a metagenomic search of ocean water from the marine abyssalpelagic zone, Pacific Ocean, North Pacific Gyre, Station ALOHA (Leu, A. O., 2020, unpublished). All three hypothetical proteins have a region of similarity of about 200 amino acids starting at 139 amino acids from the N-terminus. SURF R (663 nucleotides, 221 amino acids) has a region of identity to SURF 7 (1098 nucleotides, 366 amino acids) in the mitochondrial mF plasmid (see below). This region of identity in SURF R is 474 nucleotides (158 amino acids) in length, starts near the N-terminus, and is the site of homologous recombination between the circular P. polycephalum mtDNA and the mF plasmid (see below). The one SURF with the potential to produce a protein with a known function is SURF Y, which could encode a protein 724 amino acids in length. This protein has significant homology to a number of single-subunit RNA polymerases from linear mitochondrial plasmids. It is presumably not the mitochondrial RNA polymerase used to transcribe genes on the mtDNA of P. polycephalum, since this SURF is not transcribed and the encoded RNA polymerase is not produced. (The actual RNA polymerase used to transcribe the mtDNA is encoded in the nucleus and is well characterized [21].) The similarity of the SURF Y amino acid sequence to RNA polymerases on linear mitochondrial plasmids and the identity of a portion of SURF R with SURF 7 of the mF linear mitochondrial plasmid argue that this region of the mtDNA may derive from a linear mitochondrial plasmid or a related bacteriophage with a linear double-stranded DNA, such as phi 29 [28].
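The nucleotide and amino acid lengths quoted for SURF R, SURF 7, and their shared region can be cross-checked by dividing by three. The sketch below follows the convention apparent in the text, in which the amino acid count is the nucleotide count divided by three; whether a stop codon is included in the quoted nucleotide lengths is not stated, so that point is left as an assumption. The function and variable names are illustrative.

```python
def codon_count(nucleotides: int) -> int:
    """Number of complete codons in a reading frame of the given length."""
    if nucleotides % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return nucleotides // 3


# (nucleotides, amino acids) as quoted in the text.
reported = {
    "SURF R": (663, 221),
    "SURF 7 (mF plasmid)": (1098, 366),
    "shared region": (474, 158),
}

for name, (nt, aa) in reported.items():
    # Each quoted amino acid count should equal nt / 3 under the
    # convention assumed here.
    print(name, codon_count(nt) == aa)

# Share of SURF R covered by the region of identity with SURF 7.
shared_fraction_of_surf_r = reported["shared region"][0] / reported["SURF R"][0]
print(f"{shared_fraction_of_surf_r:.0%}")   # 71%
```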
Group 4 consists of one SURF, SURF Z. In contrast to the other unassigned reading frames, SURF Z is transcribed, in a clockwise direction. It is possible that this unidentified ORF is a classical mitochondrial gene that has diverged to such an extent that it cannot be identified. However, the absence of MICOTREM RNA editing argues against it being classical and argues for it being recently acquired.
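The gene bookkeeping in this section can be tallied directly from the group definitions given in the text. The sketch below is a consistency check only; the helper name `letter_span` is hypothetical, and the counts of 46 classical genes and one transcribed reading frame are taken from the text.

```python
import string

LETTERS = string.ascii_uppercase  # reading frames are lettered A to Z


def letter_span(first: str, last: str) -> str:
    """Inclusive run of letters from first to last, e.g. 'D'..'I'."""
    return LETTERS[LETTERS.index(first): LETTERS.index(last) + 1]


# Group membership as described in the text.
groups = {
    1: letter_span("A", "C"),
    2: letter_span("D", "I"),
    3: letter_span("J", "Y"),
    4: letter_span("Z", "Z"),
}

group_sizes = {g: len(members) for g, members in groups.items()}
total_surfs = sum(group_sizes.values())

classical_genes = 46      # genes homologous to those of R. americana
transcribed_surfs = 1     # only the reading frame in group 4 is transcribed

total_genes = classical_genes + total_surfs
transcribed_genes = classical_genes + transcribed_surfs

print(group_sizes)         # {1: 3, 2: 6, 3: 16, 4: 1}
print(total_surfs)         # 26
print(total_genes)         # 72
print(transcribed_genes)   # 47
```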