Chloroplast DNA Barcodes: Comparison
Please note this is a comparison between Version 1 by Andreas Tzakos and Version 2 by Wendy Huang.

DNA barcodes are standardized sequences, ideally unique, coding or non-coding, from either the genome of the organism or its organelles, that are used to identify or classify an organismal group. In short, the method involves amplifying the DNA barcode, sequencing it, and comparing it with a reference database containing the corresponding sequences from different species. In plants, a universal DNA barcode comparable to COI, which is used in animals, has not been established so far.
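The comparison step described above can be sketched as a nearest-match search. This is a minimal illustration under stated assumptions, not a description of any specific barcoding pipeline: the query and reference barcodes are assumed to be already aligned to the same length, identification simply picks the reference with the smallest uncorrected p-distance (proportion of differing sites), and the species names and sequences are hypothetical.

```python
"""Minimal sketch of the database-comparison step of DNA barcoding.

Assumptions (hypothetical, for illustration only): query and reference
barcodes are already aligned to the same length, and identification uses
the smallest uncorrected p-distance (proportion of differing sites).
"""


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def identify(query: str, reference_db: dict[str, str]) -> tuple[str, float]:
    """Return the reference species closest to the query and its p-distance."""
    best = min(reference_db, key=lambda sp: p_distance(query, reference_db[sp]))
    return best, p_distance(query, reference_db[best])


# Toy reference database with made-up species names and sequences.
REFERENCE_DB = {
    "Species_A": "ATGACCGGTTACGATCGA",
    "Species_B": "ATGACCGGTTACCATCGA",
    "Species_C": "ATGTCCGGATACGATGGA",
}

if __name__ == "__main__":
    species, dist = identify("ATGACCGGTTACGATCGT", REFERENCE_DB)
    print(species, round(dist, 3))
```

Here the query differs from Species_A at a single site out of 18, so the script prints `Species_A 0.056`. Real reference databases are far larger and usually rely on alignment-based or similarity-search tools rather than an exhaustive pairwise scan.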

  • DNA barcoding
  • chloroplast
  • matK
  • rbcL
  • trnH-psbA
  • rpoB
  • rpoC1
  • psbK
  • psbI

1. DNA Barcodes

Standardization, minimalism, and scalability are the three pillars of DNA barcoding. The technique has been used successfully for species identification in animals, where a 648-base pair (bp) fragment near the 5′-end of the mitochondrial gene cytochrome c oxidase subunit I (COI) has been selected as the standard barcode [1][26] because (a) its large copy number per cell makes amplification easier from small or degraded samples, (b) it is maternally inherited, (c) there is no possibility of recombination with paternal copies, and (d) it rapidly accumulates mutations [2][3][27,28]. While COI is a suitable target for animals, it fails to discriminate most plant species because plant mitochondrial DNA has a much slower mutation rate. This has led to the search for alternative barcoding regions [4][5][29,30].
The fundamental concept underpinning DNA barcoding is that, throughout the evolution of species, certain DNA stretches within both coding and non-coding regions remain highly conserved while accumulating minor changes. Sequences from cytoplasmic mitochondrial DNA, chloroplast DNA, and selected segments of nuclear DNA have these characteristics, making them suitable candidates for DNA barcoding. These sequences allow species to be differentiated, providing a molecular signature that enables accurate and efficient species identification [6][31]. The suitability of such loci, alone or in combination, is still under discussion for plants, for which no single, easily applied solution exists. The design of universal primers would enable efficient PCR amplification which, followed by sequencing and bioinformatic analysis, would ideally identify all known species. Unfortunately, the ideal DNA barcode for plants does not exist so far [7][8][9][20,21,32]. Several barcodes, used singly or in combination, have been applied and are presented below.
Chloroplast DNA is a circular molecule between 120 and 220 kb in size, consisting of a large and a small single-copy region (LSC and SSC) separated by two copies of a large inverted repeat (IRa and IRb). It contains about 100 functional genes that can be used for species identification, and, according to some researchers, the whole plastid genome could be used for DNA barcoding in addition to single-locus markers. DNA barcodes from chloroplast genes are used extensively in plant phylogenetic studies: primer design is easy, gene order in the organelle genome is conserved, and amplification is much easier because of the high copy number per cell. Nevertheless, compared with nuclear genes, chloroplast genes are characterized by a low evolutionary rate [6][10][11][3,23,31]. Among the chloroplast markers, the following have been used successfully:
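To make the term "inverted repeat" concrete, the toy sketch below assembles a miniature plastome in the order LSC, IRb, SSC, IRa and checks that the two repeat copies are reverse complements of each other. The region order follows the conventional annotation; the sequences and region lengths are made up and are many orders of magnitude shorter than real plastid regions.

```python
"""Toy illustration of the quadripartite plastid genome layout.

Assumptions: the region order LSC-IRb-SSC-IRa follows the conventional
annotation; all sequences are hypothetical and far shorter than real
regions (a real plastome is roughly 120-220 kb).
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_plastome(lsc: str, ir: str, ssc: str) -> str:
    """Concatenate LSC, IRb, SSC and IRa; IRa is the reverse complement of IRb."""
    return lsc + ir + ssc + reverse_complement(ir)


def irs_are_inverted(genome: str, lsc_len: int, ir_len: int, ssc_len: int) -> bool:
    """Check that the two repeat copies are reverse complements of each other."""
    irb = genome[lsc_len:lsc_len + ir_len]
    ira_start = lsc_len + ir_len + ssc_len
    ira = genome[ira_start:ira_start + ir_len]
    return ira == reverse_complement(irb)


if __name__ == "__main__":
    lsc, ir, ssc = "ATGCGTACGTTAGC", "GGATCCAAGT", "TTACG"
    genome = build_plastome(lsc, ir, ssc)
    print(len(genome), irs_are_inverted(genome, len(lsc), len(ir), len(ssc)))
```

Because the two repeat copies are identical in sequence (read on opposite strands), a gene lying inside the inverted repeat is present in two copies, which is one reason barcoding loci are usually chosen from the single-copy regions.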

2. matK

matK (maturase K) is one of the most rapidly evolving chloroplast genes and has been used for identification at the family, genus, and even species levels. matK exhibits high interspecific divergence and a low transition/transversion ratio. It is approximately 1550 bp long and encodes maturase K, an enzyme involved in the splicing of group II introns [12][13][14][15][35,39,44,45]. However, its use as a universal DNA barcode is hampered by technical problems, chiefly the difficulty of designing universal primer sets because of its high substitution rate [16][17][18][37,46,47]. Nevertheless, matK is a suitable marker for identifying angiosperms (flowering plants), bryophytes, lycophytes, gymnosperms, and monilophytes [15][18][45,47].
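Why a high substitution rate undermines universal primers can be illustrated by counting primer/template mismatches across taxa. The sketch below is a simplification under hypothetical assumptions: the primer and binding sites are invented, the sites are pre-aligned to the primer, and a site is counted as "likely to amplify" only if it has at most two mismatches and none in the three bases at the primer's 3′ end (mismatches near the 3′ end are generally the most disruptive to PCR extension, but the thresholds here are arbitrary).

```python
"""Minimal sketch of why a high substitution rate complicates universal primers.

Assumptions (hypothetical): primer and binding-site sequences are made up,
sites are already aligned to the primer, and a site is treated as "likely to
amplify" only if it has at most MAX_MISMATCHES mismatches and none within the
last THREE_PRIME_WINDOW bases at the primer's 3' end.
"""

MAX_MISMATCHES = 2
THREE_PRIME_WINDOW = 3


def mismatch_positions(primer: str, site: str) -> list[int]:
    """0-based positions where the primer and its binding site differ."""
    return [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]


def likely_to_amplify(primer: str, site: str) -> bool:
    """Apply the hypothetical mismatch-count and 3'-end rules."""
    mism = mismatch_positions(primer, site)
    three_prime_start = len(primer) - THREE_PRIME_WINDOW
    return len(mism) <= MAX_MISMATCHES and all(i < three_prime_start for i in mism)


PRIMER = "CGTACACTTTTGAAG"  # hypothetical forward primer, written 5'->3'
SITES = {  # hypothetical binding sites in three taxa
    "Taxon_1": "CGTACACTTTTGAAG",
    "Taxon_2": "CGTACATTTTTGAAG",
    "Taxon_3": "CATACACTTCTGAAA",
}

if __name__ == "__main__":
    for taxon, site in SITES.items():
        print(taxon, mismatch_positions(PRIMER, site), likely_to_amplify(PRIMER, site))
```

In this toy data, Taxon_3 has accumulated three substitutions in the binding site, one of them at the 3′ terminal base, so the same primer would be expected to fail there; the more rapidly a locus evolves, the more often such taxa appear across a broad sample of plants.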

3. rbcL

rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) is a candidate locus for discrimination at the family and genus levels; however, it is not suitable for species identification because of its modest discriminatory power. It is one of the most studied markers in the plastid genome, with wide representation from all major plant groups and many sequences available in GenBank [5][7][8][13][20,21,30,39]. It was the first gene sequenced from the plant chloroplast genome and encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), a critical photosynthetic enzyme [19][48]. rbcL is easy to amplify and sequence but has a slow evolutionary rate [13][20][7,39]. It is approximately 1430 bp long, so at least two primer sets are needed to sequence the entire coding sequence [8][12][21,35]. rbcL meets most of the desired criteria and can be used in conjunction with other markers [5][16][30,37]. It is also widely used for identifying algae, pteridophytes, and angiosperms [9][21][6,32].
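"Discriminatory power" and the benefit of combining loci can be sketched with a simple measure. Here, as a hypothetical simplification, a species counts as resolved if none of its barcode sequences is shared with another species; the two loci, LOCUS_X and LOCUS_Y, and their sequences are invented. Published studies also use distance-based and tree-based criteria, which this sketch does not implement.

```python
"""Sketch of one simple way to quantify a locus's discriminatory power.

Assumption: discriminatory power is approximated as the fraction of species
whose barcode sequences are never shared with another species. LOCUS_X and
LOCUS_Y are hypothetical loci with made-up sequences, one specimen per species.
"""

from collections import defaultdict


def discrimination_rate(samples: list[tuple[str, str]]) -> float:
    """samples: (species, aligned barcode sequence) pairs, one per specimen."""
    species_by_seq: dict[str, set[str]] = defaultdict(set)
    for species, seq in samples:
        species_by_seq[seq].add(species)

    all_species = {species for species, _ in samples}
    # A species is resolved if none of its sequences occurs in another species.
    resolved = {
        sp for sp in all_species
        if all(species_by_seq[seq] == {sp} for s, seq in samples if s == sp)
    }
    return len(resolved) / len(all_species)


LOCUS_X = [("Sp1", "ACGTACGT"), ("Sp2", "ACGTACGT"), ("Sp3", "ACGTACGA"), ("Sp4", "ACGAACGT")]
LOCUS_Y = [("Sp1", "ACGTTCGT"), ("Sp2", "ACGTACGT"), ("Sp3", "ACCTACGA"), ("Sp4", "ACCTACGA")]

# Concatenate the two loci specimen by specimen (same species order in both lists).
COMBINED = [(sp, x + y) for (sp, x), (_, y) in zip(LOCUS_X, LOCUS_Y)]

if __name__ == "__main__":
    print(discrimination_rate(LOCUS_X), discrimination_rate(LOCUS_Y), discrimination_rate(COMBINED))
```

Each hypothetical locus resolves only half of the species on its own, because each leaves one pair of species with identical sequences, but the concatenated two-locus barcode resolves all four. This mirrors, in miniature, why a slowly evolving locus such as rbcL is usually recommended in conjunction with other markers.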

4. trnH-psbA

trnH-psbA is one of the most variable non-coding plastid loci, and its intergenic spacer can offer a high level of species discrimination [12][16][35,37]. It is easily amplified with universal primers, but its high rate of insertions/deletions (indels) makes alignment difficult. Moreover, its length varies among families, and in some cases the region contains copies of rps19 as well as a pseudogene located between trnH and psbA. As a result, even when high-quality bidirectional sequences are obtained, alignment remains difficult because of the high length variation. Most researchers have proposed that trnH-psbA be used in combination with one or more other loci to provide adequate resolution [5][7][16][18][20][7,20,30,37,47]. Nevertheless, it has been shown to be a suitable marker for flowering plants and pteridophytes [8][21][6,21].
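The alignment problem caused by indels can be illustrated with the classic Needleman-Wunsch global alignment, which must place gaps to align sequences of different lengths. This is a minimal sketch assuming a linear scoring scheme (match +1, mismatch −1, gap −2) and two made-up spacer fragments; real trnH-psbA datasets are much longer, involve many sequences, and are typically handled with multiple-alignment software plus manual inspection.

```python
"""Minimal Needleman-Wunsch global alignment with a linear gap penalty.

Assumptions: the scoring scheme and the toy sequences are hypothetical and
serve only to show how length variation forces gaps into an alignment.
"""

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical linear scoring scheme


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str, int]:
    """Return one optimal global alignment of a and b and its score."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )

    # Trace back from the bottom-right cell to recover the aligned strings.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Two made-up spacer fragments differing by a 3-bp insertion.
    top, bottom, s = global_align("ACGTACGT", "ACGTTTTACGT")
    print(top)
    print(bottom)
    print("score:", s)
```

Even in this tiny example, several gap placements score equally well, and the algorithm simply reports one of them. With many overlapping indels of varying length across taxa, such equally scoring alternatives multiply, which is why positional homology in length-variable spacers is hard to establish.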

5. rpoB and rpoC1

rpoB (RNA polymerase subunit B) and rpoC1 (RNA polymerase subunit C1) are plastid genes encoding subunits of the plastid-encoded plastid RNA polymerase. They have been used for identification at the family level, but because of their slow evolutionary rate they cannot be used for species discrimination in many plant families [15][16][37,45]. Both can be amplified efficiently with a limited range of PCR conditions and primer sets [16][37]. rpoB, rpoC1, and rpoC2 encode three of the four subunits of the chloroplast RNA polymerase [22][49] and are suitable markers for bryophyte identification [23][50].

6. trnL-trnF (Genic, Intron, and Intergenic Spacer)

The trnL-trnF intergenic spacer has been proposed as a universal plastid amplicon and has been widely used in plant systematics and phylogeography since the 1990s [12][18][35,47]. This region is located in the large single-copy region of the chloroplast genome [24][19]. Despite its slow rate of molecular evolution, the plastid trnL intron has been suggested as a possible marker because of its conserved sites; hence, it could be a useful tool for evolutionary studies at higher taxonomic levels [8][16][21,37]. Taberlet et al. [25][51] designed primers that work for the 19 species tested, including algae, bryophytes, pteridophytes, gymnosperms, and angiosperms.

7. psbK-psbI (Intergenic Spacer)

The psbK and psbI loci encode two low-molecular-weight polypeptides, K and I, of photosystem II [26][52]. The non-coding psbK-psbI intergenic spacer is conserved and can be easily amplified by PCR, sequenced, and aligned [27][28][53,54]. It also demonstrates high discriminatory power but low sequence quality and universality [12][35]. Despite its discriminatory power, the CBOL Plant Working Group proposed its use as a supplementary locus because of the inconsistency in obtaining unambiguous bidirectional sequences [5][30]. Nevertheless, it is a suitable marker for bryophyte, lycophyte, and monilophyte identification [18][47].

8. atpF-atpH (Intergenic Spacer)

The non-coding plastid region atpF-atpH could be used as a universal DNA barcoding marker for species-level identification, but its discriminatory power is moderate. The genes atpF and atpH encode the ATP synthase subunits CF0 I and III. The length of atpF-atpH sequences varies from 598 to 613 bp, and alignment of these sequences is difficult despite easy PCR amplification. For this reason, it may be useful only as a supplementary marker in plant DNA barcoding, providing better resolution in specific projects and taxonomic groups [18][29][30][31][33,47,55,56]. According to Wang W. et al. [30][55], it is a suitable marker for duckweed identification.