2. DNA-Based Molecular Markers
2.1. Restriction Fragment Length Polymorphism (RFLP)
In the Restriction Fragment Length Polymorphism (RFLP) technique, DNA samples are digested with specific endonucleases, resulting in a profile of fragments of different lengths that is characteristic of each species. RFLPs have the advantage of medium polymorphic variability and do not require prior knowledge of the genome sequence analyzed. However, these markers have high development and running costs and require DNA of high quality and in large quantity [12].
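The digestion step can be illustrated in silico. The following is only a minimal sketch, assuming a linear DNA molecule, exact matches to the recognition site, and a single enzyme; the function name `digest` and the two toy alleles are hypothetical and are not taken from the studies reviewed here. EcoRI (G^AATTC) is used as a familiar example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Return the fragment lengths produced by cutting a linear sequence.

    cut_offset is the position inside the recognition site at which the
    enzyme cuts the top strand (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment boundaries are the two sequence ends plus every cut position.
    bounds = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_b carries a point mutation that destroys
# the first EcoRI site, which changes the fragment profile.
allele_a = "ATGC" * 3 + "GAATTC" + "TTAGC" * 4 + "GAATTC" + "CCGTA" * 2
allele_b = allele_a.replace("GAATTC", "GAGTTC", 1)

print(digest(allele_a, "GAATTC", 1))  # [13, 26, 15]
print(digest(allele_b, "GAATTC", 1))  # [39, 15]
```

The loss of one restriction site merges two fragments into one, which is the kind of length difference that an RFLP profile makes visible on a gel.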
Several derived techniques, such as polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) [13], Terminal-RFLP (T-RFLP) [14], and Multiplex PCR-terminal RFLP (MT-RFLP) [15], have been developed over the years for particular applications.
RFLP and derived techniques have been used to identify and detect fungal species associated with human health (41.6% of the papers analyzed), the study of soil microorganisms (28.7%), and plant health (11.9%). In human health, the most frequently studied fungi were Candida spp. (48.4%), the etiological agent of diseases such as candidiasis and candidemia, Trichophyton spp. (15.6%), responsible for dermatomycosis and onychomycosis, and Aspergillus spp. (10.2%), associated with superficial and invasive infections. In the soil, aspects such as the effects of geographical distance (12.5%), plant composition (12.5%), and application of fertilizers (9.4%) on microbial composition were studied. The most studied plant pathogens were Colletotrichum spp. (20.8%), Fusarium spp. (12.5%), and Penicillium spp. (8.3%) (Figure 1).
Figure 1. Distribution of the main fungal species according to the molecular markers used. The number of published articles for each molecular marker was normalized to allow a direct comparison between the different molecular markers (only normalized values higher than 1 are represented).
2.2. Random Amplification of Polymorphic DNA (RAPD)
Random Amplification of Polymorphic DNA (RAPD) [16,17] uses short random PCR primers (8–15 nucleotides) complementary to several genomic regions, generating complex PCR profiles characteristic of each species [18]. This technique uses highly polymorphic molecular markers, requires a medium quantity of DNA, and has intermediate technical development and running costs. Among its disadvantages is the low reproducibility of the results [18].
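How a single short primer can produce a strain-specific band pattern can be sketched as follows. This is a simplified illustration, assuming exact primer matches only (real RAPD reactions tolerate mismatches, which is one source of the low reproducibility noted above); the primer, the templates, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, pattern):
    """Return every start index at which pattern occurs in seq."""
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions


def rapd_products(template, primer, max_len=3000):
    """Return sorted product sizes for one primer on one template.

    A product needs the primer sequence on the top strand and, downstream,
    its reverse complement (where the primer anneals to the bottom strand).
    """
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    sizes = []
    for f in forward_sites:
        for r in reverse_sites:
            size = r + len(primer) - f
            if r > f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


primer = "AGGTCACTGA"  # arbitrary 10-mer chosen for illustration
strain_a = primer + "A" * 50 + reverse_complement(primer) + "C" * 30
strain_b = strain_a.replace(primer, "AGGTCACTCA", 1)  # mutated binding site

print(rapd_products(strain_a, primer))  # [70]
print(rapd_products(strain_b, primer))  # []
```

In this toy case a single substitution in one primer binding site removes the band, which is why RAPD profiles are scored as presence or absence of fragments.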
RAPD markers have been commonly used for the study of fungal genetic diversity (34.06% of the papers analyzed), the study of diverse plant pathogens (29.0%), and the authentication or safety verification of products such as food and drinks (15.9%). Genetic diversity in fungi was the focus of studies related to diseases in plants and humans (i.e., Fusarium, 19.0%; Aspergillus, 9.5%; Rhizoctonia, 7.1%). These studies include vegetative compatibility, fingerprinting of toxigenic and non-toxigenic strains, and antifungal susceptibility. Among the plant pathogens, the most studied were Fusarium spp. (26.9%), associated with wilts in different crops and pokkah boeng disease in sugarcane; Alternaria spp. (7.7%), associated with diseases such as brown spot, leaf spot, and black rot, or occurring as endophytic fungi; and Phaeoacremonium spp. (7.7%), responsible for grapevine diseases such as esca. In the case of the identification of food products, the main species identified were the edible mushrooms Pleurotus (30.8%) and Agaricus (3.8%), while the main species concerning food contamination and spoilage were Aspergillus spp. (17.3%) and Penicillium (7.7%) (Figure 1).
2.3. Amplified Fragment Length Polymorphism (AFLP)
In the Amplified Fragment Length Polymorphism (AFLP) technique [19], DNA samples are digested using two restriction enzymes, and adapters are then ligated to the fragment ends, where they act as primer binding sites for PCR amplification. Polymorphisms are recognized by the presence or absence of DNA fragments following analysis on polyacrylamide gels [20]. These markers are highly polymorphic and abundant in the genome.
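Presence/absence scoring of this kind is usually turned into a binary matrix and compared between isolates. The sketch below assumes the Jaccard coefficient as the similarity measure, which is a common choice for dominant markers but is not specified in the text; the band profiles and isolate names are invented for illustration.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary band profiles (1 = band present).

    Shared absences are ignored, because the absence of a band in both
    isolates carries little information for dominant markers.
    """
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    return shared / either if either else 0.0


# Hypothetical profiles: one entry per fragment size scored on the gel.
profiles = {
    "isolate_1": [1, 1, 0, 1, 0, 1],
    "isolate_2": [1, 1, 0, 0, 0, 1],
    "isolate_3": [0, 1, 1, 0, 1, 0],
}

print(jaccard_similarity(profiles["isolate_1"], profiles["isolate_2"]))  # 0.75
print(jaccard_similarity(profiles["isolate_1"], profiles["isolate_3"]))
```

Here isolates 1 and 2 share three of the four bands seen in either profile, while isolates 1 and 3 share only one of six, so the first pair would cluster together in a similarity-based dendrogram.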
The main limitations of this technique are its high development and running costs, its requirement for DNA of high quality and quantity, the need for prior knowledge of the DNA sequence, and its intermediate reproducibility and low automation capacity [21,22].
AFLP markers have been mainly used for the study of plant health (50.5%), fungal genetic variability (12.6%), and food and beverage products (10.7%). Concerning phytopathogens, the most frequently studied crop diseases were rots (14.3%) caused by fungal species such as Coniophora species-complex, Fusarium spp., Macrophomina phaseolina, Nigrospora oryzae, and Sclerotinia sclerotiorum; rusts (11.9%) induced by Veronaea botryosa, Puccinia striiformis, Hemileia vastatrix, and Phakopsora pachyrhizi; anthracnose (9.5%) caused by species belonging to the Colletotrichum genus; and rice blast (8.1%) caused by Magnaporthe spp. In the case of fungal genomic diversity, the aspects studied were differences among strains of the same species (41.2%), differential gene expression (17.6%), phylogeny (11.8%), and population structure (11.8%) (Figure 1).
2.4. Inter-Simple Sequence Repeats (ISSR)
Inter-Simple Sequence Repeats (ISSR) analysis consists of amplifying a DNA segment located at an amplifiable distance between two identical microsatellites (16–25 bp long) oriented in opposite directions [23].
These markers present several advantages over the above-mentioned markers: a high polymorphism level, low development and running costs, and low requirements for DNA quality and quantity [24].
ISSR markers have been used to study plant pathogens (42.2%) and fungal genetic diversity (25.0%), and to identify food and beverage products (17.2%). Amongst the 51 fungal species studied, the most representative ones were Fusarium spp. (15.5%), the etiological agent of yellowing disease, wilt, and storage rot in different crops; Colletotrichum (13.8%), causing bitter rot and anthracnose; and Alternaria (5.2%), responsible for symptoms of brown leaf spot and blight. In terms of genomic diversity, the most commonly studied fungi were Fusarium (14.7%), Aspergillus (11.8%), and Sclerotium (8.8%), covering aspects such as genetic diversity (65%), virulence (10%), and vegetative compatibility (5%). Finally, in the identification of food and beverage products, the most frequently studied fungal species were Pleurotus (21.9%), Agaricus (9.4%), and Aspergillus (9.4%) (Figure 1).
2.5. Variable Number of Tandem Repeats (VNTR)
Variable Number of Tandem Repeats (VNTR) markers include minisatellites and microsatellites. Minisatellites are repeat motifs mostly about 9 to 30 bp long [25,26]. Microsatellites, also called Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), are repeat motifs mostly about 2 to 4 bp long, consisting of tandem repeats of mono-, di-, tri-, tetra-, or pentanucleotide units arranged throughout the genome [27]. Microsatellites are generally abundant and polymorphic in non-transcribed genomic regions, which is why this marker is considered selectively neutral. Nevertheless, SSR loci can also occur in genomic regions involved in transcription, translation, chromatin organization, or recombination [28]. Due to replication slippage, SSR loci mutate 10,000 to 100,000 times more frequently per generation than single-nucleotide substitutions [29]. Their high mutation rates and neutral evolution allow the accumulation of numerous population-specific alleles, which are significant for unveiling hidden population structures. Due to their multi-allelic nature, they are more likely to reveal heterozygosity than, for instance, an equal number of bi-allelic markers. However, the unusually high variability of SSRs relative to other genomic regions might not necessarily reflect patterns of genome-wide genetic diversity [30,31,32]. Furthermore, the rapid mutation rates of SSRs may also confound signals of population structuring and divergence. For instance, frequent forward and backward mutations can create identical alleles in unrelated or genetically isolated populations. This undesirable effect can be compensated for by increasing the number of polymorphic SSR loci used, but the level of genetic differentiation between populations that diverged a long time ago could still be underestimated [33].
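Locating microsatellites in a sequence is a simple pattern search. The sketch below is one possible approach, assuming perfect (uninterrupted) repeats, motif lengths of 2 to 4 bp as described above, and an arbitrary minimum of four repeat copies; the function name `find_ssrs`, its thresholds, and the example locus are illustrative assumptions.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=4, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so (CA)n is reported with unit "CA" rather than "CACA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs that match as longer units
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical locus with a (CA)7 and a (GAT)5 microsatellite.
locus = "GGATCC" + "CA" * 7 + "TTGACG" + "GAT" * 5 + "CCTA"
print(find_ssrs(locus))  # [(6, 'CA', 7), (26, 'GAT', 5)]
```

Alleles at an SSR locus differ in the number of copies reported in the third field, so running the same search on several isolates gives the repeat-count genotypes used in population studies.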
Minisatellites and microsatellites show a high level of polymorphism and genomic abundance, low requirements in terms of both DNA quality and quantity, and high reproducibility.
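The point about multi-allelic loci and heterozygosity can be made concrete with the standard expected heterozygosity, H = 1 - Σ p_i², where p_i is the frequency of allele i. The sketch below uses invented allele frequencies purely for illustration.

```python
def expected_heterozygosity(allele_frequencies):
    """Expected heterozygosity H = 1 - sum(p_i ** 2) at a single locus."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Hypothetical loci with equally frequent alleles.
bi_allelic_locus = [0.5, 0.5]            # e.g. a SNP with two alleles
multi_allelic_locus = [0.2] * 5          # e.g. an SSR with five alleles

print(round(expected_heterozygosity(bi_allelic_locus), 3))     # 0.5
print(round(expected_heterozygosity(multi_allelic_locus), 3))  # 0.8
```

A bi-allelic locus cannot exceed H = 0.5, whereas a locus with k equally frequent alleles reaches 1 - 1/k, which is why fewer multi-allelic loci are needed for the same discriminating power.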
VNTR markers have been used in studies of plant (40.2%) and human health (15.0%), fungal diversity (15.0%), and food and beverage (8.4%). The most common plant pathogens studied were Puccinia (7.4%), the etiological agent of rust; Colletotrichum (6.4%), causing anthracnose; Fusarium (6.4%), responsible for diseases such as blight, rot, and wilt; Alternaria (3.2%), the etiological agent of blight and rot; Diaporthe (3.2%), as endophytic fungi; Ustilago (3.2%), inducing smut; and Rhizoctonia (3.2%), causing aggregate sheath spot and blight. The human pathogens most frequently studied were Candida spp. (40.0%), the etiological agent of candidiasis and candidemia; Aspergillus spp. (17.1%), inducing invasive infections; and Cryptococcus spp. (14.3%), causing cryptococcosis. As for fungal genetic diversity, the most studied species were Aspergillus (13.5%), Fusarium (13.5%), and Epichloë (8.1%), covering topics such as genetic diversity (43.6%), mating-type (20.5%), and population structure (12.8%) (Figure 1).
2.6. Single-Nucleotide Polymorphisms (SNP)
Single-nucleotide polymorphisms (SNPs) result from changes in a single nucleotide position in the DNA sequence [34]. These markers occur twice as frequently in intergenic and non-coding regions of the genome as in coding regions [35]. However, genome-wide association studies have revealed that SNPs located in non-coding regions are often physically linked to functional or regulatory genomic sites, thus reflecting, for example, selection signatures [36]. Given that SNPs are mostly bi-allelic, traditional population genetic statistics can easily be applied to them, but a greater number of sufficiently polymorphic loci might be necessary to reach the same power as multi-allelic SSR loci [29]. The advent of next-generation sequencing techniques has considerably accelerated, simplified, and automated genome-wide SNP detection and genotyping. However, considering that even a relatively small number of highly polymorphic SNPs can potentially give a genetic resolution similar to that of randomly chosen, multi-allelic SSRs [37], an alternative strategy to genome-wide SNP screening might be to target polymorphic sites in unlinked single-copy genes generally known to be conserved in the targeted phylum [38]. Because they result from single-nucleotide replacements, these markers are bi-allelic, although rare cases of tri-allelism exist for the target position.
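At its simplest, SNP detection is a position-by-position comparison of aligned sequences. The sketch below assumes the two sequences are already aligned and of equal length, and it skips gap and ambiguous (N) positions; production variant callers additionally model base quality and read depth. The function name and sequences are hypothetical.

```python
def call_snps(reference, sample):
    """Return (position, ref_base, sample_base) for each substitution.

    Positions are 1-based. Columns containing a gap ("-") or an ambiguous
    base ("N") are ignored, since they are not single-nucleotide changes.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base == sample_base:
            continue
        if ref_base in "-N" or sample_base in "-N":
            continue
        snps.append((index + 1, ref_base, sample_base))
    return snps


# Hypothetical aligned fragments from a reference and an isolate.
reference = "ATGGCTAGCTAAGT"
isolate = "ATGACTAGCTGAGT"
print(call_snps(reference, isolate))  # [(4, 'G', 'A'), (11, 'A', 'G')]
```

Each reported tuple corresponds to one bi-allelic site in this pairwise comparison; a third allele at the same position would only appear when more isolates are included.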
SNPs are co-dominant markers with a high level of polymorphism and very high genomic abundance. The analysis of these markers requires a low quantity of DNA and allows a high automation capacity, resulting in very reliable and reproducible data.
SNPs have been used for the study of plant (42.9%) and human health (16.0%), and genomic variability (10.1%). Concerning plant pathogens, the most frequently studied fungal pathogens were Fusarium spp. (15.3%) and Monilinia spp. (5.1%), and the most prevalent diseases were rusts (11.5%) and blight (7.7%). The most commonly studied crop was wheat (20.0%). In the case of human pathogens, Aspergillus spp. (19.4%) and Exophiala spp. (16.1%) were frequently studied. The most studied diseases were aspergillosis (33.3%) and candidiasis (26.7%). Genomic diversity studies were conducted in Glomeromycota (50.0%) and Fusarium spp., covering aspects such as species genetic diversity (36.4%), population structure (27.3%), and toxin production (18.2%) (Figure 1).
2.7. Small Insertions or Deletions (InDels)
Insertions or Deletions (InDels) are fragments of different sizes (ranging from 1 to 1000 bp) inserted or lost at a given location in the genome. These markers are very stable within the genome and can be relevant for population studies [39]. They are co-dominant, highly polymorphic, and very abundant, with both high reliability and reproducibility.
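InDels can be read directly from a gapped pairwise alignment. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" marking gaps and no column where both sequences have a gap; the coordinate conventions, function name, and sequences are assumptions made for illustration.

```python
def call_indels(aligned_ref, aligned_sample):
    """Return (kind, ref_position, bases) for each indel in an alignment.

    Insertions are reported at the 1-based reference position of the base
    preceding them; deletions at the position of the first deleted base.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned sequences must have the same length")
    events = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(aligned_ref)
    while i < n:
        if aligned_ref[i] == "-":  # gap in reference: insertion in sample
            j = i
            while j < n and aligned_ref[j] == "-":
                j += 1
            events.append(("insertion", ref_pos, aligned_sample[i:j]))
            i = j
        elif aligned_sample[i] == "-":  # gap in sample: deletion
            j = i
            while j < n and aligned_sample[j] == "-":
                j += 1
            events.append(("deletion", ref_pos + 1, aligned_ref[i:j]))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            i += 1
    return events


# Hypothetical alignment with a 2-bp insertion and a 2-bp deletion.
ref_aln = "ATGC--TAGGCTA"
sample_aln = "ATGCAATAG--TA"
print(call_indels(ref_aln, sample_aln))
# [('insertion', 4, 'AA'), ('deletion', 8, 'GC')]
```

Because each event has a defined length, InDel alleles can be scored by fragment size in the same co-dominant way as SSR alleles.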
InDels have been used in studies of plant (29.0%) and human health (17.0%), and food and beverage (17.0%). The most commonly studied plant pathogens were Fusarium spp. (16.7%), Botrytis spp. (10.0%), and Puccinia spp. (10.0%), covering aspects such as phytopathogen genetic diversity (23.7%), the distinction between the species of the complex (7.9%), and resistance to fungicides (7.9%), in crops like wheat (21.4%), oilseed rape (14.3%), and soybean (14.3%). The most frequently studied human pathogens were Cryptococcus spp. (19.2%), Aspergillus spp. (11.5%), and Trichophyton spp. (11.5%), elucidating aspects such as genetic diversity (40.0%), recombination (15.0%), and virulence factors (15.0%), associated with diseases such as (25.0%), diarrhea (16.7%), and other enteric diseases (16.7%) (Figure 1).
2.8. DNA Barcoding
DNA barcoding relies on the use of a single universal marker (the DNA barcode) for the rapid and efficient identification and classification of species by non-specialists, similar to the Universal Product Code that identifies products in a supermarket [40].
Each Kingdom possesses a particular barcode gene or genes [41, 42]. Fungal ITS is frequently proposed as the first universal barcoding marker. However, because this barcode does not provide sufficient resolution, it is common to use secondary barcodes [i.e., intergenic spacer (IGS), β-tubulin II (TUB2), DNA-directed RNA polymerase II largest (RPB1) and second largest (RPB2) subunits, translational elongation factor 1α (TEF-1α), DNA topoisomerase I (TOP1), phosphoglycerate kinase (PGK), cytochrome c oxidase subunit I (COX1) and subunit II (COX2), 28S nrDNA (LSU), and 18S nrDNA (SSU)] [43,44].
These markers are co-dominant and present high genomic abundance, and their analysis is highly reliable and reproducible. A disadvantage is that the same marker cannot be universally used for all fungal species, and prior knowledge of the DNA sequence is required [45].
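Barcode-based identification amounts to comparing a query sequence with a reference library and accepting the best match if it is similar enough. The sketch below assumes pre-aligned sequences of equal length, a plain percent-identity score, and an arbitrary 90% cutoff; real workflows use alignment tools and curated databases. The taxon names and sequences are invented and do not correspond to real ITS data.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def identify(query, references, threshold=90.0):
    """Return (best_taxon_or_None, best_identity) for a query barcode."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best_name = max(scores, key=scores.get)
    best_score = scores[best_name]
    # Below the cutoff the query is left unassigned rather than forced.
    return (best_name if best_score >= threshold else None, best_score)


# Hypothetical 20-bp reference barcodes and a query with one mismatch.
references = {
    "taxon_A": "ACGTTGCAAGCTTAGGCTAA",
    "taxon_B": "ACGTAGCTAGCATAGGCTTA",
}
query = "ACGTTGCAAGCTTAGGCTAT"
print(identify(query, references))  # ('taxon_A', 95.0)
```

When two reference taxa score equally well against the primary barcode, this kind of comparison cannot separate them, which is the situation in which the secondary barcodes listed above become necessary.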
DNA barcoding has been used in studies on plant (21.4%) and human health (13.4%), and food and beverage (12.5%). In plant health, the most used barcode genes were ITS (36.2%), TEF-1α (13.8%), and LSU (12.1%), applied to Fusarium spp. (11.6%), Alternaria spp. (2.9%), Curvularia spp. (2.9%), Diaporthe spp. (2.9%), Exserohilum spp. (2.9%), Mycosphaerella spp. (2.9%), and Neofusicoccum spp. (2.9%) collected from several crops (e.g., blueberry, chili, sorghum, Pinus). The aims of such studies were diagnosis (15.8%), identification of leaf spot pathogens (15.8%), and identification of quarantine species (10.5%). In human health, the most common barcode genes were ITS (33.3%), β-TUB (18.5%), and TEF-1α (14.8%). The human pathogens included were Scedosporium spp. (9.7%), Aspergillus spp. (6.5%), Cunninghamella spp. (6.5%), and Fusarium spp. (6.5%), whereas the major topics were invasive fungal infections (28.6%), keratitis (21.4%), and opportunistic fungi (14.3%). In food and beverage production and spoilage, the barcodes were mainly ITS (43.3%), LSU (20.0%), and β-TUB (10.0%), applied to the study of Lactarius spp. (2.7%), Penicillium spp. (2.7%), and Pleurotus spp. (2.7%) associated either with edible mushrooms (53.3%) or dairy products (20.0%) (Figure 1).
2.9. Massive Parallel Sequencing (MPS)
The development of Massive Parallel Sequencing (MPS) techniques allows a deeper knowledge of the entire microbial composition of a given environment [46] or the determination of the complete genome sequence of a single microorganism [47].
The advantages of MPS include its high-throughput capacity; a single protocol that can be applied for the identification and genotyping of all microorganisms; no need for DNA cloning; no need for a priori knowledge about the sequence of a particular gene/genome; no need for isolation and culture of the microorganism to be studied; and reduced costs and turnaround time [48,49]. As disadvantages, these techniques are associated with high requirements for data storage and with the biases introduced by each step of the protocol [50].
Metabarcoding is the automated identification of various organisms present in a single bulk sample or in an environmental sample with degraded DNA using a species-specific genetic marker (DNA meta-barcode). Metabarcoding studies included plant health (14.9%), food and beverage (14.4%), and soil (10.9%). In plant health studies, the most frequently used metabarcodes were ITS2 (31.0%) and ITS1 (27.6%), either alone or in combination (13.8%). The selected works included studies on fungal community (30.8%), endophytes (26.9%), and trunk disease (11.5%), applied to forest trees (27.3%), cereals (22.7%), and fruit trees (13.6%). In food and beverage production and spoilage, the same metabarcodes were used (ITS2: 36.0%; ITS1: 16.0%; ITS1+ITS2: 12.0%). Sequencing was used for the identification of species involved in beverage fermentation (26.9%), food contamination (19.2%), and the production of dairy products (15.4%). In the soil’s fungal diversity studies, the three usual metabarcodes were used (ITS2: 20.0%; ITS1: 15.0%; ITS1+ITS2: 15.0%). The selected soils were from forests (21.1%), agro-environment (15.8%), and public gardens (15.8%) (Figure 1).
Whole genome sequencing (WGS) includes two different techniques: de novo genome assembly (the species to be studied has not been previously sequenced and assembled) and re-sequencing, which identifies genome-wide variants comparing an existing reference assembly with a sequenced isolate through the alignment of sequence reads against the reference [47].
WGS has been used to elucidate aspects associated with plant (32.9%) and human health (13.4%), and food and beverage (12.8%). In plant health, the main fungal phytopathogens studied were Fusarium spp. (12.1%), Trichoderma spp. (6.1%), and Venturia spp. (6.1%), while the major diseases analyzed were rot (27.8%), blight (19.4%), and canker (8.3%). The most commonly studied crops were soybean (13.6%), grapevine (9.1%), sugarcane (9.1%), and wheat (9.1%). In human health, the most common fungal species were Candida spp. (42.1%), the etiological agent of candidiasis and candidemia (42.1%), and Aspergillus spp. (15.8%), causing aspergillosis (10.5%) and other infections (5.3%). In food and beverage production and spoilage, species such as Agaricus bisporus (4.8%), Auricularia spp. (4.8%), and Flammulina spp. (4.8%) were studied for their interest as edible mushrooms (55.0%), while Aspergillus spp. (23.8%) and Penicillium spp. (19.0%) were studied for their role in food spoilage (45.0%) (Figure 1).
3. Conclusions
Identification of fungal species is a crucial aspect of many fields in science, mainly those associated with plant, animal, and human health or with food production and spoilage. Throughout the years, the scientific community has developed several molecular techniques to obtain information on these species. Nowadays, these markers range from non-PCR and PCR-based techniques to more advanced MPS-based techniques.