Genetic Markers for Metabarcoding of Freshwater Microalgae

Metabarcoding methods for studying the diversity of freshwater microalgae and for routine
biomonitoring are actively used in modern research. Considerable experience has been accumulated,
and many methodological questions have been resolved (such as the influence of the methods and duration of
sample preservation, DNA extraction and bioinformatic processing). The reproducibility of the method
has been tested and confirmed. However, one of the main problems, the choice of a genetic
marker for a study, still has no clear answer.
Studies on eukaryotic freshwater microalgae use 12 markers (different nuclear regions of 18S and ITS
and the plastid genes rbcL, 23S and 16S). Each marker has its own peculiarities; the markers amplify differently and have
different levels of efficiency (variability) in different groups of algae.
The V4 and V9 regions of 18S and the rbcL region are used most often. The choice of marker affects the inferred taxonomic composition of the community.

  • barcode
  • metabarcoding
  • ecological assessment
  • microalgae
  • genetic markers

1. Introduction

Metabarcoding of eDNA is currently a popular method for studying the diversity and functioning of various communities, from microbes to mammals. Interest in the method grows every year, and the number of studies is increasing. For example, a query in the SCOPUS database for the keyword “metabarcoding” returns 2215 results; a query with the keyword “eDNA” returns 26,034 results (search date 12 February 2023).
Algae are a phylogenetically heterogeneous group of organisms that is very diverse in morphology and ecological preferences. In the eukaryotic tree of life, photosynthetic eukaryotes are spread across 12 separate phylogenetic lineages at the phylum level [1,2,3]. On a macrosystematic level, they belong to four to seven (according to different estimates) supergroups, each of which also contains non-photosynthetic organisms [1,2,3,4,5,6]. This phylogenetic heterogeneity complicates the choice of a gene locus “…which is variable enough to provide robust identification at the species level…” [7] (and references in it), “…and different markers are applied for species delimitation in different algal groups.” [8] (and references in it). For example, phylogenetic studies and species descriptions of diatoms and red algae do not use the rDNA ITS marker, whereas it is the main marker currently employed for DNA-based species delimitation in green microalgae, dinoflagellates, chrysophytes and synurophytes [7,8] (and references in it). Well-documented nucleotide sequences are accumulated in databases, which form the basis for interpreting metabarcoding data.
Thus, the choice of the barcode region and primer pairs, which can limit or bias the diversity of organisms observed, is a challenge with environmental metabarcoding studies [9,10]. The proportion of biodiversity covered by metabarcoding studies directly depends on the markers and primers used, so organisms that are not amplified by standard methods go undetected, even if they are common and play an important role in the ecosystem [11]. It is important for a “good barcode” to be taxonomically informative; it needs to be able to distinguish between species (i.e., the DNA region should mutate at the right rate), because most modern biomonitoring and biotic index programs require identification at the species level. At the same time, a barcode needs conserved primer binding areas, or degenerate primers, in order to be able to attach to the DNA of all the organisms in the sample [12]. The choice of primers also impacts the results of a biodiversity assessment of an ecosystem. Complete universality causes a loss of resolution and limits the depth of the biodiversity assessments of groups. Limiting the universality of the primers might, on the other hand, exclude important groups in the analysis and introduce biases, favoring some organisms or groups.
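The trade-off between conserved binding sites and degenerate primers can be illustrated with a minimal Python sketch. The primer and template strings are hypothetical toy sequences, not primers from any cited study, and the helper names `primer_to_regex` and `primer_binds` are assumptions introduced here. Each IUPAC degenerate code is expanded into a character class, so a single degenerate primer matches several template variants. Only exact matching on the given strand is modeled.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer: str) -> re.Pattern:
    """Turn a (possibly degenerate) primer into a compiled regular expression."""
    parts = []
    for code in primer.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))

def primer_binds(primer: str, template: str) -> bool:
    """Return True if the primer matches the template exactly somewhere."""
    return primer_to_regex(primer).search(template.upper()) is not None

# Hypothetical toy data: one degenerate primer, three template variants.
toy_primer = "GCYGAT"
print(primer_binds(toy_primer, "TTGCCGATAA"))  # Y covers C
print(primer_binds(toy_primer, "TTGCTGATAA"))  # Y covers T
print(primer_binds(toy_primer, "TTGCAGATAA"))  # A is not covered by Y
```

In this toy case, the template carrying A at the degenerate position would go undetected, which is the kind of primer-driven bias described above.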
Considerable experience has already been gained in using next-generation sequencing (NGS) approaches to study algae. One of the high-priority research areas is the integration of metabarcoding into routine biomonitoring. Many methodological questions have been answered: bioinformatics pipelines have been assessed [14,15]; sampling, DNA extraction methods and applications of global eDNA have been discussed [16,17,18,19,20,21]; it has recently been shown that the preservation time and sample preservation methods have little effect on DNA metabarcoding results [22]; the experience of integrating eDNA metabarcoding into routine freshwater biomonitoring has been summarized [23,24,25]; and the term “eDNA” has been clarified [26].

2. Genetic Markers for Metabarcoding of Freshwater Microalgae

2.1. Gene Markers and Primer Sets for Freshwater Microalgae Metabarcoding

Twelve different genetic regions are used in the studies. The nuclear regions V3, V4, V7, V9 and V9-ITS1 are used for analyzing whole eukaryotic communities and communities of microalgae, as well as for targeting individual groups of algae (dinoflagellates and diatoms [31]). The ITS2 region has been chosen for studying green algae s.l. (Viridiplantae) in a series of studies of the Antarctic region [72,73,74,105]. The plastid region rbcL is widely used for diatom metabarcoding; in addition, rbcL primers for identifying Eustigmatophyceae have been designed and tested [97].
Among nuclear markers, the V4 region of 18S rRNA is used for analyses most often. Seven primer sets have been applied to this region, the most widely used being Set 6 (TAReuk454FWD1/TAReukREV3), developed by Stoeck et al. [43]. This set is widely used in the metabarcoding of both marine and freshwater eukaryotic plankton. Sets 1 (DIV4for/DIV4rev3) and 2 (M13F-D512/M13R-D978rev) target diatoms and were used in seven and four studies, respectively. Each of the remaining sets is mentioned in only one publication, apart from Set 7 (TAReuk454FWD1/V4r), which has recently been accepted as the standard for using environmental DNA in Finnish marine phytoplankton monitoring. The V9 18S region was chosen as a barcode in nine publications. The universal primer Set 1 (1391F/EukBr) was used to amplify this region in eight of these nine publications.

2.2. Reference Databases for Sequence Interpretation

The authors of the reviewed studies used different databases for the taxonomic assignment of sequences. Studies on diatoms that use the rbcL region always rely on Diat.barcode (R-syst:diatom), a curated barcode library for diatoms [105], for sequence interpretation. Taxonomic assignment of sequences from the various 18S rRNA regions (V3, V4, V7, V8, V9 and combinations) is usually carried out using GenBank, as well as quality-controlled databases of ribosomal RNA gene sequences such as SILVA [106]. The PR2 (Protist Ribosomal Reference) database, a catalog of unicellular eukaryote small subunit rRNA sequences with curated taxonomy, is used less often. In a series of studies on Antarctic green algae [72,73,74], the sequences were annotated using the recently established reference dataset PLANiTS, which includes Viridiplantae ITS1, ITS2 and entire ITS sequences from both Chlorophyta and Streptophyta [107]. To classify the 16S reads of a freshwater diatom biofilm [103], PhytoREF, a reference database of the plastid 16S rRNA gene of photosynthetic eukaryotes, was used [108].

3.3. First Works on Testing Genetic Markers on Monoclonal Microalgal Cultures Provide Insight on the Effectiveness of Amplification and the Resolution of Species Differentiation

The first studies that tested the resolution of genetic markers for species differentiation were carried out using large collections of monoclonal algal cultures. This made it possible to determine the effectiveness of primers in amplifying certain regions, to compare sequence variability directly with morphological features (including cryptic species) and to establish the regions most suitable for further research. These studies became the basis for choosing markers for next-generation sequencing.
One of the first tests of diatom “barcode” genes (COI, rbcL, 18S and ITS rDNA) was carried out by Evans et al. in 2007 [109]. The study aimed to determine the effectiveness of the markers in distinguishing cryptic species within the model “morphospecies” Sellaphora pupula agg. Based on their analysis, the authors suggested the barcode region COI as a valuable phylogenetic marker. However, they also reported some difficulties with the amplification of this gene (a large primer set was used, sequences for Seminavis cf. robusta and for centric diatoms could not be obtained and only partial sequences were obtained for the araphid pennate diatom Tabularia sp.). According to the acquired data, the plastid gene rbcL is less variable than COI, but it supports all the phylogenetic lineages recovered with the latter. As for ITS, this barcode varies considerably in length, and there is also the problem of intraindividual variation.
Later, Moniz and Kaczmarska [111] tested three candidate barcodes on 28 species from 22 genera of diatoms: the small ribosomal subunit (SSU, 1600 bp), a 5′ end fragment of the cytochrome c oxidase subunit 1 (COI, 430 bp), and the second internal transcribed spacer region combined with the 5.8S gene (5.8S + ITS2, 300–400 bp). COI showed the lowest rates of amplification (only 29% of good quality DNA amplified with COI, and of those, only 30% were sequenced successfully and found to be diatom DNA). For SSU, the authors noted the highest amplification success rate of the three markers and easy alignment; however, a long fragment is required for species delimitation. 5.8S + ITS2 showed a higher rate of successful amplification and sequencing (79% and 84%, respectively) and was the most variable of the three markers, but its secondary structure was needed to aid in alignment. As a result, the 5.8S + ITS2 fragment was proposed as the best candidate for a diatom DNA barcode.
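The practical meaning of these rates becomes clearer when the two steps are combined. The short Python sketch below multiplies the amplification and sequencing success rates to give the share of DNA samples that yield a usable barcode. For COI, the text states that the 30% refers to the amplified samples; applying the same conditional reading to 5.8S + ITS2 is an assumption made here, and the function name `overall_success` is hypothetical.

```python
def overall_success(amplified: float, sequenced: float) -> float:
    """Share of samples passing both steps.

    Assumes `sequenced` is the success rate among amplified samples.
    """
    return amplified * sequenced

coi = overall_success(0.29, 0.30)    # rates reported for COI
its2 = overall_success(0.79, 0.84)   # rates reported for 5.8S + ITS2 (conditional reading assumed)

print(f"COI: {coi:.1%}")
print(f"5.8S + ITS2: {its2:.1%}")
```

Under these assumptions, fewer than one in ten samples would yield a usable COI sequence, compared with roughly two in three for 5.8S + ITS2.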
A search for a universal marker for diatoms was carried out by Hamsher et al. [113]. The authors assessed the following markers: ∼1400 bp of rbcL, 748 bp at the 3′ end of rbcL (rbcL-3P), LSU D2/D3 and UPA. As a result, rbcL-3P was suggested as the primary marker for diatom barcoding, since it had the power to distinguish all species and could be sequenced more easily. LSU D2/D3 could distinguish all but the most closely related species (96%). UPA showed low resolution, distinguishing only 20% of the species. Relying on the authors’ own experience (several copies were amplified, and the resulting sequences differed in length and were unreadable), as well as on published data, it was concluded that ITS is not a good barcode for diatoms.
The effectiveness of rbcL was discussed by M. MacGillivary and I. Kaczmarska [114]. A 540-bp fragment 417 bp downstream of the start codon of the rbcL gene was tested on a large selection of diatom taxa from classes Mediophyceae and Bacillariophyceae (381 sequences representing 66 genera and 245 species). This fragment was chosen after preliminary testing as the most variable. As a result, this fragment of rbcL correctly segregated 96% and 93% of the morphological congeners, respectively. 
The effectiveness of three markers (SSU rDNA, rbcL and COI) for metabarcoding was tested on a mock community of diatom algae (30 strains belonging to 21 species) by Kermarrec et al. [115]. These markers are the primary ones used for the molecular identification of diatoms. The markers ITS and LSU were not considered in this study because of their high interclonal variability and the lack of available data for the establishment of reference libraries. In order to interpret the acquired sequences, reference libraries were created for each marker. Sequences from the authors’ own collection and from GenBank were included in these libraries. Gene marker rbcL showed the best species composition assessment of the mock community, and SSU rDNA was next (it did not differentiate the complexes Nitzschia palea and Gomphonema parvulum at the intraspecific level). COI is variable and provides high resolution, but it was not recommended for routine metabarcoding due to difficulties in amplification and low representativity of the reference library.
An extensive assessment of the utility of the gene markers COI, rbcL, ITS, tufA, UPA and 18S for freshwater green algae was carried out by Hall et al. [116]. They tested representatives of seven distantly related species groups from the classes Chlorophyceae, Charophyceae and Zygnematophyceae (151 strains, 40 species in total). The authors concluded that 18S, UPA and COI would be poor choices for a DNA barcode in green algae (18S and UPA proved insufficiently variable, and COI was difficult to amplify). ITS, rbcL and tufA were sufficiently variable to distinguish most species of Chlorophyceae, but additional primers were sometimes needed for amplification. For the charophytes, rbcL was noted as the most suitable marker, with the caveat that species could not be differentiated using this marker alone.

3.4. 18S—Choosing a Variable Barcode Region for Eukaryotes In Silico

The eukaryotic gene 18S-rRNA is used for species delimitation in almost all groups of freshwater algae [8]. It contains nine hypervariable regions (V1 to V9), each of which has been considered as a short barcode for species identification (with the exception of V6, because this region is more conserved in eukaryotes) [121] (and references in it). The question of using hypervariable regions as barcode markers for eukaryotes in silico has been discussed in several publications.
Based on an alignment of eukaryotes containing 24,793 positions from the SILVA database, the characterization of the 18S rRNA gene and the design of universal eukaryote specific primers were provided by Hadziavdic et al. [13]. To describe the nucleotide variation in the alignment, the authors used Shannon entropy values. The results suggested that the V2, V4 and V9 regions were best suited for biodiversity assessments (they yielded the highest taxonomic resolutions at cut-off values ranging 95–100% for the sequence identity). The V1 region is rather short (ca 100 nt) and contains a highly conserved core segment, and the V3 and V5 regions lack highly variable segments and are not very long. V7 has a highly variable core of approximately 20–25 nt. The V8 region is over 150 nucleotides long with variable and conserved positions interspersed across the region, with a conserved segment towards the 3′ end. 
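The use of Shannon entropy to describe nucleotide variation can be sketched as follows. This is a minimal illustration on a hypothetical four-sequence toy alignment, not the procedure or data of Hadziavdic et al.; the function names are assumptions. For each alignment column, the entropy is H = Σ p·log2(1/p) over the observed nucleotides, so a fully conserved column scores 0 bits and a column with four equally frequent bases scores 2 bits.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropy(seqs: list[str]) -> list[float]:
    """Entropy of each column of a set of equal-length aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy("".join(col)) for col in zip(*seqs)]

# Hypothetical toy alignment: two conserved columns, then increasing variability.
toy_alignment = ["ACGA", "ACTC", "ACGG", "ACTT"]
print(alignment_entropy(toy_alignment))
```

Runs of low-entropy columns flanking a high-entropy stretch are what make a region attractive as a barcode: the conserved flanks can host primers, while the variable core carries the taxonomic signal.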

3.5. 18S rRNA Gene Metabarcoding: V4 vs. V9

Bradley et al. [54] examined the effect of PCR/sequencing bias of the V4 and V8–V9 regions on community structure and membership using seven microalgal mock communities consisting of 12 algal species across five major divisions of eukaryotic marine and freshwater microalgae. The authors found a critical shortcoming of the V4 primer set as used in the literature [43] and described the failed sequencing runs. The V4 region failed to reliably capture 2 of the 12 mock community members (the haptophytes Prymnesium parvum and Isochrysis galbana), whereas the V8–V9 hypervariable region more accurately represented the mean relative abundance and alpha and beta diversity. Bradley et al. [54] found that degeneracies on the 3′ end of the current V4-specific primers impacted the read length and mean relative abundance. They modified the TAReukREV3 reverse primer and suggested the V4r primer without degeneracies on the 3′ end for the subsequent sequencing.
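The notion of degeneracies at the 3′ end of a primer can be made concrete with a small Python sketch. The primers below are hypothetical toy sequences rather than the published TAReukREV3 or V4r primers, and the five-base window is an assumption chosen for illustration. The code counts how many distinct oligonucleotides a degenerate primer encodes and reports which positions near the 3′ end carry degenerate IUPAC codes.

```python
# Number of bases each IUPAC code can represent.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    total = 1
    for code in primer.upper():
        total *= IUPAC_SIZE[code]
    return total

def degenerate_positions_3prime(primer: str, window: int = 5) -> list[int]:
    """0-based positions of degenerate codes within the last `window` bases."""
    start = max(0, len(primer) - window)
    return [i for i in range(start, len(primer))
            if IUPAC_SIZE[primer[i].upper()] > 1]

# Hypothetical primers written 5' to 3'.
toy_degenerate = "ACGTTGAYRA"   # degenerate codes close to the 3' end
toy_plain = "ACGTTGATCA"        # no degenerate codes

print(degeneracy(toy_degenerate), degenerate_positions_3prime(toy_degenerate))
print(degeneracy(toy_plain), degenerate_positions_3prime(toy_plain))
```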
A comparative analysis of the V4 and V9 regions of 18S rDNA of the eukaryotic community of a pond [53] showed a remarkable discrepancy: the inventory of the major subdivision groups in the V9 region dataset did not correspond to that in the V4 region dataset. Eukaryotic OTUs for the V9 region were 20% more abundant than those for the V4 region at a 97% identity threshold. V9 also showed a larger diversity from the point of view of taxonomic coverage. The classes Karyorelictea, Prostomatea and Nassophorea in Ciliophora and the family Perkinsida (‘Alveolata’ group) were not detected using the V4 sequencing data, whereas they were detected using the V9 sequencing data. V4 missed Echinamoebida, Eumycetozoa and Euamoebida and green microalgae classes Chloropicophyceae, Pyramimonadophyceae and Mamiellophyceae. The authors noted “… the simultaneous application of two biomarkers may be suitable for understanding the molecular phylogenetic relationships”.
In an investigation dedicated to a eukaryotic community in anaerobic wastewater treatment systems [48], the V4 and V9 regions also detected different taxonomic groups. The authors suggested that commonly used V4 and V9 primer pairs could produce a bias in eukaryotic community analyses. The number of sequences of the amplicon library for the V9 region was almost two times larger than the number of sequences of the V4 amplicon library (340,054 vs. 180,678). The V4 region-specific primer pair showed that the dominant group was fungi.
A comparison of the 18S rRNA V4 and V9 regions for coastal phytoplankton communities with a focus on Chlorophyta [122] showed that the V9 region provided 20% more OTUs built at 97% identity than V4. Interestingly, the expectation was the opposite: the authors assumed that V4, as the longer region, would detect more OTUs. The authors noted that both markers work “…equally well to describe global communities at different taxonomic levels from the division to the genus and provided similar Chlorophyta distribution patterns”. They concluded that V9 was the better choice for Chlorophyta, as it was more discriminating than V4. In some cases, for example in prasinophytes clade VII, V9 OTUs made it possible to discriminate all subclades defined to date, whereas in V4 several clades collapsed together. However, there was also an opposite example: “The V9 region of some Chlamydomonas is very similar to that of prasinophytes clade VII A5”. The authors emphasized the importance of reference sequences in databases, the absence of which, for instance, prevented the assessment of the diversity of Dolichomastigales (Chlorophyta, Mamiellophyceae) using V9. Similar results were demonstrated for marine picoeukaryotes [123], amoebae [124] and zoonotic trichomonads [125].
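What “OTUs built at 97% identity” means can be sketched with a deliberately simplified greedy clustering in Python. Real pipelines align reads of unequal length and use dedicated clustering tools; here the reads are hypothetical, have equal length, and identity is simply the share of matching positions. The function names are assumptions introduced for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this simplified sketch")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each sequence to the first OTU whose centroid is similar enough."""
    otus: list[list[str]] = []   # the first member of each OTU acts as its centroid
    for seq in seqs:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])   # no centroid was close enough: start a new OTU
    return otus

# Hypothetical 100-nt reads.
base = "ACGT" * 25
near = "C" + base[1:]        # 1 mismatch, 99% identity to base
distant = "TGCA" + base[4:]  # 4 mismatches, 96% identity to base

clusters = greedy_otus([base, near, distant])
print(len(clusters), [len(c) for c in clusters])
```

Because the threshold is applied to whatever region was sequenced, a short and highly variable region such as V9 can split lineages that a longer region merges, and vice versa, which is consistent with the differing OTU counts reported above.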

3.6. Internal Transcribed Spacer Ribosomal DNA (ITS) in Metabarcoding Research

The ITS region is the accepted DNA barcode for fungi and a strong locus for delimiting or identifying species from different algal groups, such as Chlorophyta, Dinophyceae, Chrysophyceae, Xanthophyceae and Eustigmatophyceae [7,8]. Therefore, the usage of this region for metabarcoding has positive prospects, with a high probability of identifying nucleotide sequences at the species level.
The V9-ITS1 region of the 18S was chosen for the large-scale research of freshwater protists from 217 freshwater lakes across Europe [68,69,70]. The studies were aimed at identifying the diversity dynamic of the protist communities relative to the geographic distance and mountain range structures [68], centers of endemism [70] and models of interactions between the protist community and bacteria [69]. In regard to algae, the diversity of the following groups was determined in these studies: Dinophyceae, Chrysophyceae, diatoms, Cryptophyta and Viridiplantae (green algae). 
The ITS2 region is regarded as the best marker for DNA barcoding of Chlorophyta. This marker resolves major green algal lineages (some with high bootstrap support), has a high resolution for taxonomic assessment (it enables the largest number of species to be distinguished) and a high level of universality (i.e., of the primers for PCR) [127] (and references in it). This region was successfully used in the first metabarcoding studies of the diversity of Viridiplantae (including green microalgae) in the Antarctic, covering soil and rock surface samples [72,128], lake sediments [74] and glacial ice [73]. The sequences were interpreted using the PLANiTS2 database [107], and most of the taxa were identified to the species level.

3.7. Gene Markers for Diatoms

Diatoms are well-known ecological indicators of aquatic ecosystems and are widely used for routine monitoring. Water quality indexes for rivers and lakes have been developed on the basis of diatoms and are used in EU countries (under the Water Framework Directive), the USA (under the National Water Quality Assessment Program), Canada, Australia and New Zealand [105,129,130,131]. Therefore, adapting the metabarcoding method for use as a tool for ecological assessment is a relevant task of modern research.
Visco et al. [32] showed a strong similarity between the DI-CH (Swiss Diatom Index) values inferred from microscopic and V4 18S NGS analyses of diatom communities. However, the authors noted that the interspecific variability of this barcode might differ between genera, so its effectiveness would depend on the taxonomic composition of the diatom community. The resolution of V4 was insufficient to assign Navicula species unambiguously, but it was sufficient to distinguish most species of Nitzschia and Gomphonema.
The rbcL gene marker has a wider application for studying diatom communities, and thanks to the establishment of a quality reference database Diat.barcode/R-syst:diatom [105], it can already be considered the standard for diatom metabarcoding.
The resolution of the rbcL 312 bp marker on the level of intraspecific and cryptic diversity was successfully demonstrated by Pérez-Burillo et al. [80]. Benthic diatom samples (n = 610) were studied with a special focus on several ecologically important diatom species that are also key for the Water Framework Directive monitoring of European rivers: Fistulifera saprophila, Achnanthidium minutissimum, Nitzschia inconspicua and Nitzschia soratensis. As a result, it was shown that intraspecific and cryptic diversity can be assessed and understood through the application of DNA metabarcoding. 
A longer rbcL region (331 bp) was suggested as a result of large-scale research (500 benthic samples from 250 sites in England) aimed at adopting a metabarcoding approach for ecological status assessment using diatoms [17]. The choice of region was based on an analysis of 390 sequences from a database. Eleven conserved regions of the rbcL gene with >96% identity were identified and used for developing primers. Variable regions were also analyzed, and four of these showed good potential for species delimitation.

3.8. Specific Primers Targeted to rbcL Region Detected a High Diversity of Eustigmatophyceae

A high diversity of Eustigmatophyceae was detected in environmental DNA samples using new specific primers targeting the rbcL region [97]. The authors compared their results with previous studies of Eustigmatophyceae and concluded that the diversity of this group is underestimated. The designed primers revealed 184 ASV haplotypes that belong either to Eustigmatophyceae (179) or possibly to Eustigmatophyceae (15), whereas in previous works, representatives of this group occurred only as rare or single findings. The sensitivity of the rbcL primers targeting eustigmatophytes was higher than that of universal eukaryotic 18S primers.

3.9. Comparison of the rbcL and 18S Markers for Biomonitoring of Freshwater Diatoms

Inconclusive results were obtained when the rbcL and V4 18S rRNA markers were used [34] for the biomonitoring of benthic diatoms in freshwater habitats of Northern Europe. The ecological status classes differed substantially depending on the method used: only 48% of the samples with the 18S marker and 37.5% of the samples with the rbcL marker had the same ecological status as in the morphological analysis. The assessment of ecological conditions with different markers gave different results. The authors attributed this to differences in the taxonomic coverage of the respective reference databases and in primer specificity.
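The agreement figures quoted here are simple proportions of samples for which two methods return the same ecological status class. A minimal Python sketch of that calculation follows. The status labels and the eight samples are hypothetical and are not data from the cited study; they are chosen only so that 3 of 8 samples agree.

```python
def agreement_rate(reference: list[str], candidate: list[str]) -> float:
    """Share of samples for which both methods yield the same status class."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    same = sum(r == c for r, c in zip(reference, candidate))
    return same / len(reference)

# Hypothetical status classes for eight samples (illustration only).
morphology = ["high", "good", "good", "moderate", "poor", "good", "high", "moderate"]
marker     = ["high", "moderate", "good", "good", "poor", "moderate", "good", "poor"]

print(f"{agreement_rate(morphology, marker):.1%}")
```

A rate of this kind says nothing about the direction or size of the disagreement; a marker that shifts every sample by one class scores the same as one that misplaces samples at random.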

3.10. The Plastid 23S rDNA Marker for the Simultaneous Detection of Eukaryotic Algae and Cyanobacteria

The universal plastid amplicon (UPA) is the variable domain V of the plastid 23S rRNA gene, about 330 bp long. This region was proposed by Sherwood and Presting [99] as a marker for plastid-containing organisms, i.e., all lineages of eukaryotic algae and cyanobacteria. In that study, a single pair of universal primers was designed, and it was indicated that these exact primer sequences are present only in cyanobacteria and plastids. However, comparisons with other markers showed that UPA is not sufficiently effective.

3.11. The 16S rRNA Gene as a Marker for the Simultaneous Detection of Prokaryotes and Eukaryotes

The 16S rRNA gene was first proposed as a metabarcoding marker by Eiler et al. in 2013 [101] on the basis of its universal presence in prokaryotes (including cyanobacteria), as well as in the chloroplasts of eukaryotes. This enabled the simultaneous detection of prokaryotic and eukaryotic phytoplankton taxa. The authors analyzed the phytoplankton diversity of 49 lakes, including three seasonal surveys, and assessed the data using NGS and microscopy. The NGS approach detected 1.5–2 times more OTUs than there were taxa found by microscopy. A more detailed comparison of taxonomic groups revealed that Heterokonta, Euglenophyta, Cryptophyta and Dinophyta were overrepresented in the microscopic biovolume dataset compared to the NGS data, whereas Cyanobacteria were proportionally overrepresented in the NGS dataset compared to the microscopic biovolume data. The authors noted that Dinophyta, a major phylum in the microscopic data, was poorly detected by NGS in some lakes. Discrepancies also included Euglenophyta and Heterokonta, which were scarce in the NGS data but frequently detected by microscopy. The NGS approach detected a deep-branching, taxonomically unclassified cluster that could not be linked to any group identified by microscopy.
Recently, Bonfantine et al. [103] investigated the potential of the standard V4 515F-806RB primer pair to recover diatom plastid 16S rRNA sequences. PhytoREF was used to classify the 16S reads from 72 freshwater biofilm samples. Based on a Clustal nucleotide alignment, the authors confirmed the differences between eukaryotic chloroplast and prokaryotic sequences: “Ochrophyta and other eukaryote reads showed high sequence conservation with no 3′ mismatches in the last 5 bases of either the forward or the reverse 16S V4-515F and V4-806R primers. Two mismatches with the E. coli 16S rRNA (GT versus TA) were observed in all aligned non-E. coli 16S rRNA sequences 15 bases upstream of the V4-806 primer binding site”. More than 90% of the diatom reads were identified in each river biofilm sample. The authors found significant beta diversity in the diatom communities and discrimination between river segments. Using three Australian environmental 16S rRNA datasets selected from NCBI-SRA as examples, it was shown that most of the diatom OTUs (67 of 71) were also found in other Australian ecosystems. As a result, the authors concluded that diatom plastid 16S rRNA genes are readily amplified with standard primer sets. “Therefore, the volume of existing 16S rRNA amplicon datasets initially generated for microbial community profiling can also be used to detect, characterize and map the distribution of diatoms to inform phylogeny and ecological health assessments…”

3.12. Comparison of Approaches: Metabarcoding and Morphological Identification (Correspondence between Methods)

Comparing the results obtained with light microscopy (LM) and NGS makes it possible to reveal discrepancies between these methods and their causes, to determine the amplification efficiency of the chosen genetic markers and to identify problems in bioinformatic processing and taxonomic assignment. Every study that compared the morphological and molecular approaches to the diversity of algal communities pointed to a substantial difference between the resulting taxonomic lists. The share of taxa detected by both approaches ranges from 7.4 to 25.7%.
Some studies have shown that the diversity detected by NGS is much higher than that found by LM. In a study of diatom diversity, Zimmermann et al. [132] reported that about 2.5 times more taxa were found by the NGS approach (263 taxa vs. 102 taxa in LM). In a study of benthic diatoms, Bailet et al. [34] showed that metabarcoding revealed 27% more taxa than the morphological method when using the 18S marker and 38% more taxa when using the rbcL marker.
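Such comparisons reduce to set operations on the two taxon lists. The Python sketch below uses hypothetical toy lists (the genus names are borrowed from this text, but the lists themselves are invented) to count shared and method-specific taxa. Expressing the shared taxa as a percentage of the combined list is an assumption made here, since the denominator behind the 7.4–25.7% range is not specified.

```python
def compare_taxon_lists(lm: set[str], ngs: set[str]) -> dict[str, float]:
    """Summarize the overlap between taxa found by microscopy (LM) and by NGS."""
    shared = lm & ngs
    union = lm | ngs
    return {
        "shared": len(shared),
        "only_lm": len(lm - ngs),
        "only_ngs": len(ngs - lm),
        # Assumed definition: shared taxa as a percentage of all taxa found.
        "shared_pct_of_union": 100 * len(shared) / len(union) if union else 0.0,
    }

# Hypothetical taxon lists (illustration only).
lm_taxa = {"Navicula", "Nitzschia", "Gomphonema", "Trachelomonas"}
ngs_taxa = {"Nitzschia", "Gomphonema", "Fistulifera", "Achnanthidium", "Sellaphora"}

summary = compare_taxon_lists(lm_taxa, ngs_taxa)
print(summary)

# Richness ratio reported by Zimmermann et al.: 263 NGS taxa vs. 102 LM taxa.
print(round(263 / 102, 2))
```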
The challenges in metabarcoding analysis:
  • Gap in the reference database [16,27,32,34,38,46,62,82,90,99], etc.
  • The natural intraspecific and intragenomic variability of the barcoding marker (a single taxon can have multiple genotypes at the barcoding region, and members of that taxon might cluster into different Molecular Operational Taxonomic Units (MOTUs)) [35].
  • Cryptic diversity—a single morphological species can represent different genetic groups (e.g., diatoms Sellaphora pupula, Pinnularia borealis, Hantzschia amphioxys and Nitzschia inconspicua and species of Stichococcus, Coccomyxa, Chlorokybus, Cryptomonas, etc.) [78,137,138,139,140,141,142].
  • MOTU richness can be artificially inflated through technical errors at different steps of sample processing during amplification and sequencing [35].
  • The MOTU delimitation approach influences the richness estimation and interpretation [35] (and references in it) (assessment of the bioinformatics pipelines provided in [14,15]).
  • Complete failure of amplification due to mismatches with the primer set used. For example, Salmaso et al. [27] did not find any species belonging to the Euglenales in the HTS results (obtained with the universal eukaryotic primers TAReuk454FWD1 and TAReukREV3 for V4 18S), although they were present in LM. Hanžek et al. [66] reported that the taxa that contributed most to the biomass (Actinotaenium/Mesotaenium sp. and the species Cosmarium tenue, Pantocsekiella comensis, Sphaerocystis schroeteri and Synedropsis roundii) were not identified by eDNA metabarcoding (the V9 18S region was amplified using the universal primer pair 1391F and EukB). Proeschöld and Darienko [140] noted that, although Stichococcus-like organisms are widely distributed in almost all habitats, they are not recorded in environmental studies based on HTS approaches, because the V4 or V9 regions of the SSU contain introns that obstruct amplification. Groendahl et al. [42] reported that Monoraphidium sp., Selenastrum sp. and Trachelomonas sp., detected using the morphology-based approach, were not identified by the metabarcoding approach, despite the fact that all three genera are included in the reference database.
  • Uncertainties and lack of sensitivity of reference databases for the selected DNA markers [27].
The challenges related to morphological identification:
  • Diatom extracellular skeletons are counted in LM even if they come from dead cells. The valves of dead cells can be transported from locations other than the target assemblage. Metabarcoding will not detect these dead cells [35,78].
  • The proportion of live diatoms found in environmental samples varies greatly, ranging from 2 to 98% [35].
  • Small-celled species and pico-sized cells are often overlooked or underestimated by the morphological approach. For example, the valves of Fistulifera saprophila tend to dissolve during sample processing, which can explain why this species is often missed during morphological identification [78,80,82,99].
  • LM misidentification (false LM positives) [27,78,99].
  • Differences in the detection limits of the two methods: morphological and molecular approaches do not give the same insight into communities of algae and, therefore, do not have the same detection capacity for species [27,92,99].
  • The different sample volumes settled for microscopy and metabarcoding [143].
  • Underlying units used for microscopy (individual cells) and those used for metabarcoding (ASV sequences) are quite different, making direct comparisons imperfect [24,143].
  • A short barcode gene fragment may limit the taxonomic resolution [143]. For example, the resolution of the V4 18S region does not allow some species of Navicula to be identified unambiguously [32]. For the V7–9 18S marker, a lack of intergenus taxonomic resolution was found (the MOTUs matched multiple genera, e.g., Alexandrium pseudogonyaulax and A. hiranoi, Chaetoceros neogracile and C. curvisetus and Thalassiosira eccentrica and T. antarctica) [144]. In some Chlamydomonas, the V9 region is very similar to that of prasinophytes clade VII A5 [122].
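When the two approaches are compared on the same sample, it can help to separate taxa missed by metabarcoding although they are present in the reference database (pointing to amplification or resolution problems) from taxa that are simply absent from the database. The following Python sketch illustrates one way to make this split. The function name and the placeholder taxon names are hypothetical and do not come from any of the cited studies.

```python
def compare_inventories(lm_taxa, hts_taxa, reference_taxa):
    """Split taxa lists from microscopy (LM) and metabarcoding (HTS) into categories.

    All inputs are iterables of taxon names harmonized to the same rank
    (for example, genus) and the same nomenclature.
    """
    lm, hts, ref = set(lm_taxa), set(hts_taxa), set(reference_taxa)
    lm_only = lm - hts
    return {
        "shared": sorted(lm & hts),
        "hts_only": sorted(hts - lm),
        # Seen in LM, missed by HTS although a reference sequence exists:
        # candidates for primer mismatch or insufficient marker resolution.
        "lm_only_in_reference": sorted(lm_only & ref),
        # Seen in LM, missed by HTS and absent from the reference database.
        "lm_only_not_in_reference": sorted(lm_only - ref),
    }


# Placeholder names only; these are not real survey results.
result = compare_inventories(
    lm_taxa=["Genus_A", "Genus_B", "Genus_C"],
    hts_taxa=["Genus_B", "Genus_D"],
    reference_taxa=["Genus_A", "Genus_B", "Genus_D"],
)
for category, taxa in result.items():
    print(category, taxa)
```

Such a presence/absence comparison does not address the differences in counting units (cells versus ASVs) or in sample volumes noted above, so it is only a first diagnostic step.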

3. Conclusions

Metabarcoding has already been accepted as an alternative (faster and more economical) to traditional microscopy for the ecological assessment and monitoring of fresh water bodies, rivers and seas based on microalgae. Protocols, technical guidelines or standards for eDNA monitoring have been developed and/or approved in many countries [145,146,147,148,149]. Currently, there is no single perfect marker for identifying microalgae across their whole diversity. Among the most popular genetic barcodes for freshwater metabarcoding, one can highlight the nuclear V4 and V9 regions of 18S rRNA (which allow the composition of auto- and heterotrophic eukaryotes to be determined) and the region of the plastid gene rbcL (for diatoms). The ITS1 and ITS2 regions might be underestimated, but they show good potential for use as a microalgae barcode; they can be easily amplified with the standard primers and are variable enough to identify sequences to the species level.
Below, the main advantages (+) and disadvantages (−) of markers that are used for freshwater microalgae metabarcoding are summarized.
rbcL.
“+” widely used for diatom metabarcoding; distinguishes between taxa at the species and intraspecies levels; a high-quality curated reference database, Diat.barcode, is available for taxonomic assignment [105].
“−” extremely heterogeneous in green algae; does not have a set of universal primers [127]; only diatoms are identified well.
Notes: the majority of studies use a region 263 bp long (312 bp including primers) and a complex set of primers (three forward primers and two reverse primers) suggested by Vasselon et al. [16,90] (rbcL primers Set 1). However, it has recently been shown that a longer region of 331 bp (common region 263 bp; proposed by Kelly et al. [17,18], primer set rbcL 646F-rbcL 998R) has a higher resolution for species and intraspecific variants [86].
V4 18S.
“+” widely used for the metabarcoding of marine and freshwater eukaryotes; successfully amplified with the universal primer set. The V4 region (named the pre-barcode) was identified as the starting point for protist identification within the framework of the International Barcode of Life consortium (iBOL, http://www.ibol.org/ (accessed on 22 May 2023)) and the Protist Working Group (ProWG) [150]. It gives an insight into molecular phylogenetic relationships.
“−” compared with V9 18S, it misses haptophytes [54], many groups of heterotrophs and green algae from the classes Chloropicophyceae, Pyramimonadophyceae and Mamiellophyceae [53]. The V4 region is less variable than the V9 region. It often does not differentiate species.
V9 18S.
“+” widely used for the metabarcoding of marine and freshwater eukaryotes; successfully amplified with the universal primer set; was chosen for the amplification of eukaryotes in the global Earth Microbiome Project (EMP; http://www.earthmicrobiome.org (accessed on 22 May 2023)); V9 is more variable than V4; yields more OTUs and a greater diversity of higher-level taxa (supergroup and phylum).
“−” a short region (96–134 bp [36]); sometimes does not differentiate species.
Notes: the V4 and V9 regions reveal different taxonomic profiles at the genus, family and macro-taxon levels. Both markers are recommended for a more complete understanding of community structure.
ITS (the V9-ITS1 or ITS2 regions are used for metabarcoding).
“+” a strong locus for some algae (Chlorophyta, Dinophyceae, Eustigmatophyceae and Xanthophyceae) [7]; variable enough; allows species to be distinguished; amplified with the universal primer set. A specialized reference dataset, “PLANiTS”, which includes ITS1, ITS2 and entire ITS sequences of Viridiplantae, is available for data interpretation [107].
“−” for diatoms, this barcode shows high variability in region length and the problem of intraindividual variability [110,113].
23S (UPA).
“+” an algae-specific marker targeting plastid-containing eukaryotic algae and cyanobacteria; amplifies well with the universal primer set.
“−” a conserved region; low resolution; identifies taxa only to the genus level or higher; not a strong locus for algae.
16S (the V3–V4 and V4 regions are used for metabarcoding).
“+” targets the chloroplasts of eukaryotes and prokaryotes (cyanobacteria); amplifies well with the universal primer set; can be used for the simultaneous detection of prokaryotes and eukaryotic algae.
“−” biased toward the bacteria in the community; may not accurately reflect phytoplankton diversity because of the endosymbiotic origin of chloroplasts; not a strong locus for algae.
Notes: used very rarely, for the simultaneous comprehensive analysis of prokaryotes and eukaryotic algae.
In general, it should be kept in mind that each marker will show a different picture of the community, which depends on the successful amplification of the selected region. The taxonomic composition revealed and the level of taxonomic assignment depend on the variability of the region and the quality of the reference databases.

This entry is adapted from the peer-reviewed paper 10.3390/biology12071038
