Genetic Markers for Metabarcoding of Freshwater Microalgae: Comparison
Please note this is a comparison between Version 2 by Elena Mikhailovna Kezlya and Version 1 by Elena Mikhailovna Kezlya.

The metabarcoding approach is widely used for studying the diversity and distribution of freshwater microalgae and for routine biomonitoring. Because microalgae are a phylogenetically diverse group, the choice of a genetic marker directly affects the metabarcoding results. Specific markers are good for identifying only particular groups, while universal markers may miss classes or lack the variability necessary for differentiating taxa at the species and sometimes genus levels. An analysis of publications on the subject showed that metabarcoding studies of eukaryotic freshwater microalgae used 12 markers (different nuclear regions 18S and ITS and plastid regions rbcL, 23S and 16S). Studies that compared outcomes from different markers show that the resulting lists of taxa do not match. The plastid marker rbcL is widely used for diatom metabarcoding, as it differentiates taxa at the species and intraspecies levels, and there is a specific set of primers designed for identifying Eustigmatophyceae. The V9 18S region is more variable than V4 18S and provides more diversity at higher taxonomic levels (supergroup and phylum). The ITS1 and ITS2 regions are used rarely and may be underestimated. These barcodes amplify well with the standard primers and are variable enough to identify sequences at the species level. Plastid markers (23S and 16S rDNA), which target plastid-containing eukaryotic algae and Cyanobacteria, are conserved regions and identify taxa to the genus level and higher. Using specialized curated databases for data interpretation significantly improves the quality of the results.

  • barcode
  • metabarcoding
  • ecological assessment
  • microalgae
  • genetic markers

1. Introduction

Currently, eDNA metabarcoding is a popular method for studying the diversity and functioning of various communities, from microbes to mammals. Interest in this method grows every year, and the number of studies increases. For example, a query in the SCOPUS database with the keyword “metabarcoding” returns 2215 results; a query with the keyword “eDNA” returns 26,034 results (date of search 12 February 2023).
Algae are a phylogenetically heterogeneous group of organisms that is very diverse in morphology and ecological preferences. In the eukaryotic tree of life, photosynthetic eukaryotes are spread across 12 separate phylogenetic lines at the level of phylum [1,2,3]. On a macrosystematic level, they belong to four to seven (according to different estimates) supergroups that also contain non-photosynthetic organisms in each clade [1,2,3,4,5,6]. This phylogenetic heterogeneity is connected with a gene locus “…which is variable enough to provide robust identification at the species level…” [7] (and references therein) “…and different markers are applied for species delimitation in different algal groups.” [8] (and references therein). For example, phylogenetic studies and species descriptions of diatoms and red algae do not use the rDNA ITS marker, whereas it is the main marker currently employed for DNA-based species delimitation of green microalgae, dinoflagellates, chrysophytes and synurophytes [7,8] (and references therein). Well-documented nucleotide sequences are accumulated in databases, which are the basis for interpreting the metabarcoding data.
Thus, the choice of the barcode region and primer pairs, which can limit or bias the diversity of organisms observed, is a challenge with environmental metabarcoding studies [9,10]. The proportion of biodiversity covered by metabarcoding studies directly depends on the markers and primers used, so organisms that are not amplified by standard methods go undetected, even if they are common and play an important role in the ecosystem [11]. It is important for a “good barcode” to be taxonomically informative; it needs to be able to distinguish between species (i.e., the DNA region should mutate at the right rate), because most modern biomonitoring and biotic index programs require identification at the species level. At the same time, a barcode needs conserved primer binding areas, or degenerate primers, in order to be able to attach to the DNA of all the organisms in the sample [12]. The choice of primers also impacts the results of a biodiversity assessment of an ecosystem. Complete universality causes a loss of resolution and limits the depth of the biodiversity assessments of groups. Limiting the universality of the primers might, on the other hand, exclude important groups in the analysis and introduce biases, favoring some organisms or groups.
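The role of degenerate primers mentioned above can be illustrated with a minimal sketch. It expands a degenerate primer into the unambiguous sequences it encodes, using the standard IUPAC nucleotide ambiguity codes. The 7-nt primer and the binding sites are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

def primer_binds(primer, site):
    """True if a same-strand, same-length site matches one primer variant."""
    return site.upper() in expand_primer(primer)

# Hypothetical primer with two degenerate positions (R = A/G, Y = C/T).
variants = expand_primer("GCRTAYC")
print(len(variants))                        # number of encoded variants
print(primer_binds("GCRTAYC", "GCGTATC"))   # site covered by the degeneracy
print(primer_binds("GCRTAYC", "GCTTATC"))   # site with an uncovered base
```

Each additional degenerate position multiplies the number of variants, which is one way to see the trade-off between universality and specificity described in the paragraph above.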
Considerable experience has already been gained in using next-generation sequencing (NGS) approaches for studying algae. One of the high-priority research areas is the integration of metabarcoding into routine biomonitoring. Many methodological questions have been answered: bioinformatics pipelines have been assessed [14,15]; sampling, DNA extraction methods and applications of global eDNA have been discussed [16,17,18,19,20,21]; it has recently been shown that the preservation time and sample preservation methods have little effect on DNA metabarcoding results [22]; the experience of integrating eDNA metabarcoding into routine freshwater biomonitoring has been summarized [23,24,25]; and the term “eDNA” has been clarified [26].

2. Genetic Markers for Metabarcoding of Freshwater Microalgae

2.1. Gene Markers and Primer Sets for Freshwater Microalgae Metabarcoding

Twelve different genetic regions are used in the studies. The nuclear regions V3, V4, V7, V9 and V9-ITS1 are used for analyzing whole eukaryotic communities and communities of microalgae, as well as for focusing on individual groups of algae (dinoflagellates and diatoms [31]). The ITS2 region has been chosen for studying green algae s.l. (Viridiplantae) in a series of studies of the Antarctic region [72,73,74,105]. The plastid region rbcL is widely used for diatom metabarcoding, and primers for identifying Eustigmatophyceae have also been designed and tested [97].
Among nuclear markers, the V4 18S rRNA region is used for analyses most often. Seven primer sets have been used for this region, the most used of which was Set 6 (TAReuk454FWD1/TAReukREV3), developed by Stoeck et al. [43]. This set is widely used in metabarcoding of both marine and freshwater eukaryotic plankton. Sets 1 (DIV4for/DIV4rev3) and 2 (M13F-D512/M13R-D978rev) are aimed at diatoms and were used in seven and four studies, respectively. The remaining sets are each mentioned in only one publication, including Set 7 (TAReuk454FWD1/V4r), which has recently been accepted as the standard for using environmental DNA in Finnish marine phytoplankton monitoring. The V9 18S region was chosen as a barcode in nine publications. The universal primer Set 1 (1391F/EukBr) was used for the amplification of this region in eight of these nine publications.

2.2. Reference Databases for Sequence Interpretation

Researchers used different databases for the taxonomic attribution of sequences. Studies on diatoms that use the rbcL region always use Diat.barcode (R-syst:diatom database), a curated barcode library for diatoms [105], for sequence interpretation. Taxonomic attribution of sequences from the various 18S rRNA regions (V3, V4, V7, V8, V9 and combinations) is usually carried out using GenBank, as well as quality-controlled databases of ribosomal RNA gene sequences such as SILVA [106]. The PR2 (Protist Ribosomal Reference) database, a catalog of unicellular eukaryote small subunit rRNA sequences with curated taxonomy, is used less often. In a series of studies on Antarctic green algae [72,73,74], the sequences were annotated using a recently established reference dataset, PLANiTS, which includes Viridiplantae ITS1, ITS2 and entire ITS sequences from both Chlorophyta and Streptophyta [107]. To classify the 16S reads of freshwater diatom biofilm [103], PhytoREF, a reference database of the plastid 16S rRNA gene of photosynthetic eukaryotes, was used [108].

3.3. First Works on Testing Genetic Markers on Monoclonal Microalgal Cultures Provide Insight on the Effectiveness of Amplification and the Resolution of Species Differentiation

The first studies that tested the resolution of genetic markers for species differentiation were carried out using large collections of monoclonal algal cultures. This made it possible to determine the effectiveness of primers in amplifying certain regions, to directly compare the variability of sequences and morphological features (including cryptic species) and to establish the regions that are most suitable for further research. These studies became the basis for choosing the markers for next-generation sequencing.
One of the first tests of diatom “barcode” genes (COI, rbcL, 18S and ITS rDNA) was done by Evans et al. in 2007 [109]. The study aimed to determine the effectiveness of markers in distinguishing cryptic species within the model “morphospecies” Sellaphora pupula agg. As a result of their analysis, the authors suggested the barcode region COI as a valuable phylogenetic marker. However, they also reported some difficulties with the amplification of this gene (a large primer set was used, sequences for Seminavis cf. robusta and for centric diatoms could not be obtained and only partial sequences were obtained for the araphid pennate diatom Tabularia sp.). According to the acquired data, the plastid gene rbcL is less variable than COI, but it supports all the phylogenetic lines of the latter. As for ITS, this barcode has a lot of variability in the length of the region, and there is also the problem of intraindividual variation.
Later, Moniz and Kaczmarska [111] tested three candidate barcodes on 28 species from 22 genera of diatoms: the small ribosomal subunit (SSU, 1600 bp), a 5′ end fragment of the cytochrome c oxidase subunit 1 (COI, 430 bp), and the second internal transcribed spacer region combined with the 5.8S gene (5.8S + ITS2, 300–400 bp). COI showed the lowest rates of amplification (only 29% of good quality DNA amplified with COI, and of those, only 30% were sequenced successfully and found to be diatom DNA). For SSU, the authors noted the highest amplification success rate of the three markers and easy alignment; however, a long fragment is required for species delimitation. 5.8S + ITS2 showed a higher rate of successful amplification and sequencing (79% and 84%, respectively) and was the most variable of the three markers, but its secondary structure was needed to aid in alignment. As a result, the 5.8S + ITS2 fragment was proposed as the best candidate for a diatom DNA barcode.
A search for a universal marker for diatoms was carried out by Hamsher et al. [113]. The authors assessed the following markers: ∼1400 bp of rbcL, 748 bp at the 3′ end of rbcL (rbcL-3P), LSU D2/D3 and UPA. As a result, rbcL-3P was suggested as the primary marker for diatom barcoding, since it had the power to distinguish all species and could be sequenced more easily. LSU D2/D3 could distinguish all but the most closely related species (96%). UPA showed low resolution, distinguishing only 20% of the species. Relying on the authors’ personal experience (several copies were amplified, and the resulting sequences were different in length and unreadable), as well as on literature data, it was concluded that ITS is not a good barcode for diatoms.
The effectiveness of rbcL was discussed by M. MacGillivary and I. Kaczmarska [114]. A 540-bp fragment 417 bp downstream of the start codon of the rbcL gene was tested on a large selection of diatom taxa from the classes Mediophyceae and Bacillariophyceae (381 sequences representing 66 genera and 245 species). This fragment was chosen after preliminary testing as the most variable. As a result, this fragment of rbcL correctly segregated 96% and 93% of the morphological congeners, respectively.
The effectiveness of three markers (SSU rDNA, rbcL and COI) for metabarcoding was tested on a mock community of diatoms (30 strains belonging to 21 species) by Kermarrec et al. [115]. These markers are the primary ones used for the molecular identification of diatoms. The markers ITS and LSU were not considered in this study because of their high interclonal variability and the lack of available data for the establishment of reference libraries. In order to interpret the acquired sequences, reference libraries were created for each marker. Sequences from the authors’ own collection and from GenBank were included in these libraries. The rbcL marker gave the best assessment of the species composition of the mock community, followed by SSU rDNA (which did not differentiate the complexes Nitzschia palea and Gomphonema parvulum at the intraspecific level). COI is variable and provides high resolution, but it was not recommended for routine metabarcoding due to difficulties in amplification and the low representativity of the reference library.
A large study assessing the utility of the gene markers COI, rbcL, ITS, tufA, UPA and 18S for freshwater green algae was done by Hall et al. [116]. They tested representatives of seven distantly related species groups from the classes Chlorophyceae, Charophyceae and Zygnematophyceae (151 strains, 40 species total). As a result, the authors concluded that 18S, UPA and COI would be poor choices for a DNA barcode in green algae (18S and UPA proved insufficiently variable and COI difficult to amplify). ITS, rbcL and tufA were sufficiently variable to distinguish most species of Chlorophyceae, but additional primers were sometimes needed for amplification. For the charophytes, rbcL was noted as the most suitable marker, but with the remark that it was impossible to differentiate species using this marker alone.

3.4. 18S—Choosing a Variable Barcode Region for Eukaryotes In Silico

The eukaryotic 18S rRNA gene is used for species delimitation in almost all groups of freshwater algae [8]. It contains nine hypervariable regions (V1 to V9), each of which has been considered as a short barcode for species identification (with the exception of V6, because this region is more conserved in eukaryotes) [121] (and references therein). The question of using hypervariable regions as barcode markers for eukaryotes in silico has been discussed in several publications. Based on an alignment of eukaryotes containing 24,793 positions from the SILVA database, the characterization of the 18S rRNA gene and the design of universal eukaryote-specific primers were provided by Hadziavdic et al. [13]. To describe the nucleotide variation in the alignment, the authors used Shannon entropy values. The results suggested that the V2, V4 and V9 regions were best suited for biodiversity assessments (they yielded the highest taxonomic resolutions at cut-off values ranging from 95 to 100% for the sequence identity). The V1 region is rather short (ca 100 nt) and contains a highly conserved core segment, and the V3 and V5 regions lack highly variable segments and are not very long. V7 has a highly variable core of approximately 20–25 nt. The V8 region is over 150 nucleotides long with variable and conserved positions interspersed across the region, with a conserved segment towards the 3′ end.
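The use of Shannon entropy to describe nucleotide variation along an alignment can be sketched as follows. This is a minimal illustration, not the procedure of Hadziavdic et al.: the choice of base-2 logarithms, the handling of gaps (ignored here) and the four-sequence toy alignment are all assumptions made for the example.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    # H = sum over bases of -p * log2(p)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropy(aligned_seqs):
    """Per-position entropy for equal-length, pre-aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]

# Toy alignment (hypothetical): only the fourth column varies.
toy_alignment = ["ACGTAC", "ACGAAC", "ACGCAC", "ACGGAC"]
profile = alignment_entropy(toy_alignment)
print(profile)
```

Positions with entropy near zero are conserved and are candidates for primer binding sites, whereas runs of high-entropy positions mark hypervariable segments that can serve as barcodes.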

3.5. 18S rRNA Gene Metabarcoding: V4 vs. V9

Bradley et al. [54] examined the effect of PCR/sequencing bias of the V4 and V8–V9 regions on community structure and membership using seven microalgal mock communities consisting of 12 algal species across five major divisions of eukaryotic marine and freshwater microalgae. The authors found a critical shortcoming of the V4 primer set as used in the literature [43] and described the failed sequencing runs. The V4 region failed to reliably capture 2 of the 12 mock community members (the haptophytes Prymnesium parvum and Isochrysis galbana), whereas the V8–V9 hypervariable region more accurately represented the mean relative abundance and alpha and beta diversity. Bradley et al. [54] found that degeneracies on the 3′ end of the current V4-specific primers impacted the read length and mean relative abundance. They modified the TAReukREV3 reverse primer and suggested the V4r primer without degeneracies on the 3′ end for the subsequent sequencing. A comparative analysis of the V4 and V9 regions of 18S rDNA of the eukaryotic community of a pond [53] showed a remarkable discrepancy: the inventory of the major subdivision groups in the V9 region dataset did not correspond to that in the V4 region dataset. Eukaryotic OTUs for the V9 region were 20% more abundant than those for the V4 region at a 97% identity threshold. V9 also showed a larger diversity from the point of view of taxonomic coverage. The classes Karyorelictea, Prostomatea and Nassophorea in Ciliophora and the family Perkinsida (‘Alveolata’ group) were not detected using the V4 sequencing data, whereas they were detected using the V9 sequencing data. V4 missed Echinamoebida, Eumycetozoa and Euamoebida and green microalgae classes Chloropicophyceae, Pyramimonadophyceae and Mamiellophyceae. The authors noted “… the simultaneous application of two biomarkers may be suitable for understanding the molecular phylogenetic relationships”. 
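To make the notion of OTUs built at a 97% identity threshold more concrete, the following sketch clusters sequences greedily around centroids. It is a simplified illustration under strong assumptions: sequences are already aligned to equal length, identity is the fraction of matching positions, and the toy sequences are hypothetical. Published pipelines rely on dedicated clustering tools rather than code like this.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids = []
    clusters = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid is similar enough: this sequence founds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters

# Hypothetical 100-nt reads.
base = "ACGT" * 25
near = "T" + base[1:]        # 1 mismatch to base  -> 99% identity
far = "TTTTT" + base[5:]     # 4 mismatches to base -> 96% identity
otus = greedy_otus([base, near, far])
print(len(otus))             # number of OTUs at the 97% threshold
```

Because the threshold is applied to the amplified fragment, the same community can yield different OTU counts for regions of different length and variability, such as V4 and V9.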
In an investigation of a eukaryotic community in anaerobic wastewater treatment systems [48], the V4 and V9 regions also detected different taxonomic groups. The authors suggested that commonly used V4 and V9 primer pairs could produce a bias in eukaryotic community analyses. The number of sequences in the amplicon library for the V9 region was almost two times larger than the number of sequences in the V4 amplicon library (340,054 vs. 180,678). The V4 region-specific primer pair showed that the dominant group was fungi. A comparison of the 18S rRNA V4 and V9 regions for coastal phytoplankton communities with a focus on Chlorophyta [122] showed that the V9 region provided 20% more OTUs built at 97% identity than V4. Interestingly, the expectations were the opposite: the authors assumed that V4, as the longer region, would detect more OTUs. The authors noted that both markers work “…equally well to describe global communities at different taxonomic levels from the division to the genus and provided similar Chlorophyta distribution patterns”. The authors concluded that V9 was the better choice for Chlorophyta, as it was more discriminating than V4. In some cases, such as prasinophytes clade VII, V9 OTUs made it possible to discriminate all subclades defined to date, while, in V4, several clades collapsed together. However, there was also an opposite example: “The V9 region of some Chlamydomonas is very similar to that of prasinophytes clade VII A5”. The authors emphasized the importance of the existence of reference sequences in databases, the absence of which, for instance, prevented the assessment of Dolichomastigales (Chlorophyta, Mamiellophyceae) diversity using V9. Similar results were demonstrated on marine picoeukaryotes [123], amoebae [124] and zoonotic trichomonads [125].

3.6. Internal Transcribed Spacer Ribosomal DNA (ITS) in Metabarcoding Research

The ITS region is the accepted DNA barcode for fungi and a strong locus for delimiting or identifying species from different algal groups, such as Chlorophyta, Dinophyceae, Chrysophyceae, Xanthophyceae and Eustigmatophyceae [7,8]. Therefore, the use of this region for metabarcoding has positive prospects, with a high probability of identifying nucleotide sequences at the species level. The V9-ITS1 region was chosen for the large-scale research of freshwater protists from 217 freshwater lakes across Europe [68,69,70]. The studies were aimed at identifying the diversity dynamics of the protist communities relative to the geographic distance and mountain range structures [68], centers of endemism [70] and models of interactions between the protist community and bacteria [69]. With regard to algae, the diversity of the following groups was determined in these studies: Dinophyceae, Chrysophyceae, diatoms, Cryptophyta and Viridiplantae (green algae).
The ITS2 gene region is the best marker for DNA barcoding of Chlorophyta. This marker resolves major green algae lineages (some with high bootstrap support), has a high resolution for taxonomic assessment (it enables most species to be distinguished) and a high level of universality (i.e., in primers for PCR) [127] (and references therein). This region was successfully used in the first studies of the diversity of Viridiplantae (including green microalgae) in the Antarctic using the metabarcoding approach in soil and rock surface samples [72,128], sediments from lakes [74] and glacial ice [73]. The interpretation of sequences was carried out using the PLANiTS2 database [107], and most of the taxa were identified to the species level.

3.7. Gene Markers for Diatoms

Diatoms are well-known ecological indicators of aquatic ecosystems and are widely used for routine monitoring. Indexes of the water quality in rivers and lakes have been developed on the basis of diatoms and are used in EU countries (the Water Framework Directive), the USA (the National Water Quality Assessment Program), Canada, Australia and New Zealand [105,129,130,131]. Therefore, adapting the metabarcoding method for use as a tool for ecological assessment is a relevant task of modern research. Visco et al. [32] showed a strong similarity between the DI-CH (the Swiss Diatom Index) values inferred from microscopic and V4 18S NGS analyses of diatom communities. However, the authors noted that the interspecies variability of this barcode might differ between genera, and its effectiveness would depend on the taxonomic composition of the diatom community. The V4 resolution did not allow unambiguous assignment of Navicula species, but it was sufficient to distinguish most of the species of Nitzschia and Gomphonema. The rbcL gene marker has a wider application for studying diatom communities, and thanks to the establishment of a quality reference database, Diat.barcode/R-syst:diatom [105], it can already be considered the standard for diatom metabarcoding. The resolution of the 312-bp rbcL marker at the level of intraspecific and cryptic diversity was successfully demonstrated by Pérez-Burillo et al. [80]. Benthic diatom samples (n = 610) were studied with a special focus on several ecologically important diatom species that are also key for the Water Framework Directive monitoring of European rivers: Fistulifera saprophila, Achnanthidium minutissimum, Nitzschia inconspicua and Nitzschia soratensis. As a result, it was shown that intraspecific and cryptic diversity can be assessed and understood through the application of DNA metabarcoding.
A longer rbcL region (331 bp) was suggested as a result of large-scale research (500 benthic samples from 250 sites in England) with the aim of adopting a metabarcoding approach for ecological status assessment using diatoms [17]. The choice of region was based on an analysis of 390 sequences from a database. Eleven conserved regions of the rbcL gene with >96% identity were identified. These regions were used for developing primers. Variable regions were also analyzed, and four of these showed good potential for species delimitation.

3.8. Specific Primers Targeted to rbcL Region Detected a High Diversity of Eustigmatophyceae

A high diversity of Eustigmatophyceae was found in environmental DNA samples with the help of new specific primers targeted at the rbcL region [97] (Table 1). The authors compared their results to previous studies concerning Eustigmatophyceae and concluded that the diversity of this group was underestimated. The designed primers made it possible to detect 184 ASV haplotypes that were either Eustigmatophyceae (179) or possibly Eustigmatophyceae (15), while, in previous works, representatives of this group were reported only as rare or single finds. The sensitivity of eustigmatophyte-directed rbcL primers was higher than that of universal eukaryotic 18S primers. The authors suggested that the employed techniques can be used for future studies of the population structure, ecology, distribution and diversity of this class.

3.9. Comparison of rbcL and 18S Markers for Freshwater Diatom Biomonitoring

Inconclusive results were obtained in a study using the rbcL and V4 18S rRNA markers [34] for benthic diatom biomonitoring in freshwater habitats of Northern Europe. The ecological status classes differed significantly depending on the method used: only 48% of samples with the 18S marker and 37.5% of samples with the rbcL marker had the same ecological status as with the morphological analysis. The assessment of the ecological conditions gave different results using different markers. The authors attributed this to differences in the taxonomic coverage of the respective reference databases and in primer specificity.
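The agreement figures quoted above are percentages of samples assigned to the same ecological status class by two methods. A minimal sketch of that calculation is given below; the eight samples and their class labels are hypothetical and were chosen only so that the arithmetic is easy to follow, they are not data from the cited study.

```python
def class_agreement(reference, inferred):
    """Percentage of samples with the same status class under both methods."""
    if len(reference) != len(inferred):
        raise ValueError("both methods must classify the same samples")
    matches = sum(r == m for r, m in zip(reference, inferred))
    return 100.0 * matches / len(reference)

# Hypothetical status classes for eight samples.
morphology = ["high", "good", "good", "moderate", "poor", "good", "moderate", "bad"]
molecular = ["high", "good", "moderate", "good", "moderate", "good", "poor", "poor"]

agreement = class_agreement(morphology, molecular)
print(f"{agreement:.1f}% of samples share the same class")
```

A simple percentage treats a one-class shift and a four-class shift as equally wrong, so it is only a coarse summary of how far the two assessments diverge.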

3.10. A 23S rDNA Plastid Marker for Simultaneous Detection of Eukaryotic Algae and Cyanobacteria

The universal plastid amplicon (UPA) is the variable Domain V of the 23S plastid rRNA gene, ∼330 bp in length. This region was proposed by Sherwood and Presting [99] as a marker for plastid-containing organisms, i.e., all lineages of eukaryotic algae and Cyanobacteria. In this research, a single pair of universal primers was designed, and it was indicated that these exact priming sequences are present only in cyanobacteria and plastids. However, comparisons with other markers showed the insufficient effectiveness of UPA.

3.11. The 16S rRNA Gene as a Marker for Simultaneous Detection of Prokaryotes and Eukaryotes

The 16S rRNA gene was first proposed as a metabarcoding marker by Eiler et al. in 2013 [101] on the basis of it being universally present in prokaryotes (including cyanobacteria), as well as in the chloroplasts of eukaryotes. This enabled the simultaneous detection of prokaryotic and eukaryotic phytoplankton taxa. The authors analyzed the phytoplankton diversity of 49 lakes, including three seasonal surveys, and assessed the data using NGS and microscopy. The NGS approach detected 1.5–2 times more OTUs than the number of taxa found by microscopy. A more detailed comparison of taxonomic groups revealed that Heterokonta, Euglenophyta, Cryptophyta and Dinophyta were overrepresented in the microscopic biovolume dataset compared to the NGS data, whereas Cyanobacteria were proportionally overrepresented in the NGS dataset compared to the microscopic biovolume data. The authors noted that Dinophyta, a major phylum in the microscopic data, was poorly detected by NGS in some lakes. Discrepancies also included Euglenophyta and Heterokonta, which were scarce in the NGS data but were frequently detected by microscopy. The NGS approach also detected a deep-branching, taxonomically unclassified cluster that could not be linked to any group identified by microscopy.

Recently, Bonfantine et al. [103] explored the potential of the standard V4 515F-806RB primer pair for recovering diatom plastid 16S rRNA sequences. PhytoREF was used to classify the 16S reads from 72 freshwater biofilm samples. Based on a Clustal nucleotide alignment, the authors confirmed the differences between eukaryotic chloroplast and prokaryotic sequences. “The Ochrophyta, and other eukaryote reads, showed high sequence conservation with no 3′ mismatches in the last 5 bases of both forward and reverse 16S v4-515F and V4-806R primers. Two mismatches to the E. coli 16S rRNA (GT vs. TA) were observed across all aligned non-E. coli 16S RNA sequences 15 bases upstream of the V4-806 primer-binding site.” More than 90% of the diatom reads in each stream biofilm sample were identified. The authors found significant beta-diversity in diatom assemblages and discrimination among river segments. Using three Australian environmental 16S rRNA datasets selected from NCBI-SRA as an example, the authors showed that most of the diatom OTUs (67 out of 71) were also detected in other Australian ecosystems. As a result, the authors concluded that diatom plastid 16S rRNA genes are readily amplified with the standard primer sets. “Therefore, the volume of existing 16S rRNA amplicon datasets initially generated for microbial community profiling can also be used to detect, characterize, and map diatom distribution to inform phylogeny and ecological health assessments, and can be extended into a range of ecological and industrial applications.”
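The kind of 3′-end primer check mentioned in the quoted passage (mismatches in the last 5 bases of a primer) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `three_prime_mismatches`, the 515F-style degenerate primer string and both template sequences are hypothetical example data, not sequences or code from the cited study.

```python
# IUPAC nucleotide codes mapped to the set of bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Count mismatches in the last `window` bases (3' end) of a primer.

    `site` is the template stretch the primer anneals to, written in the
    same 5'->3' orientation as the primer and of the same length.
    Degenerate primer positions match any base in their IUPAC set.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    pairs = zip(primer[-window:].upper(), site[-window:].upper())
    return sum(base not in IUPAC[code] for code, base in pairs)


# Hypothetical example data (assumed for illustration only).
primer = "GTGYCAGCMGCCGCGGTAA"        # assumed 515F-style degenerate primer
perfect_site = "GTGCCAGCAGCCGCGGTAA"  # matches the primer at every position
variant_site = "GTGCCAGCAGCCGCGGTCA"  # one substitution within the last 5 bases

print(three_prime_mismatches(primer, perfect_site))  # 0
print(three_prime_mismatches(primer, variant_site))  # 1
```

Restricting the count to a short 3′ window follows the quoted passage's focus on the last 5 bases; the default of 5 can be changed through the `window` parameter.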

3.12. Comparing Approaches: Metabarcoding vs. Morphological Identification (Congruency between Methods)

Comparing the results acquired by LM and NGS reveals the discrepancies between these methods and their causes, helps determine the amplification efficiency of the chosen genetic markers and identifies problems in bioinformatic processing and taxonomic attribution. Every work that compared the morphological and molecular approaches for studying the diversity of algal communities indicated a significant difference in the resulting taxonomic lists. The share of taxa detected by both approaches ranges from 7.4 to 25.7%. Some studies have shown that the diversity detected by NGS is much higher than that found by LM. In a study of diatom diversity, Zimmermann et al. [132] reported that about 2.5 times more taxa were found by the NGS approach (263 taxa vs. 102 taxa in LM). In a study of benthic diatoms, Bailet et al. [34] showed that metabarcoding revealed 27% more taxa than the morphological method when using the 18S marker and 38% more taxa when using the rbcL marker.

The challenges in metabarcoding analysis:
  • Gaps in the reference databases [16,27,32,34,38,46,62,82,90,99], etc.
  • Diatom extracellular skeletons are counted in LM even if they come from dead cells. The valves of dead cells can be transported from locations other than the target assemblage. Metabarcoding will not detect these dead cells [35,78].
  • The natural intraspecific and intragenomic variability of the barcoding marker: a single taxon can have multiple genotypes at the barcoding region, so members of that taxon might cluster into different Molecular Operational Taxonomic Units (MOTUs) [35].
  • The proportion of live diatoms found in environmental samples varies greatly, ranging from 2 to 98% [35].
  • Cryptic diversity: a single morphological species can represent different genetic groups (e.g., the diatoms Sellaphora pupula, Pinnularia borealis, Hantzschia amphioxys and Nitzschia inconspicua and species of Stichococcus, Coccomyxa, Chlorokybus, Cryptomonas, etc.) [78,137,138,139,140,141,142].
  • Small-celled species and pico-sized cells are often overlooked or underestimated by the morphological approach. For example, the valves of Fistulifera saprophila tend to dissolve during sample processing, which can explain why this species is often missed during morphological identification [78,80,82,99].
  • MOTU richness can be artificially inflated through technical errors at different steps of sample processing, amplification and sequencing [35].
  • LM misidentification (false LM positives) [27,78,99].
  • The MOTU delimitation approach influences the richness estimation and interpretation [35] (and references therein); assessments of the bioinformatics pipelines are provided in [14,15].
  • Differences in the detection limits of the two methods: morphological and molecular approaches do not give the same insight into communities of algae and, therefore, do not have the same detection capacity for species [27,92,99].
  • Complete failure of amplification due to a mismatch of the primer set used. For example, Salmaso et al. [27] did not find any species belonging to the Euglenales in the HTS results obtained with the universal eukaryotic primers TAReuk454FWD1 and TAReukREV3 for V4 18S, although they were present in LM. Hanžek et al. [66] reported that the taxa that contributed most to the biomass (Actinotaenium/Mesotaenium sp. and the species Cosmarium tenue, Pantocsekiella comensis, Sphaerocystis schroeteri and Synedropsis roundii) were not identified by eDNA metabarcoding (the V9 18S region was amplified using the universal primer pair 1391F and EukB). Pröschold and Darienko [140] noted that, although Stichococcus-like organisms are widely distributed in almost all habitats, they are not recorded in environmental studies based on HTS approaches, because the V4 or V9 regions of the SSU contain introns that obstruct amplification. Groendahl et al. [42] reported that Monoraphidium sp., Selenastrum sp. and Trachelomonas sp., detected using the morphology-based approach, were not identified by the metabarcoding approach, despite the fact that all three genera are included in the reference database.
  • The different sample volumes settled for microscopy and metabarcoding [143].
  • Uncertainties and lack of sensitivity of reference databases for the selected DNA markers [27].
The challenges related to morphological identification:
  • Underlying units used for microscopy (individual cells) and those used for metabarcoding (ASV sequences) are quite different, making direct comparisons imperfect [ , ].
  • A short barcode gene fragment may limit the taxonomic resolution [143]. For example, the resolution of the V4 18S region does not allow some species of Navicula to be identified unambiguously [32]. For the V7–9 18S marker, a lack of intergenus taxonomic resolution was found (the MOTUs matched multiple genera, e.g., Alexandrium pseudogonyaulax and A. hiranoi, Chaetoceros neogracile and C. curvisetus and Thalassiosira eccentrica and T. antarctica) [144]. In some Chlamydomonas, the V9 region is very similar to that of prasinophytes clade VII A5 [122].
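The congruency figures discussed in this section (the share of taxa detected by both approaches) come down to a set comparison of two taxon lists. The sketch below is a minimal illustration under stated assumptions: the function name `compare_taxa` and both example lists are hypothetical, and the percentage is computed relative to all taxa found by either method, which is an assumed convention because the denominator used in the reviewed studies is not specified here.

```python
def compare_taxa(lm_taxa, ngs_taxa):
    """Summarise the agreement between an LM and an NGS taxon list."""
    lm, ngs = set(lm_taxa), set(ngs_taxa)
    shared = lm & ngs   # taxa detected by both approaches
    total = lm | ngs    # taxa detected by at least one approach
    return {
        "shared": sorted(shared),
        "lm_only": sorted(lm - ngs),
        "ngs_only": sorted(ngs - lm),
        # Assumed convention: shared taxa as a percentage of the union.
        "shared_pct": 100.0 * len(shared) / len(total) if total else 0.0,
    }


# Hypothetical example lists (not data from any cited study).
lm_list = ["Nitzschia inconspicua", "Sellaphora pupula",
           "Trachelomonas sp.", "Cosmarium tenue"]
ngs_list = ["Nitzschia inconspicua", "Fistulifera saprophila",
            "Sellaphora pupula", "Pinnularia borealis",
            "Hantzschia amphioxys"]

result = compare_taxa(lm_list, ngs_list)
print(result["shared"])               # ['Nitzschia inconspicua', 'Sellaphora pupula']
print(f"{result['shared_pct']:.1f}")  # 28.6
```

Exact string matching is used, so taxon names must be harmonised (synonyms, spelling, taxonomic rank) before the two lists are compared.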

3. Conclusions

Metabarcoding has already been accepted as a faster and more economical alternative to traditional microscopy for the ecological assessment and monitoring of freshwater bodies, rivers and seas based on microalgae. Protocols, technical guidelines or standards for eDNA monitoring have been developed and/or approved in many countries [145,146,147,148,149]. Currently, there is no single perfect marker for identifying microalgae across their whole diversity. The most popular genetic barcodes for freshwater metabarcoding are the nuclear V4 and V9 regions of 18S rRNA (which allow the composition of auto- and heterotrophic eukaryotes to be determined) and a region of the plastid gene rbcL (for diatoms). The ITS1 and ITS2 regions might be underestimated, but they show good potential as microalgal barcodes: they can be easily amplified with the standard primers and are variable enough to identify sequences to the species level. Below, the main advantages (+) and disadvantages (−) of the markers that are used for freshwater microalgae metabarcoding are summarized.

rbcL. “+” widely used for diatom metabarcoding; distinguishes between taxa at the species and intraspecies levels; a high-quality curated reference database for taxonomic attribution, Diat.barcode [105], is available. “−” extremely heterogeneous in green algae; does not have a set of universal primers [127]; only diatoms are identified well. Notes: the majority of studies use the 263 bp region (312 bp including primers) and a complicated set of primers (three forward primers and two reverse primers) suggested by Vasselon et al. [16,90] (rbcL primers Set 1). However, it has recently been shown that a longer region of 331 bp (common region 263 bp, proposed by Kelly et al. [17,18], primer set rbcL 646F-rbcL 998R) has a higher resolution for species and intraspecific variants [86].

V4 18S. “+” widely used for metabarcoding of marine and freshwater eukaryotes; successfully amplified by a universal primer set. The V4 region (named pre-barcode) was designated as the starting point for the identification of protists in the International Barcode of Life Consortium Project (iBOL, http://www.ibol.org/ (accessed on 22 May 2023)) and the Protist Working Group (ProWG) [150]. Provides an understanding of molecular phylogenetic relationships. “−” compared with V9 18S, misses haptophytes [54], many groups of heterotrophs and green algae from the classes Chloropicophyceae, Pyramimonadophyceae and Mamiellophyceae [53]. The V4 region is less variable than the V9 region. Often does not differentiate species.

V9 18S. “+” widely used for metabarcoding of marine and freshwater eukaryotes; successfully amplified by a universal primer set; was chosen to amplify eukaryotes in the global Earth Microbiome Project (EMP; http://www.earthmicrobiome.org (accessed on 22 May 2023)); V9 is more variable than V4; provides more OTUs and more diversity at higher taxonomic levels (supergroup and phylum). “−” a short region (96–134 bp [36]); sometimes does not differentiate species. Notes: the V4 and V9 regions detect different taxonomic profiles at the genus, family and macro-taxa levels. Both markers are recommended for a more complete understanding of community structures.

ITS (V9-ITS1 or ITS2 regions are used for metabarcoding). “+” is a strong locus for some algae (Chlorophyta, Dinophyceae, Eustigmatophyceae and Xanthophyceae) [7]; sufficiently variable; allows species to be differentiated; amplified by a universal primer set. A specialized curated reference dataset, “PLANiTS” [107], exists for interpreting data, including ITS1, ITS2 and entire ITS sequences of Viridiplantae. “−” for diatoms, this barcode has great variability in the length of the region and a problem of intraindividual variation [110,113].

23S (UPA). “+” an algal-specific marker focused on plastid-containing eukaryotic algae and Cyanobacteria; sufficiently amplified by a universal primer set. “−” a conserved region; low resolution; identifies taxa only to the genus level or higher; is not a strong locus for algae.

16S (V3–V4 and V4 regions are used for metabarcoding). “+” focused on the chloroplasts of eukaryotes and on prokaryotes (Cyanobacteria); sufficiently amplified by a universal primer set; can be used to simultaneously detect prokaryotes and eukaryotic algae. “−” biased towards bacteria in the community; might not accurately reflect the phytoplankton diversity due to the endosymbiotic origin of chloroplasts; is not a strong locus for algae. Notes: is used very rarely for the simultaneous complex analysis of prokaryotes and eukaryotic algae.

In general, it should be taken into account that every marker will show a different picture of the community, depending on the successful amplification of the chosen region. The identification of the taxonomic composition and the level of taxonomic attribution depend on the variability of the region and the quality of the reference databases.