Natural resources are considered a promising source of microorganisms responsible for producing biocatalysts with great relevance in several industrial areas. However, a significant fraction of the environmental microorganisms remains unknown or unexploited due to the limitations associated with their cultivation in the laboratory through classical techniques. Metagenomics has emerged as an innovative and strategic approach to explore these unculturable microorganisms through the analysis of DNA extracted from environmental samples.
1. Introduction
The natural resources available on Earth have played an important role in the history of human civilization since they have provided the necessary materials and energy for the preservation and proliferation of life. Besides the basic sustenance, proper use of these resources can also contribute to the improvement of our comfort, protection and well-being [1]. Over the centuries, we have witnessed the continuous exploitation of some finite natural resources, which resulted in the degradation of important ecosystems, subsequently creating severe environmental, economic and technological issues [2,3]. This overconsumption of materials and energy will eventually lead to inevitable resource depletion. Consequently, there is a need to moderate the demand for finite natural resources and simultaneously search for efficient and sustainable ways to extract and convert energy/materials from renewable resources [4]. In this context, biocatalysts can play a crucial role. Due to their high specificity and selectivity, the biocatalysts can generally ensure the effective conversion of substrates, minimizing the formation of undesirable side-products and reducing the energetic costs associated with the process. Hence, the use of suitable and robust biocatalysts can greatly contribute to the implementation of greener and sustainable bioprocesses that efficiently compete with the classical chemical routes. Currently, the use of biocatalysts for the valorization of alternative non-finite resources, under the concept of bioeconomy and the EU Green Deal, has gained increased attention.
Unexplored or slightly explored environments, such as soil and water, are interesting sources of novel and promising biocatalysts. Despite the clear differences at the physicochemical level, soil and water are both regarded as natural bio-reservoirs with great microbial diversity. For this reason, these environments have been the focus of several microbial studies in the last few decades. Microorganisms are considered important suppliers of various bioproducts, such as enzymes, with applications in several industrial areas. In fact, in the last decade, we have seen a significant increase in the demand for enzymes [5], which is easily explained by their great biotechnological potential. However, the presence of a significant number of unculturable microorganisms in both soil and water can limit some microbial studies aimed at finding novel biocatalysts, or make them unfeasible. In this context, metagenomics can play a crucial role.
Metagenomics has emerged as a culture-independent technique that allows exploring the genetic material of whole microbial communities present in a given environment [6]. This technique has been successfully used to identify novel enzymes with promising catalytic activities, and some of them have been patented and already brought to market [7]. Two different metagenomic approaches have been described, namely, sequence-based and function-based metagenomics. In both cases, an initial step of DNA extraction from an environmental sample is needed [8]. The sequence-based studies allow the identification of candidate genes, while the function-based screenings include the detection and isolation of clones from metagenomic libraries with a positive response to the desired phenotype [9]. The construction of a metagenomic library requires the selection of the most suitable expression vector, in which the environmental DNA fragment will be inserted. In addition to other aspects, such as the quality and size of the environmental DNA, this selection depends on the purpose of the functional screening. Plasmids can be used when DNA fragments are small (≤15 kb insert size) and contain only individual genes. On the other hand, some expression vectors, such as fosmids and cosmids (<40 kb insert size), or bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) (>40 kb insert size), allow the recovery of large biosynthetic gene clusters that encode the production of one or more specialized metabolites. Besides the expression vector, the expression system should be selected so that gene expression and target gene detection are maximized. Escherichia coli has been widely used because the extensive genetic knowledge of this microorganism makes it suitable for effective and profitable cloning and protein expression [10,11].
However, the demand for robust biocatalysts requires functional screenings at temperatures other than the growth temperatures of mesophilic hosts. Therefore, other expression hosts, such as Thermus thermophilus, have already been shown to be good candidates as expression hosts in the functional metagenome analysis [12].
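The insert-size ranges quoted above for each vector class can be written as a small lookup. The following is only a minimal sketch: the function name `suggest_vector_classes` is hypothetical, and it encodes nothing beyond the three size ranges given in the text. In practice, the choice of vector also depends on DNA quality and the purpose of the screening, which this sketch ignores.

```python
from typing import List


def suggest_vector_classes(insert_kb: float) -> List[str]:
    """Return the vector classes whose quoted insert-size range covers insert_kb.

    Ranges are taken from the text: plasmids (<=15 kb), fosmids/cosmids (<40 kb),
    BACs/YACs (>40 kb). Exactly 40 kb is not covered by the ranges as quoted,
    so an empty list is returned for that value (an artefact of the quoted
    boundaries, not a biological statement).
    """
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")

    candidates = []
    if insert_kb <= 15:
        candidates.append("plasmid")
    if insert_kb < 40:
        candidates.append("fosmid/cosmid")
    if insert_kb > 40:
        candidates.append("BAC/YAC")
    return candidates


if __name__ == "__main__":
    for size in (10, 30, 100):
        print(size, "kb ->", suggest_vector_classes(size))
```

Returning every compatible class, rather than a single recommendation, reflects that small inserts fall within more than one of the quoted ranges.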
2. Soil
Soil is more than just land; it represents one of the most important natural resources available on Earth. It is considered the basis of agriculture, an important water filter, a natural reserve of carbon and water and also the regular habitat for several living organisms. Soils originate from rocks through complex chemical, mechanical and biological processes which result in the formation of small particles and grains [13]. In terms of composition, soils are generally constituted by solid, liquid and gas fractions in a 2:1:1 ratio (volume basis). The solid fraction is mainly composed of inorganic materials but also contains a small portion of organic matter (around 1–5%) [14]. The liquid phase is generally an aqueous solution of electrolytes (e.g., micronutrients and macronutrients). On the other hand, the main gases present in soil are water vapor, CO2, O2 and N2 [13]. Depending on its origin and composition, different types of soil can be defined. Microbial diversity and functionality are strongly dependent on the specific characteristics of each type of soil.
2.1. Raw Resources
Depending on the geographical location, the soil may present itself in various forms at the Earth’s surface as the result of environmental conditions, organic matter content and type of vegetation [15]. An abiotic factor with high importance in any ecosystem is temperature since it can directly or indirectly affect microbial activity and, consequently, the composition of microbial communities. Each microbial species has a different temperature tolerance range and is capable of producing distinct types of biocatalysts at different production rates. Still, certain microorganisms can thrive and function well metabolically in adverse conditions, notably at extreme temperatures [16,17].
Therefore, the sampling locations in which the soil was directly collected and evaluated “as found in nature” (raw resources) were classified according to the recorded temperature: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
As shown in Figure 3a, a greater number of studies were accomplished in low-temperature environments (27 studies), followed by high-temperature environments (26 studies) and, finally, moderate temperature environments (5 studies). The unique characteristics of extreme environments, namely, extreme temperatures, make them more interesting places to find promising and robust enzymes. This fact is probably the main reason for the reduced number of studies accomplished in moderate temperature environments. Furthermore, there is an increasing need to find highly stable biocatalysts with efficient activities capable of acting in various industrial processes that make use of severe conditions, similar to what occurs in hot and cold environments [18].
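The temperature classes used to group the sampling locations (below 20 °C, between 20 and 45 °C, above 45 °C) can be expressed as a simple rule. The sketch below is illustrative only: the function name `classify_site` and the example sites and temperatures are hypothetical, and treating exactly 20 °C and 45 °C as "moderate" is an assumption, since the text does not say how boundary values were handled.

```python
from collections import Counter

LOW_MAX_C = 20.0   # below this: low-temperature environment
HIGH_MIN_C = 45.0  # above this: high-temperature environment


def classify_site(temp_c: float) -> str:
    """Assign a sampling location to a temperature class.

    Boundary values (exactly 20 or 45 degrees C) are treated as 'moderate';
    this is an assumption of the sketch, not stated in the text.
    """
    if temp_c < LOW_MAX_C:
        return "low"
    if temp_c <= HIGH_MIN_C:
        return "moderate"
    return "high"


# Hypothetical recorded temperatures (degrees C), for illustration only.
example_sites = {
    "glacier soil": 2.0,
    "solar saltern": 30.0,
    "compost pile": 60.0,
    "hot spring mud": 75.0,
}

# Count how many example sites fall into each class.
tally = Counter(classify_site(t) for t in example_sites.values())

if __name__ == "__main__":
    for name, temp in example_sites.items():
        print(f"{name}: {temp} C -> {classify_site(temp)}")
    print(dict(tally))
```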
2.1.1. Low-Temperature Environments
Cold soils, in addition to being exposed to very low temperatures, are also subject to other harsh conditions, such as freeze-thaw cycles, UV radiation and restricted availability of water and nutrients. Indeed, arid/semiarid regions, aquatic environments, polar regions, mountainous and forest areas are examples of cold habitats that have been the focus of metagenomic studies. Although they usually have low biodiversity, the microorganisms that inhabit them have acquired survival machinery and, for this reason, these types of habitats constitute stimulating reservoirs of biotechnological molecules [74].
Deep-sea sediments, mainly sands and clays from several aquatic environments, are the type of samples that most contribute to the exploration of enzymes in these cold habitats. The South China Sea is a marginal sea and one of the most studied environments since it constitutes an important reservoir of sediments rich in organic materials [75]. Other regions of the Pacific Ocean, such as Suruga Bay in Japan [31], as well as the depths of the Atlantic Ocean [51] and the Arctic Ocean, notably in the Barents Sea area [48], have also been studied to search not only for lipolytic enzymes but also for enzymes belonging to the glycosyl hydrolase class. Additionally, the submarine tufa columns of the Ikka Fjord in Greenland [21] and the karstic Lake Arreo in Spain [40] are both permanently cold and alkaline, and are valuable sources of microorganisms adapted to these conditions. In addition to aquatic sediments, other types of samples with potential for metagenomic exploitation come from arid/semiarid and mountain soils, for example in the Ladakh region [70] and the Apharwat mountain [41,44], respectively, both in the north-western Himalayas, which have distinct geo-climatic characteristics such as extremely cold and dry weather, high altitude and glacial and permafrost soils. Relevant examples of such soils are the Karuola glacier on the Tibetan Plateau [45] and the Kolyma Lowland permafrost in north-eastern Siberia [47], which are extremely hostile environments inhabited by unique microbial communities.
2.1.2. Moderate Temperature Environments
There are few studies in which sampling sources are identified as moderate temperature environments. Still, certain environments of this nature have other characteristics that also make them interesting for metagenomic purposes. An example is the Caatinga biome of João Câmara (Brazil) which presents sandy loam soil and constitutes an ecosystem of high biological relevance due to the features of the area, such as the semiarid climate, the high exposure to UV radiation and the long periods of drought [71,76]. Another example is the solar saltern of Goa, which differs by its high salinity and represents an important source for metagenomic studies given the difficulty in cultivating halophilic microorganisms through conventional techniques [52].
2.1.3. High-Temperature Environments
High-temperature environments have also proven to be important sources of very useful thermostable enzymes with applications in various industrial fields, such as the food and chemical synthesis industries. In addition to geothermally heated environments, such as hot springs and hydrothermal vents, arid/semiarid regions and environments subject to natural composting processes are often good targets for the application of metagenomic tools [77]. Compost samples from Expo Park in Japan, produced from leaves and branches, are a good example of natural composting since they have been studied over a few years [24,42,59,69]. Once thermophilic composting reaches high temperatures, there is a greater predominance of microorganisms capable of degrading complex molecules, which makes this type of environment a potential source of lignocellulose-degrading enzymes and, for this reason, an interesting subject of study [78]. Several arid/semiarid regions have been explored given the typical characteristics of these environments, including deserts [66] and other sites, more specifically the Turpan Basin, which is China’s hottest place and has proven to be a valuable source of different types of highly thermostable enzymes [57]. Hot springs and hydrothermal vents from different parts of the planet, e.g., Caldeirão hot spring in Portugal [28], Solnechny hot spring in Russia [46] and Solfatara-Pisciarelli hydrothermal pool in Italy [34], have also contributed to finding robust enzymes through the construction of metagenomic libraries using DNA extracted from wet mud and/or sediments collected from these places. In these metagenomic studies, the expression host used was E. coli, with one exception: the metagenomic library constructed from sediments of a hot spring in the Azores, Portugal, used T. thermophilus as the host.
Using this thermophilic host, Leis and co-workers intended to increase the probability of detecting genes derived from extreme environments that would encode for new thermostable biocatalysts and allow the screening of phenotypes that are not observable in E. coli [43].
In some studies performed from these raw resources, an additional enrichment step was performed to provide favourable growth conditions for certain microorganisms of interest, often present in small abundance, to the detriment of others [79]. These enrichments were implemented by introducing specific substrates, such as cellulose, xylan, chitin, starch and glucose [31,33,38], and even olive oil [47], that stimulate specific microbial activities. On the other hand, culture enrichment also occurred by controlling environmental conditions, in particular the temperature, which is generally in agreement with the temperature of the sampling locations [26].
Over the past decade, in addition to function-based metagenomic screenings, sequence-based metagenomic screenings have also been performed. Sequence-based metagenomics showed that the phyla predominating in high-temperature environments, more specifically in sediments from hot springs and hydrothermal vents, are Crenarchaeota, Thaumarchaeota, Acidobacteria and Proteobacteria, which are capable of mineral-based metabolism and generally associated with soil [21,31,34,36,43].
2.2. Human Manipulated Resources
In addition to being studied as it occurs in nature, without any kind of alteration, soil can be studied in a variety of scenarios that result from human manipulation, including contaminated/polluted soils, agricultural soils and controlled composting.
2.2.1. Polluted Environments
The intensification of industrialization, urbanization and mining has negatively affected the soil as a natural resource. Soil has been contaminated by different sources, namely, industrial sewage, solid wastes and urban activities. Organic and inorganic pollutants, such as heavy metals, alkaline or acidic constituents, toxins, oil contaminants and others, have been responsible for soil contamination [146].
It was found that the following categories of polluted samples have been used in metagenomic studies: soils contaminated by oil and its constituents (such as polycyclic aromatic hydrocarbons (PAHs)), fertilizers and other alkaline pollutants, industrial sludges and sediments. Oil production sites [80], soils where oil spills or runoffs have occurred [50,101,111,139], soils near industrial areas [50,87,117,140] and soils treated with fertilizers [136] were particularly analyzed. Since pollutants are rich in toxic compounds, they affect the activity and diversity of the microbial communities present in these contaminated soils. Therefore, metagenomic studies have been developed in this type of compromised environment to unravel the gene clusters that encode enzymes involved in the biodegradation of the various pollutants already mentioned. Only the microorganisms that carry machinery capable of resisting and degrading these types of recalcitrant compounds can survive in such environments. The toxic compounds can even act as substrates and enrich some specific microorganisms [147].
The acid mine drainage in Carnoulès (France) is considered an interesting reservoir of enzymes capable of degrading polymers and pollutants while simultaneously producing antimicrobial agents, since this polyextreme environment, in addition to being highly acidic, presents high concentrations of heavy metals, such as iron and arsenic, as a consequence of mining [100]. Another example is the saline–alkali soil of Lop Nur (China), which is characterized by extreme aridity and has been subject to severe human manipulation. Since it serves as a basis for monitoring and verification of nuclear tests [148], the microorganisms present in this soil are certainly subject to a high degree of stress and, for this reason, the soil may present an interesting microbial diversity and functionality [143].
Other areas exposed to other components such as fats [99,129] or chitin [97,98] have also been the focus of metagenomic studies as they are potential sources of new genes encoding specific groups of enzymes (lipolytic and chitinolytic enzymes, respectively). Activated sludge from different municipal [50,88,135,145] or industrial effluents, such as those of the pesticide [138], swine [113] and paper and pulp [50] industries, is also a rich source of microorganisms producing enzymes capable of degrading proteins, lipids and other pollutants. Of the thirty-one studies, two analysed the 16S rRNA gene libraries constructed, one from activated sludge of a swine wastewater treatment facility and the other from soil contaminated and enriched with chitin. In both studies, the most predominant phylum was Proteobacteria [98,113]. Nevertheless, different samples may reveal other dominant phyla, since the composition of the sources (activated sludge from industrial or municipal wastewater treatment plants or treated soils) and the type of treatment applied may influence the bacterial diversity.
2.2.2. Agricultural Lands and Grassland
Land use and management have a great influence on the functioning of the soil ecosystem. Microbial diversity and functionality are sensitive to land use considering the important role of soil microorganisms in soil formation processes and nutrient cycling [149].
Hence, several metagenomic studies were implemented in fields designed for agriculture and/or grasslands, many of them subject to the ploughing and cultivation of different crops. Rhizosphere soils of, for example, red pepper plants and strawberry plants represent a complex but interesting ecosystem due to the symbiosis and parasitism interactions that happen between plants and microorganisms in these soil regions [92,115,142]. Cotton [132], wheat [116,141], sugarcane [83,90,103], corn [81,144], straw [86] and paddy [94,142] fields are examples of agricultural environments from which samples were collected, in particular from topsoil, to be analysed through metagenomic approaches. The selection of topsoil samples and not samples at higher depths is due to the presence of a higher soil microbial biomass on the surface since there is larger evidence of litter composition and root turnover rates in this type of land [149]. Another important fact is that decomposition of a variety of lignocellulosic residues occurs in these environments making them attractive to isolate lignocellulose-degrading enzymes relevant to several industrial applications.
Three large-scale research landscapes [133] in Germany (Hainich-Dün, Schorfheide-Chorin and Schwäbische Alb) are defined as exploratory environments. They present different geological and climatic conditions and are characterized by the different intensities of use and management of agricultural fields and grasslands. Therefore, they certainly have great microbial diversity. In these environments, 37 novel lipolytic enzymes, the vast majority belonging to the hormone-sensitive lipase family, were reported. These exploratory environments, together with the Oak Park research facility in Ireland [128], are valuable sources of promising biocatalysts, notably lipases/esterases, as their land is essentially fertilized with compost and/or manure and is subject to crop rotation. The crop rotation system benefits certain chemical and physical properties of the soil, which is very important and favourable for soil microorganisms [150].
2.2.3. Industrial Composting
Among the different manipulated sources, composting is considered one of the most important bio-reactions for renewable bioenergy on the planet due to the huge variety of microorganisms capable of degrading lignocellulosic biomass. Composting is a sustainable and efficient microbiological process in which the stabilization of the organic matter occurs due to the passage through a thermophilic phase promoted by the proliferation of thermophilic microorganisms [78,151].
The great contribution of composting to the circular economy has led to an increase in the number of composting facilities that are responsible for the production of compost, rich in humic substances, with high agronomic value in organic fertilization of agricultural soils. Different parameters are controlled and adjusted throughout the industrial composting process, such as temperature, pH, humidity, nature of organic materials, particle size and C/N ratio [151].
Some raw materials used in industrial composting are agricultural and agro-industrial residues, including animal faeces [91,102,107,114,118,119], household wastes [108,121], residues from crop harvesting [82,89,91,102,107,109,110,114,119], green wastes [82,104,108,109,118,121], wood chips and sawdust [43,118] and the organic putrescible fraction of municipal solid waste [93,145].
For the metagenomic studies in which composting samples were used, the sample collection essentially occurred during the thermophilic phase of composting, which reaches high temperatures (above 45 °C) due to microbial metabolic activity. Certain studies refer to microbial diversity and confirm the prevalence of thermophilic microorganisms, namely, Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria, producing enzymes able to degrade the complex molecules that compose the lignocellulosic biomass [43,82,108,118]. Additionally, some studies also report enrichment with lignocellulosic substrates, namely, switchgrass, steam-exploded spruce, cutter chips, Whatman filter paper, pre-treated Miscanthus giganteus and wheat straw. These substrates are first incubated with the composting samples at a constant temperature and pH, defined according to the phase at which the samples were collected, to increase the abundance of thermophilic microorganisms and develop a suitable consortium able to degrade lignocellulosic biomass. Over the years, several lignocellulose-degrading enzymes have been reported through the analysis of industrial composting metagenomes [82,93,104,108,109].
2.3. Unspecified Resources
The designation “unspecified” includes studies from soil resources that do not fall into either the raw resources section or the section of resources subject to human manipulation, since neither the temperature of the sample collected, the sampling site nor the existence of any human activity which may alter the natural properties of the sample is mentioned.
A large part of the unspecified resources are forests, whose soils host microorganisms responsible for mediating biogeochemical cycles in terrestrial ecosystems. Additionally, forest soils have a high microbial diversity due to the great accumulation of organic matter [15]. Several metagenome studies have been developed, notably in soil samples from the Amazon rainforest [152], mangrove forests [153,154,155,156,157], peat-swamp forests [158,159], beech forests [160] and chestnut groves [161], to assess microbial diversity, as well as the metabolic capacities of these communities to decompose natural biomass. A metagenome from a Eucalyptus sp. forest in Brazil revealed that two of the most abundant phyla are Actinobacteria and Firmicutes, commonly associated with forest soils [15,162].
This category also includes samples from soils with high moisture content, such as alluvial soils from Eulsukdo Island (South Korea) [142,163], soils collected from lakes [64,145] and even soils with a significant salinity content [145,164], mountain soils such as an acid peatland site in Germany [159] and arid areas, namely, soils of the Cerrado region of Brazil [165,166] that present a high clay content, low pH and high iron levels. All these environments have interesting characteristics that give them unique ecosystems with high enzymatic potential.
1. Introduction
The natural resources available on Earth have played an important role in the history of human civilization since they have provided the necessary materials and energy for the preservation and proliferation of life. Besides the basic sustenance, proper use of these resources can also contribute to the improvement of comfort, protection and well-being [1]. Over the centuries, the continuous exploitation of some finite natural resources, which resulted in the degradation of important ecosystems, subsequently creating severe environmental, economic and technological issues [2][3]. This overconsumption of materials and energy will eventually lead to inevitable resource depletion. Consequently, there is a need to moderate the demand for finite natural resources and simultaneously search for efficient and sustainable ways to extract and convert energy/materials from renewable resources [4]. Biocatalysts can play a crucial role. Due to their high specificity and selectivity, the biocatalysts can generally ensure the effective conversion of substrates, minimizing the formation of undesirable side-products and reducing the energetic costs associated with the process. Hence, the use of suitable and robust biocatalysts can greatly contribute to the implementation of greener and sustainable bioprocesses that efficiently compete with the classical chemical routes. Currently, the use of biocatalysts for the valorization of alternative non-finite resources, under the concept of bioeconomy and the EU Green Deal, has gained increased attention.
Unexplored or slightly explored environments, such as soil and water, are interesting sources of novel and promising biocatalysts. Despite the clear differences at the physicochemical level, soil and water are both regarded as natural bio-reservoirs with great microbial diversity. For this reason, these environments have been the focus of several microbial studies in the last few decades. Microorganisms are considered important suppliers of various bioproducts with applications in several industrial areas, such as enzymes. In fact, in the last decade, a significant increase in the demand for enzymes [5], which is easily explained by their great biotechnological potential. However, the presence of a significant number of unculturable microorganisms both in the soil and water can limit or make unfeasible some microbial studies to find novel biocatalysts.
Metagenomics has emerged as a culture-independent technique that allows exploring the genetic material of whole microbial communities present in a given environment [6]. This technique has been successfully used to identify novel enzymes with promising catalytic activities and some of them have been patented and already translated to the market [7]. Two different metagenomic approaches have been described, namely, sequence-based or function-based metagenomics. In both cases, an initial step of DNA extraction from an environmental sample is needed [8]. The sequence-based studies allow the identification of candidate genes, while the function-based screenings include the detection and isolation of clones from metagenomic libraries with a positive response to the desired phenotype [9]. The construction of a metagenomic library requires the selection of the most suitable expression vector, in which the environmental DNA fragment will be inserted. In addition to other aspects, such as the quality and size of the environmental DNA, this selection depends on the purpose of the functional screening. Plasmids can be used when DNA fragments are small (≤15 kb insert size) and contain only individual genes. On the other hand, some expression vectors, such as fosmids and cosmids (<40 kb insert size), or bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) (>40 kb insert size), allow the recovery of large biosynthetic gene clusters that encode the production of one or more specialized metabolites. Besides the expression vector, expression systems should be selected in such a way that gene expression and target gene detection are maximized. Escherichia coli has been widely used due to the extensive genetic knowledge on this microorganism that makes it suitable for effective and profitable cloning and protein expression [10][11]. 
However, the demand for robust biocatalysts requires functional screenings at temperatures other than the growth temperatures of mesophilic hosts. Therefore, other expression hosts, such as Thermus thermophilus, have already been shown to be good candidates as expression hosts in the functional metagenome analysis [12].
2. Soil
Soil is more than just land; it represents one of the most important natural resources available on Earth. It is considered the basis of agriculture, an important water filter, a natural reserve of carbon and water and also the regular habitat for several living organisms. Soils originate from rocks through complex chemical, mechanical and biological processes which result in the formation of small particles and grains [13]. In terms of composition, soils are generally constituted by solid, liquid and gas fractions in a 2:1:1 ratio (volume basis). The solid fraction is mainly composed of inorganic materials but also contains a small portion of organic matter (around 1–5%) [14]. The liquid phase is generally an aqueous solution of electrolytes (e.g., micronutrients and macronutrients). On the other hand, the main gases present in soil are water vapor, CO2, O2 and N2 [13]. Depending on its origin and composition, different types of soil can be defined. Microbial diversity and functionality are strongly dependent on the specific characteristics of each type of soil.
2.1. Raw Resources
Depending on the geographical location, the soil may present itself in various forms at the Earth’s surface as the result of environmental conditions, organic matter content and type of vegetation [15]. An abiotic factor with high importance in any ecosystem is temperature since it can directly or indirectly affect microbial activity and, consequently, the composition of microbial communities. Each microbial species has a different temperature tolerance range and is capable of producing distinct types of biocatalysts at different production rates. Still, certain microorganisms can thrive and function well metabolically in adverse conditions, notably at extreme temperatures [16][17].
Therefore, the sampling locations in which the soil was directly collected and evaluated “as found in the nature” (raw resources) were classified according to the recorded temperature: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
A greater number of studies were accomplished in low-temperature environments (27 studies), followed by high-temperature environments (26 studies) and, finally, moderate temperature environments (5 studies). The unique characteristics of extreme environments, namely, extreme temperatures, make them more interesting places to find promising and robust enzymes. This fact is probably the main reason for the reduced number of studies accomplished in moderate temperature environments. Furthermore, there is an increasing need to find highly stable biocatalysts with efficient activities capable of acting in various industrial processes that make use of severe conditions, similar to what occurs in hot and cold environments [18].
2.1.1. Low-Temperature Environments
Cold soils, in addition to being exposed to very low temperatures, are also subject to other harsh conditions, such as freeze-thaw cycles, UV radiation and restricted availability of water and nutrients. Indeed, arid/semiarid regions, aquatic environments, polar regions, mountainous and forest areas are examples of cold habitats that have been the focus of metagenomic studies. Although they usually have low biodiversity, the microorganisms that inhabit them have acquired survival machinery and, for this reason, these types of habitats constitute stimulating reservoirs of biotechnological molecules [19].
Deep-sea sediments, mainly sands and clays from several aquatic environments, are the type of samples that most contribute to the exploration of enzymes in these cold habitats. The South China Sea is a marginal sea and one of the most studied environments since it constitutes an important reservoir of sediments rich in organic materials [20]. Other regions of the Pacific Ocean, such as Suruga Bay in Japan [21], as well as the depths of the Atlantic Ocean [22] and the Arctic Ocean, notably in the Barents Sea area [23], have also been studied to search not only for lipolytic enzymes but also for enzymes belonging to the glycosyl hydrolase class. Additionally, the submarine tufa columns of the Ikka Fjord in Greenland [24] and the karstic Lake Arreo in Spain [25], which are alkaline in addition to being permanently cold, are valuable sources of microorganisms adapted to these environments. In addition to aquatic sediments, other types of samples with metagenomic exploitation potential come from arid/semiarid and mountain soils, for example in the Ladakh region [26] and the Apharwat mountain [27][28], respectively. Both are located in the north-western Himalayas, which have distinct geo-climatic characteristics such as extremely cold and dry weather, high altitude and glacial and permafrost soils. Relevant examples of glacial and permafrost soils are the Karuola glacier in the Tibetan Plateau [29] and the Kolyma Lowland permafrost in north-eastern Siberia [30], which are extremely hostile environments inhabited by unique microbial communities.
2.1.2. Moderate Temperature Environments
There are few studies in which sampling sources are identified as moderate temperature environments. Still, certain environments of this nature have other characteristics that also make them interesting for metagenomic purposes. An example is the Caatinga biome of João Câmara (Brazil) which presents sandy loam soil and constitutes an ecosystem of high biological relevance due to the features of the area, such as the semiarid climate, the high exposure to UV radiation and the long periods of drought [31][32]. Another example is the solar saltern of Goa, which differs by its high salinity and represents an important source for metagenomic studies given the difficulty in cultivating halophilic microorganisms through conventional techniques [33].
2.1.3. High-Temperature Environments
High-temperature environments have also proven to be important sources of very useful thermostable enzymes with applications in various industrial fields, such as food and chemical synthesis industries. In addition to geothermally heated environments, such as hot springs and hydrothermal vents, arid/semiarid regions and environments subject to natural composting processes are often good targets for the application of metagenomic tools [34]. Compost samples from Expo Park in Japan, produced from leaves and branches, are a good example of natural composting since they have been studied over a few years [35][36][37][38]. Once thermophilic composting reaches high temperatures, there is a greater predominance of microorganisms capable of degrading complex molecules, with this type of environment being a potential source of lignocellulose-degrading enzymes and, for this reason, an interesting subject of study [39]. Several arid/semiarid regions have been explored given the typical characteristics of these environments, including deserts [40] and also other sites, more specifically the Turpan Basin, which represents China’s hottest place and has proven to be a valuable source of different types of highly thermostable enzymes [41]. Hot springs and hydrothermal vents from different parts of the planet, e.g., Caldeirão hot spring in Portugal [42], Solnechny hot spring in Russia [43] and Solfatara-Pisciarelli hydrothermal pool in Italy [44], have also contributed to finding robust enzymes through the construction of metagenomic libraries using DNA extracted from wet mud and/or sediments collected from these places. In all of the metagenomic studies, the expression host used was E. coli, except for the metagenomic library constructed from sediments of a hot spring in the Azores, Portugal, which used T. thermophilus as the host. Using this thermophilic host, Leis and co-workers intended to increase the probability of detecting genes derived from extreme environments that would encode new thermostable biocatalysts and to allow the screening of phenotypes that are not observable in E. coli [45].
In some studies performed from these raw resources, an additional enrichment step was performed to provide favourable growth conditions for certain microorganisms of interest, often present in small abundance, to the detriment of others [46]. These enrichments were implemented by introducing specific substrates, such as cellulose, xylan, chitin, starch and glucose [21][47][48], and even olive oil [30], that stimulate specific microbial activities. On the other hand, culture enrichment also occurred by controlling environmental conditions, in particular the temperature, which is generally in agreement with the temperature of the sampling locations [49].
Over the past decade, in addition to function-based metagenomic screenings, sequence-based metagenomic screenings have also been performed. Sequence-based metagenomics showed that the phyla that predominate in high-temperature environments are Crenarchaeota, Thaumarchaeota, Acidobacteria and Proteobacteria, which are capable of mineral-based metabolism and generally associated with soil, and are found more specifically in sediments from hot springs and hydrothermal vents [21][24][44][45][50].
2.2. Human Manipulated Resources
In addition to being studied as it is found in nature, without any kind of alteration, soil can be studied in a variety of scenarios that result from human manipulation, including contaminated/polluted soils, agricultural soils and controlled composting.
2.2.1. Polluted Environments
The intensification of industrialization, urbanization and mining has negatively affected the soil as a natural resource. It has been observed that soil has been contaminated by different factors, namely, industrial sewages, solid wastes and urban activities. Some organic and inorganic pollutants have been responsible for soil contamination, such as heavy metals, alkaline or acidic constituents, toxins, oil contaminants and others [51].
It was found that the following categories of polluted samples have been used as the object of metagenomic studies: soils contaminated by oil and its constituents (such as polycyclic aromatic hydrocarbons (PAHs)), fertilizers and other alkaline pollutants, industrial sludges and sediments. Oil production sites [52], soils where oil spills or runoffs have occurred [53][54][55][56], soils near industrial areas [53][57][58][59] and soils treated with fertilizers [60] were particularly analyzed. Since pollutants are rich in toxic compounds, they affect the activity and diversity of microbial communities present in these adulterated soils. Therefore, metagenomic studies have been developed in this type of compromised environment to unravel the gene clusters that encode enzymes involved in the biodegradation of the various pollutants already mentioned. Only the microorganisms that carry machinery capable of resisting and degrading these types of recalcitrant compounds can survive in such environments. The toxic compounds can even act as substrates and enrich some specific microorganisms [61].
The acid mine drainage in Carnoulès (France) is considered an interesting reservoir of enzymes capable of degrading polymers and pollutants while simultaneously producing antimicrobial agents, since this polyextreme environment, in addition to being highly acidic, presents high concentrations of heavy metals, such as iron and arsenic, as a consequence of mining [62]. Another example is the saline–alkali soil of Lop Nur (China), which is characterized by extreme aridity and is a location that suffers from severe human manipulation. Since it serves as a basis for monitoring and verification of nuclear tests [63], microorganisms present in this soil are certainly subject to a high degree of stress and, for this reason, it may present an interesting microbial diversity and functionality [64].
Other areas exposed to other components such as fats [65][66] or chitin [67][68] have also been the focus of metagenomic studies as they are potential sources of new genes encoding groups of specific enzymes (lipolytic and chitinolytic enzymes, respectively). Activated sludge from different municipal [53][69][70][71] or industrial effluents, such as those of the pesticide [72], swine [73] and paper and pulp [53] industries, is also a rich source of microorganisms producing enzymes capable of degrading proteins, lipids and other pollutants. Of the thirty-one studies, two analysed the 16S rRNA gene libraries constructed, one from activated sludge from a swine wastewater treatment facility and the other from soil contaminated and enriched with chitin. In both studies, it was found that the most predominant phylum was Proteobacteria [68][73]. Nevertheless, different samples can reveal other dominant phyla, since the composition of the sources (activated sludge from industrial or municipal wastewater treatment plants or treated soils) and the type of treatment applied may influence the bacterial diversity.
2.2.2. Agricultural Lands and Grassland
Land use and management have a great influence on the functioning of the soil ecosystem. Microbial diversity and functionality are sensitive to land use considering the important role of soil microorganisms in soil formation processes and nutrient cycling [74].
Hence, several metagenomic studies were implemented in fields designed for agriculture and/or grasslands, many of them subject to the ploughing and cultivation of different crops. Rhizosphere soils of, for example, red pepper plants and strawberry plants represent a complex but interesting ecosystem due to the symbiosis and parasitism interactions that happen between plants and microorganisms in these soil regions [75][76][77]. Cotton [78], wheat [79][80], sugarcane [81][82][83], corn [84][85], straw [86] and paddy [77][87] fields are examples of agricultural environments from which samples were collected, in particular from topsoil, to be analysed through metagenomic approaches. The selection of topsoil samples and not samples at higher depths is due to the presence of a higher soil microbial biomass on the surface since there is larger evidence of litter composition and root turnover rates in this type of land [74]. Another important fact is that decomposition of a variety of lignocellulosic residues occurs in these environments making them attractive to isolate lignocellulose-degrading enzymes relevant to several industrial applications.
Three large-scale research landscapes [88] in Germany (Hainich-Dün, Schorfheide-Chorin and Schwäbische Alb) are defined as exploratory environments. They present different geological and climatic conditions and are characterized by the different intensities of use and management of agricultural fields and grasslands. Therefore, they certainly have great microbial diversity. In these environments, 37 novel lipolytic enzymes, the vast majority belonging to the hormone-sensitive lipase family, were reported. These exploratory environments, together with the Oak Park research facility in Ireland [89], are valuable sources of promising biocatalysts, notably lipases/esterases, as their land is essentially fertilized with compost and/or manure and is subject to crop rotation. The crop rotation system benefits certain chemical and physical properties of the soil, which is very important and favourable for soil microorganisms [90].
2.2.3. Industrial Composting
Among the different manipulated sources, composting is considered one of the most important bio-reactions for renewable bioenergy on the planet due to the huge variety of microorganisms capable of degrading lignocellulosic biomass. Composting is a sustainable and efficient microbiological process in which the stabilization of the organic matter occurs due to the passage through a thermophilic phase promoted by the proliferation of thermophilic microorganisms [39][91].
The great contribution of composting to the circular economy has led to an increase in the number of composting facilities that are responsible for the production of compost, rich in humic substances, with high agronomic value in organic fertilization of agricultural soils. Different parameters are controlled and adjusted throughout the industrial composting process, such as temperature, pH, humidity, nature of organic materials, particle size and C/N ratio [91].
Some raw materials used in industrial composting are agricultural and agro-industrial residues, including animal faeces [92][93][94][95][96][97], household wastes [98][99], residues from crop harvesting [92][93][94][95][97][100][101][102][103], green wastes [96][98][99][100][102][104], wood chips and sawdust [45][96] and the organic putrescible fraction of municipal solid waste [71][105].
For the metagenomic studies in which composting samples were used, the sample collection essentially occurred during the thermophilic phase of composting, which reaches high temperatures (above 45 °C) due to microbial metabolic activity. Certain studies refer to microbial diversity and confirm the prevalence of thermophilic microorganisms, namely, Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria, producing enzymes able to degrade the complex molecules that compose the lignocellulosic biomass [45][96][98][100]. Additionally, some studies also report enrichment with lignocellulosic substrates, namely, switchgrass, steam-exploded spruce, cutter chips, Whatman filter paper, pre-treated Miscanthus giganteus and wheat straw. These substrates are first incubated with the composting samples at a constant temperature and pH, defined according to the phase at which they were collected, to increase the abundance of thermophilic microorganisms and develop a suitable consortium able to degrade lignocellulosic biomass. Over the years, several lignocellulose-degrading enzymes have been reported through the analysis of industrial composting metagenomes [98][100][102][104][105].
2.3. Unspecified Resources
The designation “unspecified” includes studies from soil resources that do not fall into either the raw resources section or the section of resources subject to human manipulation, since neither the temperature of the sample collected, the sampling site nor the existence of any human activity which may alter the natural properties of the sample is mentioned.
A large part of the unspecified resources are forests, whose soils harbour microorganisms responsible for mediating biogeochemical cycles in terrestrial ecosystems. Additionally, forest soils have a high microbial diversity due to the great accumulation of organic matter [15]. Several metagenome studies have been developed, notably in soil samples from the Amazon rainforest [106], mangrove forests [107][108][109][110][111], peat-swamp forests [112][113], beech forests [114] and chestnut groves [115], to assess microbial diversity, as well as the metabolic capacities of these communities to decompose natural biomass. A metagenome from the Eucalyptus sp. forest in Brazil revealed that two of the most abundant phyla are Actinobacteria and Firmicutes, commonly associated with forest soils [15][116].
This category also includes samples from soils with high moisture content, such as alluvial soils from Eulsukdo Island (South Korea) [77][117], soils collected from lakes [71][118] and even soils with a significant salinity content [71][119], mountain soils such as an acid peatland site in Germany [113] and arid areas, namely, soils of the Cerrado region of Brazil [120][121] that present a high clay content, low pH and high iron levels. All these environments have interesting characteristics that give them unique ecosystems with high enzymatic potential.
3. Water
Water is essential to life. Although frequently perceived as just an ordinary substance, it plays a vital role on the Earth since life as we know it could not exist without water. As a natural resource, water is fully distributed over the planet. The oceans contain the greatest fraction of water (96.5%), while continental water is composed of freshwater (2.5%) and saline groundwater (1%) [122]. Water is present in its three physical forms (liquid, solid and vapour) depending on the climate conditions. Aqueous environments (saline and freshwater) are considered important sources of microorganisms capable of inhabiting and resisting in different physical forms, such as icebergs (important spots for marine life), seas and oceans, thermal springs, glaciers, lakes and ponds. The microbial diversity and functionality found in water are strongly dependent on the environmental conditions (e.g., salinity, temperature, physical form, nutrients, pH or depth). It is known that a large part of the aquatic microbial resources remains unexplored mainly due to access limitations. Nevertheless, the use of metagenomic tools has significantly aided the discovery of novel biocatalysts from aquatic metagenomes [123].
3.1. Raw Resources
As mentioned before, the microbial communities of natural water resources are strongly associated with environmental factors, one of which is temperature [124]. Indeed, in addition to the most common aquatic environments on the planet, such as seawater or lakes, extreme temperature environments also allow exploring a diversity of enzymes capable of catalysing reactions at reasonable or extreme conditions [125]. Therefore, raw water resources were classified in the same way and according to the same temperature ranges as soil resources: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
3.1.1. Low-Temperature Environments
Low-temperature aquatic environments are usually associated with marine environments and/or high-depth environments, which are intrinsically related to high pressures. The decrease in temperature and the increase in depth cause a decreased diffusion of nutrients and energy and a decreased abundance of prokaryotic cells, respectively. These extreme conditions require that microorganisms found in these zones have a metabolism adapted to, for example, low concentrations of nutrients, which makes them fascinating targets for the bioprospection of novel microbial capabilities and, accordingly, promising enzymes [126][127].
Low-temperature water samples were collected essentially in different geographical locations in the Atlantic Ocean. Surface seawater was collected at opposite ocean regions, particularly from an ecosystem observation site in New Jersey, USA [128], and the brackish Baltic Sea in Poland [129]. The latter recorded the lowest water temperature (0.8 °C), probably due to the fact that the Baltic Sea was formed as a consequence of the last glaciation that occurred 10,000–15,000 years ago and, therefore, has undergone remarkable changes in its physicochemical characteristics [130]. In these different geographical locations, different groups of enzymes were reported, namely, a ribulose 1,5-bisphosphate carboxylase/oxygenase and a cytosolic β-glucosidase with a wide range of catalytic activities. Additionally, the chemocline of the Urania basin in the Mediterranean Sea was the subject of a metagenomic study to find interesting carboxylesterases, since it is a deep-sea anoxic hypersaline basin [53]. The extreme factors that characterize the Urania basin (hypersalinity, low temperature and anoxia) together with the typical features of chemoclines make this habitat accommodate a highly diverse microbial community with pronounced microbial activities, such as CO2 fixation and exoenzyme activities [131].
3.1.2. Moderate Temperature Environments
In moderate temperature environments, samples of different types of water are included: groundwater/freshwater, hyper/saline water and brackish water.
Hyper/saline water samples from the surface of the South China Sea were the ones that most contributed to the metagenomic studies in this category. The seasonal average water temperature that falls into the range defined for this group (20–45 °C) and the unique environmental properties of the South China Sea potentially contribute to the diversity, novelty and uniqueness of genes encoding for valuable enzymes. Effectively, different groups of enzymes have already been explored in the South China Sea, including β-glucosidases, laccases and esterases [20][132][133][134].
As the lakes of the Amazon region remain unexplored, the freshwater metagenome of Lake Poraquê was functionally analysed. Since the Amazon is the largest hydrographic basin on Earth, the great genetic and metabolic diversity of microorganisms present in this important region may result in the discovery of new enzymes of biotechnological interest, such as enzymes involved in the degradation of plant cell walls [135].
The brackish samples of the Caspian Sea were also accessed since this environment presents a salinity and ionic concentration very similar to those of human serum. In this way, there is a high probability that the secretory enzymes (more specifically, L-asparaginases) found in the Caspian Sea microbiome exhibit greater stability in the physiological conditions of human serum, which can give them interesting therapeutic applicability [136].
3.1.3. High-Temperature Environments
High-temperature habitats (>45 °C) are inhabited by heat-resistant microorganisms and some of these environments also combine other extreme conditions, for example, alkalinity, acidity, salinity, pressure and heavy metals [17]. Samples have been studied mainly from hyper/saline water environments and groundwater/freshwater environments.
Among the different aquatic environments, the hypersaline anoxic deep-sea basins in the Red Sea, namely, the Atlantis II deep brine pool, have received increased attention. These are characterized by a high temperature, extreme salinity, acidic pH, extremely low levels of light and oxygen and high concentrations of heavy metals. In this way, this extreme environment is expected to be an attractive location for the search for biocatalysts that can function under harsh conditions, not just those that characterize the Red Sea. An esterase capable of acting in the presence of heavy metals; a mercuric reductase extremely relevant in the detoxification system for mercuric/organomercurial species; a nitrilase useful in bioremediation processes, fine chemicals and pharmaceutics; a 3′-aminoglycoside phosphotransferase and a beta-lactamase with potential application as thermophilic selection markers; and a thioredoxin reductase important in the maintenance of the redox balance and counteracting oxidative stress inside cells are some examples of biocatalysts found in this extreme environment [137][138][139][140][141].
Hot springs are another source commonly known for their high temperatures. A metagenomic library constructed from a groundwater sample from the Lobios hot spring in Spain was evaluated, by sequence-based and functional metagenomics approaches, given its high temperatures and alkaline pH values. Moreover, the microbial biodiversity and metagenomic potential of this source have not been sufficiently explored.
3.2. Human Manipulated Resources
Anthropogenic activities interfere negatively in many ways with the natural water cycle. Several water bodies, such as oceans, rivers and groundwater have been contaminated not only by natural events but particularly due to human interventions [142].
3.2.1. Groundwater/Freshwater
The main anthropogenic sources of water contamination are refineries, mines, factories and wastewater treatment plants, among others [142].
Over the past 10 years, different metagenomic studies have been executed in contaminated groundwater and freshwater sources for the acquisition of novel enzymes. Some examples of contaminated sources are the formation water of a coalbed in the Jharia coalfield (India), which is defined as an extreme environment [143], the Eryuan Niujie hot spring in Yunnan (China), which has a high content of fats due to the wastes resulting from the livestock slaughter that occurs in the vicinity of the hot spring [144], and groundwater from an area in the Czech Republic that has been incessantly contaminated with various products from an oil industry for 50 to 70 years [145].
The activities associated with each of the sites exert a certain continuous selective pressure on the microorganisms that live in these environments to develop enzymes capable of acting in the production of biodiesel, degradation of organophosphorus compounds and halogenated pollutants, respectively [143][144][145].
3.2.2. Coastal Water
The category of coastal waters is essentially composed of water samples from oil-contaminated harbours due to the numerous ships circulating in the waters of these areas each year and also due to the unintentional spills of hydrocarbons that may occur during the loading and unloading of petroleum-derived substances. The main pollutant responsible for the contamination of these sites is oil and the studies have mainly been focused on the Mediterranean Sea in Italy and the Barents Sea in Russia, allowing the finding of robust carboxylesterases [53].
The importance of the metagenomic analysis of water samples of these types of sources is justified by the abundance of microbial species capable of degrading hydrocarbons which can be potentially applied in the bioremediation of ecosystems. To this end, additional enrichments with crude oil or specific hydrocarbons (e.g., pyrene, naphthalene and phenanthrene) are carried out to mimic the place from which they are isolated.
3.3. Unspecified Resource
The tropical underground water of the Yucatán Aquifer in Mexico was the only resource considered unspecified. Nevertheless, it is a very interesting resource of freshwater since it consists of very permeable and porous limestone that allows the infiltration of water into the deepest layers of the soil. Additionally, the Yucatán Aquifer presents cracks or interconnected spaces that allow water from distant zones and sources to move freely, carrying a collection of microorganisms from different places. Thus, although a natural selection process that favours some microorganisms by eliminating others is expected, the aquifer may also present a high microbial diversity from diverse origins [146].
3. Water
Water is essential in our life. Although frequently perceived as just an ordinary substance, it plays a vital role on the Earth since life as we know it could not exist without water. As a natural resource, water is fully distributed over the planet. The oceans contain the greatest fraction of water (96.5%), while continental water is composed of freshwater (2.5%) and saline groundwater (1%) [182]. Water is present in its three physical forms (liquid, solid and vapour) depending on the climate conditions. Aqueous environments (saline and freshwater) are considered important sources of microorganisms capable of inhabiting and resisting in different physical forms, such as icebergs (important spots for marine life), seas and oceans, thermal springs, glaciers, lakes and ponds. The microbial diversity and functionality found in water are strongly dependent on the environmental conditions (e.g., salinity, temperature, physical form, nutrients, pH or depth). It is known that a large part of the aquatic microbial resources remains unexplored mainly due to access limitations. Nevertheless, the use of metagenomic tools has significantly aided the discovery of novel biocatalysts from aquatic metagenomes [183]. In this review, the water samples used in the metagenomic studies were divided according to their origin and main characteristics . In the last decade, a total of 26 metagenomic studies with water samples to find promising enzymes were reported.
3.1. Raw Resources
As mentioned before, the microbial communities of natural water resources are strongly associated with environmental factors, one of which is temperature [184]. Indeed, in addition to the most common aquatic environments on the planet, such as seawater or lakes, extreme temperature environments also allow exploring a diversity of enzymes capable of catalysing reactions at reasonable or extreme conditions [185]. Therefore, raw water resources were classified in the same way and according to the same temperature ranges as soil resources: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
3.1.1. Low-Temperature Environments
When thinking about low-temperature aquatic environments, thoughts are immediately associated with marine environments and/or high depth environments that are intrinsically related to high pressures. The decrease in temperature and the increase in depth cause a decreased diffusion of nutrients and energy and decreased abundance of prokaryotic cells, respectively. These extreme conditions require that microorganisms found in these zones have their adapted metabolism, for example, low concentrations of nutrients, which make them fascinating targets for the bioprospection of novel microbial capabilities and, accordingly, promising enzymes [200,201].
Low-temperature water samples were collected essentially in different geographical locations in the Atlantic Ocean. Surface seawater was collected at opposite ocean regions, particularly from an ecosystem observation site in New Jersey, USA [195], and the brackish Baltic Sea in Poland [187]. The latter recorded the lowest water temperature (0.8 °C), probably due to the fact that the Baltic sea was formed as a consequence of the last glaciation that occurred 10,000–15,000 years ago and, therefore, has undergone remarkable changes in its physicochemical characteristics [202]. In these different geographical locations, different groups of enzymes were reported, namely, a ribulose 1,5-bisphosphate carboxylase/oxygenase and a cytosolic β-glucosidase with a wide range of catalytic activities. Additionally, the chemocline of the Urania basin in the Mediterranean Sea was the subject of a metagenomic study to find interesting carboxylesterases, since it is a deep-sea anoxic hypersaline basin [50]. The extreme factors that characterize the Urania basin (hypersalinity, low temperature and anoxia) together with the typical features of chemoclines make this habitat accommodate a highly diverse microbial community with pronounced microbial activities, such as CO2 fixation and exoenzyme activities [203].
3.1.2. Moderate Temperature Environments
In moderate temperature environments, samples of different types of water are included: groundwater/freshwater, hyper/saline water and brackish water.
Hyper/saline water samples from the surface of the South China Sea were the ones that most contributed to the metagenomic studies in this category. The seasonal average water temperature that falls into the range defined for this group (20–45 °C) and the unique environmental properties of the South China Sea potentially contribute to the diversity, novelty and uniqueness of genes encoding for valuable enzymes. Effectively, different groups of enzymes have already been explored in the South China Sea, including β-glucosidases, laccases and esterases [75,186,190,192].
As the lakes of the Amazon region remain unexplored, the freshwater metagenome of Lake Poraquê was functionally analysed. Since the Amazon is the largest hydrographic basin on Earth, the great genetic and metabolic diversity of the microorganisms present in this important region may lead to the discovery of new enzymes of biotechnological interest, such as enzymes involved in the degradation of plant cell walls [188].
The brackish samples of the Caspian Sea were also explored since this environment presents a salinity and ionic concentration very similar to those of human serum. Consequently, there is a high probability that the secretory enzymes (more specifically, L-asparaginases) found in the Caspian Sea microbiome exhibit greater stability under the physiological conditions of human serum, which may make them interesting for therapeutic applications [199].
3.1.3. High-Temperature Environments
High-temperature habitats (>45 °C) are inhabited by heat-resistant microorganisms and some of these environments also combine other extreme conditions, for example, alkalinity, acidity, salinity, pressure and heavy metals [17]. Samples have been studied mainly from hyper/saline water environments and groundwater/freshwater environments.
Among the different aquatic environments, the hypersaline anoxic deep-sea basins in the Red Sea, namely, the Atlantis II deep brine pool, have received increased attention. These are characterized by a high temperature, extreme salinity, acidic pH, extremely low levels of light and oxygen and high concentrations of heavy metals. Therefore, this extreme environment is expected to be an attractive location for the search for biocatalysts that can function under harsh conditions, not just those that characterize the Red Sea. Examples of biocatalysts found in this extreme environment include an esterase capable of acting in the presence of heavy metals; a mercuric reductase extremely relevant in the detoxification system for mercuric/organomercurial species; a nitrilase useful in bioremediation processes, fine chemicals and pharmaceutics; a 3′-aminoglycoside phosphotransferase and a beta-lactamase with potential application as thermophilic selection markers; and a thioredoxin reductase important in the maintenance of the redox balance and in counteracting oxidative stress inside cells [189,193,194,197,198].
Hot springs are another source commonly known for their high temperatures. A metagenomic library constructed from a groundwater sample from the Lobios hot spring in Spain was evaluated by sequence-based and functional metagenomics approaches, given the high temperatures and alkaline pH values of this spring and the fact that its microbial biodiversity and metagenomic potential have not been sufficiently explored. This study reported a novel esterase belonging to family VIII and showed that the dominant prokaryotic phyla in this location, as in other hot springs on the planet, were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae and Chloroflexi. Additionally, the dominant archaeal phylum was Thaumarchaeota [191].
3.2. Human Manipulated Resources
Anthropogenic activities interfere negatively in many ways with the natural water cycle. Several water bodies, such as oceans, rivers and groundwater have been contaminated not only by natural events but particularly due to human interventions [204].
3.2.1. Groundwater/Freshwater
The main anthropogenic sources of water contamination are refineries, mines, factories and wastewater treatment plants, among others [204].
Over the past 10 years, different metagenomic studies have been carried out in contaminated groundwater and freshwater sources for the acquisition of novel enzymes. Some examples of contaminated sources are formation water from a coalbed in the Jharia coalfield (India), which is defined as an extreme environment [206]; the Eryuan Niujie hot spring in Yunnan (China), which has a high content of fats due to the wastes resulting from the livestock slaughter that occurs in its vicinity [207]; and groundwater from an area in the Czech Republic that has been continuously contaminated with various products from the oil industry for 50 to 70 years [208].
The activities associated with each of these sites exert a continuous selective pressure on the microorganisms that live in these environments to develop enzymes capable of acting in the production of biodiesel and in the degradation of organophosphorus compounds and halogenated pollutants, respectively [206,207,208].
3.2.2. Coastal Water
The category of coastal waters is essentially composed of water samples from harbours that are contaminated with oil due to the numerous ships circulating in these areas each year and to the unintentional spills of hydrocarbons that may occur during the loading and unloading of petroleum-derived substances. Oil is the main pollutant at these sites, and the studies have mainly focused on the Mediterranean Sea in Italy and the Barents Sea in Russia, leading to the discovery of robust carboxylesterases [50].
The importance of the metagenomic analysis of water samples from these types of sources is justified by the abundance of microbial species capable of degrading hydrocarbons, which can potentially be applied in the bioremediation of ecosystems. To this end, additional enrichments with crude oil or specific hydrocarbons (e.g., pyrene, naphthalene and phenanthrene) are carried out to mimic the conditions of the sites from which the samples were collected.
3.3. Unspecified Resource
The tropical underground water of the Yucatán Aquifer in Mexico was the only resource considered unspecified. Nevertheless, it is a very interesting freshwater resource since the aquifer consists of very permeable and porous limestone that allows the infiltration of water into the deepest layers of the soil. Additionally, the Yucatán Aquifer presents cracks or interconnected spaces that allow water from distant zones and sources to move freely, carrying a collection of microorganisms from different places. Thus, although a natural selection process is expected to favour some microorganisms while eliminating others, the aquifer may also present a high microbial diversity from diverse origins [209].
In this way, this aquifer can represent a potential and interesting source for the acquisition of a catalogue of enzymes suitable for the degradation of natural polymers, including proteins.