The global demand for and trade in industrial enzymes are continuously growing and are estimated to reach USD 7.0 billion in the next few years [1]. In this scenario, great importance is given to microbial enzymes, which offer several advantages, such as high yields, activity, and reproducibility, in addition to economic production, exponential growth, use of cheap platforms, and easy optimization. Many industrial processes, such as the production of food, pharmaceuticals, detergents, and textiles, depend on enzymes. In this context, recombinant gene technology, protein engineering, and directed evolution have revolutionized enzyme manufacturing and the enzyme industry. Enzymes with hydrolytic activity are used to degrade various natural substances and are extensively applied in industry. Proteases, essential enzymes for the detergent and dairy industries, are also widely used. Enzymes involved in carbohydrate metabolism, including amylases and cellulases, are extensively used in the textile, detergent, and food industries [2]. Approximately 60% of industrial enzymes come from fungi, 24% from bacteria, and 4% from yeasts, and most of the remaining 12% are obtained from plants and animals [1][2]. Although viruses account for only a small portion of these enzymes, studies of their potential have been growing over the last 30 years (Figure 1). They have a complex geometric structure and highly efficient genetic machinery and, beyond that, are sources of unique enzymes with great biotechnological potential [3].
The concept that viruses carry only genes that support their own replication and capsid production changed with the discovery of giant viruses, opening space for a new approach to and understanding of their contribution to the evolution of life [7]. Unlike other viral groups, and similarly to bacteria and other prokaryotes, they carry large genomes with a diversity of genes capable of coding for numerous proteins, including DNA repair and even metabolic enzymes [7][8]. This new understanding not only enriches basic knowledge of these viruses and their hosts, but also opens up the potential of these organisms for several biotechnological purposes. These viruses were first discovered in the 1970s, infecting unicellular algae, and many different isolates have been identified since then [9][10][11]. With the discovery of mimiviruses in the early 2000s and of other giant amoeba viruses in the following years, the group of so-called nucleocytoplasmic large DNA viruses (NCLDV), currently classified in the phylum Nucleocytoviricota, greatly expanded and pushed forward the boundaries of the virosphere [12][13][14].
2. Phycodnaviridae: The First Family of Giant Viruses of Protists
The Phycodnaviridae family includes viruses that infect eukaryotic algae from freshwater or marine environments and display biochemical and genetic peculiarities, such as DNA error correction and post-replicative processing [18][19]. Phylogenetic analysis using DNA polymerase B sequences from members of this family showed that they share a common ancestor with other NCLDV, thus corroborating their classification in the phylum Nucleocytoviricota [14]. The family currently comprises six genera, Coccolithovirus, Phaeovirus, Prasinovirus, Raphidovirus, Prymnesiovirus, and Chlorovirus, which differ in terms of cycle type, host, genome topology, and gene content [20][21].
Although genes involved in lipid metabolism are not the most abundant functional category in giant viruses, they are strongly represented in coccolithoviruses (Figure 2). Coccolithoviruses infect microalgae of the species Emiliania huxleyi, commonly found in marine sediments. Emiliania huxleyi virus 86 (EhV-86), one of the first known coccolithoviruses, was observed in 1999; it has a viral particle of around 200 nm covered by a lipid membrane, and a linear genome of 407 kbp [22]. Curiously, a genomic characterization of EhV-86 identified 472 coding sequence (CDS) regions, but only 63 genes have a known function so far (Table 1). About 10% of them encode enzymes involved in the biosynthesis of sphingolipids: sterol desaturase, serine palmitoyltransferase, lipid phosphate phosphatase, and two genes encoding desaturases (Figure 2).
Figure 2. Presence and distribution of enzymes in the Phycodnaviridae family. Representatives of each genus were included, and data on the diversity and abundance of enzymes grouped into different functional categories were obtained from genomic annotations publicly available on GenBank. A network graph was constructed in Gephi 0.9.7 with a force-based algorithm (ForceAtlas2), followed by manual arrangement of nodes for better visualization. Node sizes are proportional to the degree of connection. The thickness of each edge is proportional to the number of genes of the same function in the genome of a virus. Virus representatives: (1) Paramecium bursaria chlorella virus 1; (2) Emiliania huxleyi virus 86; (3) Feldmannia species virus; (4) Ostreococcus tauri virus 5; (5) Phaeocystis globosa virus; (6) Heterosigma akashiwo virus 01.
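The graph described in this caption can be thought of as a weighted bipartite network linking viruses to functional categories. The following is a minimal sketch of that data structure only, not the authors' pipeline: the gene counts, the abbreviated virus labels, and the helper names `build_edges` and `node_degrees` are hypothetical, and the ForceAtlas2 layout performed in Gephi is not reproduced here.

```python
from collections import defaultdict

# Hypothetical gene counts per functional category (illustrative values only,
# NOT taken from Figure 2 or from GenBank annotations).
gene_counts = {
    "PBCV-1": {"DNA replication": 10, "Carbohydrate metabolism": 8},
    "EhV-86": {"DNA replication": 6, "Lipid metabolism": 5},
    "OtV-5": {"DNA replication": 7, "Carbohydrate metabolism": 6},
}


def build_edges(counts):
    """Return (virus, category, weight) edges; the weight is the gene count,
    which would drive edge thickness. Zero counts produce no edge."""
    return [
        (virus, category, n)
        for virus, categories in counts.items()
        for category, n in categories.items()
        if n > 0
    ]


def node_degrees(edges):
    """Count the edges touching each node; this degree would drive node size."""
    degree = defaultdict(int)
    for virus, category, _weight in edges:
        degree[virus] += 1
        degree[category] += 1
    return dict(degree)


edges = build_edges(gene_counts)
degrees = node_degrees(edges)
```

In this toy example, a category shared by all three viruses receives the highest degree, which is the kind of node that would appear largest in such a graph. An edge list like this could be exported to a file and laid out in a graph tool.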
Table 1. General genomic data of representatives of different groups of giant viruses of protists.
Such enzymes are involved in the synthesis of ceramide, which induces apoptosis of the infected cell [23]. Although apoptosis has already been observed in infections by other viruses, no genes related to the synthesis of sphingolipids were found in their genomes, making these genes exclusive to coccolithoviruses [24][25][26]. In addition, a proteome analysis showed that the enzymes involved in sphingolipid biosynthesis are present as early-class proteins, suggesting that they could be functional and play an important role in initial infection [27]. This also highlights that these genes are not only carried inside the viral particles but are also translated into functional proteins in the host, and that they can be explored as biotechnological enzymes. Sphingolipids are molecules found in eukaryotes and prokaryotes that perform structural, signaling, and biochemical functions. They have been proposed as potent food supplements and as cosmetic ingredients, as they prevent skin infection and inhibit the proliferation of bacteria and fungi [28][29].
A comparative study performed by Nissimov and colleagues showed that other isolated viruses (EhV-201, EhV-207, and EhV-208) carry 25 to 29 CDS identical to sequences present in the EhV-86 genome. The predicted enzymes found were methyltransferases, glycosyltransferases, and an RNase, and the majority of non-shared proteins are hypothetical ones with unknown functions. On the other hand, the EhV-84 isolate showed many proteins (231 CDS) identical to EhV-86 proteins [30]. With only a few genomes available, it is clear that there is a vast field to be explored, both to obtain more information about the biology and ecology of these viruses and for biotechnological purposes.
Phaeoviruses infect filamentous algae, mostly from the genera Ectocarpus and Feldmannia, in subtropical environments, and they are the only phycodnaviruses known so far to infect more than one host. Genomic analysis of the Ectocarpus siliculosus virus-1 (EsV-1) revealed 231 CDS regions, of which only 50% had a determined function. These include genes involved in DNA synthesis, polysaccharide metabolism, integration, and transposition, as well as histidine protein kinases [31]. Integrases catalyze site-specific DNA rearrangement, and transposases can bind to transposons in DNA and move small fragments along the genome. Both enzymes can be used for gene editing and gene therapy, and integrases are also studied as resistance markers [32][33]. A close relative is the Feldmannia species virus (FsV), a phaeovirus associated with the brown filamentous alga Feldmannia sp. This virus was considered the smallest giant virus, with a linear genome of 154 kbp and 150 CDS regions, of which only 25% had similarity to database sequences, such as those involved in DNA replication, transcription, nucleotide metabolism, and also lipid and protein metabolism (Figure 2 and Figure 3) [34].
Figure 3. Presence and abundance of enzymes in giant viruses of protists. Bubble chart containing representatives of different viral families of the phylum Nucleocytoviricota associated with algal and amoeba hosts. Bubble sizes are proportional to the number of enzymes, represented as the percentage of genes in a functional category in the virus genome. Data on the enzyme diversity and abundance of each virus, grouped into different functional categories, were obtained from genomic annotations publicly available on GenBank and classified according to NCVOG categories.
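The bubble sizes in this chart rest on a simple normalization: the number of genes in a functional category divided by the number of genes in the genome. A minimal sketch of that calculation follows, assuming the denominator is the total number of annotated genes in the genome (the caption does not state this explicitly); the function name `category_percentages` and the counts are hypothetical and are not data from the figure.

```python
def category_percentages(category_counts, total_genes):
    """Express each category's gene count as a percentage of the genome's
    total gene count (assumed denominator; see the accompanying text)."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return {
        category: 100.0 * count / total_genes
        for category, count in category_counts.items()
    }


# Hypothetical virus with 200 genes, of which 5 fall under lipid metabolism
# and 20 under DNA replication (illustrative numbers only).
example = category_percentages(
    {"Lipid metabolism": 5, "DNA replication": 20},
    total_genes=200,
)
```

Normalizing by genome size in this way makes viruses with very different gene counts comparable on the same chart.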
Prasinoviruses infect prasinophytes, considered the smallest free-living photosynthetic eukaryotes [35]. Genomic analysis of the Ostreococcus tauri virus (OtV-1) showed 232 CDS regions, of which 31% showed functional similarity to previously described proteins, including methyltransferases and other enzymes involved in DNA, protein, and carbohydrate metabolism [33]. The Ostreococcus tauri virus OtV-5 genome has 268 CDS, and only 57% of the predicted proteins had a known function, including those involved in DNA replication and viral particle formation. Interestingly, some host-related genes were also found, including a proline dehydrogenase, related to cellular oxidation protective metabolism [36][37]. This virus has complex glycosylation machinery, with at least five glycosyltransferases and a galactosyltransferase, indicating relative independence from the host for glycosylating its own proteins (Figure 2). It is worth noting that other giant viruses also have glycosylation machinery, with many proteins involved in carbohydrate modification and sugar production, which could be explored in the biotechnology industry [38][39].
Raphidoviruses have a wide variety of hosts. Among them is the bloom-forming unicellular alga Heterosigma akashiwo, which can form surface aggregations toxic to the environment [40]. The complete sequence of the first virus strain infecting this alga (HaV53) was published in 2016, and genes related to DNA regulation, carbohydrate metabolism, signal transduction, and regulation of ubiquitin-related proteins were found. However, the characterization of this genome is still limited [41]. Similar to other members of the Phycodnaviridae family, HaV01 has known glycosyltransferases that might be involved in viral protein glycosylation. In addition, proteins involved in transcription and RNA processing have also been identified, including a ribonuclease III and an mRNA-capping enzyme (Figure 2). Ribonuclease III cleaves double-stranded RNA (dsRNA), an essential step in the maturation and decay of coding and non-coding RNAs. The first characterized ribonuclease III was from Escherichia coli and is commercially available; the enzyme is also present and well conserved in plants, animals, fungi, and eukaryotic viruses [42]. The mRNA-capping enzyme is a complex that promotes the first modification of RNA polymerase II transcripts. In this context, this complex can regulate cap-dependent protein synthesis and act in the protein export mechanism [43]. Many types of mRNA-capping systems have also been described in viruses, such as influenza viruses and other orthomyxoviruses, alphaviruses, mimiviruses, and chloroviruses [43][44]. It is interesting to note that New England Biolabs Inc. has recently announced that the Faustovirus capping enzyme is commercially available, an enzyme that shows higher capping efficiency across a variety of mRNA 5′ structures than previous enzymes [45].
Prymnesioviruses infect phytoplankton algae with high biomass formation, such as Phaeocystis globosa. Genomic analysis of the strain Phaeocystis globosa virus-16T (PgV-16T) showed 434 CDS regions and no close phylogenetic relationship to the other viruses that infect algae; instead, it groups within the Megaviridae clade. Seventy percent of its genome is similar to other large double-stranded DNA (dsDNA) viruses, with genes related to many processes, such as DNA replication and repair, including methyltransferases and transposases [46]. Seven genes in its genome appear to be unique within the group and encode peculiar enzymes, such as phospholipase and asparagine synthetase homologs [46]. Phospholipases hydrolyze phospholipids into other lipids and are widely used in industrial food processes, while asparagine synthetase is a target related to the growth of human tumor cells. The prokaryotic counterparts of these enzymes have also been characterized [47][48]. Considering the functional clusters of genes, the genetic profile of PgV-16T clearly differs from that of the other phycodnaviruses (Figure 2). Such a difference corroborates previous data pointing to this virus as a member of the Mimiviridae family [46]. Another member of this group is the Chrysochromulina brevifilum virus PW1, the only species recognized by the ICTV so far [20]. A few viruses infecting Chrysochromulina sp. have been identified in recent years, and genome analysis of C. parva viruses suggested limited gene machinery compared to other phycodnaviruses (Figure 3).
The last-mentioned genus, Chlorovirus, was the first to be created and comprises the first viruses found in association with chlorella-like green algae, back in the late 1970s [49][50]. The first reported phycodnavirus, zoochlorella cell virus (ZCV), was isolated in the late 1970s in Japan from a Chlorella sp. that lives in symbiosis with the protozoan Paramecium bursaria. ZCV was able to infect only zoochlorellae recently separated from their symbiotic protozoan [9]. A few years later, viruses sharing many characteristics with ZCV were isolated from zoochlorellae associated with Hydra viridis (HVCV-1 and HVCV-2) and also with Paramecium bursaria (PBCV-1), which would become the most studied algal viruses over the last decades [10][49][50].