Modeling Virus Evolution by Deep Learning
Contributor: Robert Friedman

Modeling virus evolution requires knowledge of evolutionary and population processes. These processes are highly complex, and an evolutionary model of them involves numerous parameters, which is a problem given the very sparse sampling of viral sequences from nature. The deep learning approaches of computer science are instead suited to modeling these non-linear dynamic processes of evolution, which makes them useful for predicting the composition of future viral populations and the emergence of novel variants.

  • virology
  • machine learning
  • predictability
  • mutation
  • recombination
  • evolutionary processes
  • viral variant

1. Introduction

Viruses evolve by many evolutionary processes, including mutation, recombination, and natural selection. Modeling these processes is very difficult for viral types since the evolutionary process is highly complex and the historical record of viruses is incomplete[1][2]. It is possible to reconstruct the history of a viral population where the nucleic acid sequences are sufficiently sampled; however, this is a task of high complexity. The deep learning approaches from the computer science discipline[3] offer the potential to recover the model and parameters of virus evolution, since these same approaches have modeled other processes involving complex non-linear dynamics, such as recovering the 3-D structure of a protein from its amino acid sequence[4]. These models have the potential not only to reveal the history of a viral population, but also to predict the population in the future. This information leads to predictions of new variants of a viral type.

2. Defining the Major Viral Groups

Predicting new variants of a viral type requires knowledge of the history of a virus. This knowledge is typically in the form of nucleic acid sequences, and the viral types are assigned to a larger category by their mechanism of inheritance, a process dependent on the nucleic acid type and method of replication[5]. The genome may be DNA or RNA, and it may also vary in the strandedness of the molecule. These molecular sequences provide the necessary information to reconstruct the history of a viral population, but the problem is that viral populations are very large and not comparable to the smaller population sizes found in cellular life. The viruses are dependent on cellular life for replication, so it is expected that the numbers of individual viruses are orders of magnitude larger than in the populations of cellular hosts.

The viruses also vary in their ecological relationship with their cellular hosts[6]. This may be described as a predatory interaction with the host where the virus is a pathogen. Other interactions of interest include one that is beneficial for both parties, or one where the virus has the benefit of replication within a cellular host, but the host is mostly unharmed and without a significant benefit. The model that describes a viral population and its interactions is complex, so predictions about its ecology require knowledge of the viral type, the organisms it interacts with, and the environment.

3. Prediction of Viral Variants

A further problem is whether an evolutionary model constructed for one viral population can be reused for a different viral population, whether evolutionarily related or not. The model is expected to have less application in the case of sampling highly divergent viruses, or viruses with no apparent relatedness, particularly where their methods of replication vary. However, the deep learning approaches are viable methods for recovering the non-linear dynamics of natural processes. The reasons for their effectiveness are not fully understood, and Dr. Geoffrey Hinton has cited the "unreasonable effectiveness" of these many-layered neural network approaches in their application[7].

3.1. Deep Learning Approach

Since the deep learning approaches are applicable for modeling a non-linear dynamic system, such as the processes of viral evolution, a major limitation is the size and breadth of the data sample. It also follows that the quality of the data is relevant, and that the sample should be representative of the mechanisms of the system. However, in general, with a big data sample it is possible to capture many of the parameters in a complex natural system. The deep learning approaches also scale to handle a large number of parameters in big data sets, a feature of their software architecture and of novel hardware design.

Constructing these models allows for predictability in the case of a natural system, such as in the evolution of a viral population and the genetic sequences of new variants. Even though there is a problem of missing and sparse data sampling, the deep learning approaches have recovered many of the attributes of a non-linear dynamic system at high resolution, given the natural scale of interest.

Conceptually, the main advantage of the deep learning approach is its capability in constructing a non-linear dynamic model from large data sets, even though the parameters are not necessarily disentangled from the model. These big data driven models also allow for comparability among different viral populations, even between those with high divergence. It is surmised that physical and biological processes are not infinite in complexity, and that there is repeatability in these processes, so these approaches are capable of capturing the salient parts of one process for application to other similar processes, such as the recombinational processes.

One way to further describe the above concept is to suggest that there is commonality in the evolutionary mechanisms of viruses with different nucleic acid types, and that the recombinational processes are generalizable across taxonomic groups. More specifically, it should be possible to model the history of an influenza viral type and predict future variants, not just from sequence data of one population, but also by inclusion of data from other populations of viral influenza. This leads to the potential for deep learning approaches to learn the generalizable mechanisms of evolution and its processes. The learning process is opaque in the case of these methods, but the main goal is predictability, not necessarily explainability of the parameters for mechanistic knowledge of the system.

Even with the sparse and incomplete representation of viruses in the genetic sequence databases, the deep learning approaches are resistant to these limitations of data size. The architecture of the deep learning approach, and the structure of the data, are also important aspects for building a model that has utility. The potential for solutions is dependent on a feedback process of architecture design, the form of the data, and continual testing for validity against Nature.
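As an illustration of what "the structure of the data" can mean in practice, the following is a minimal sketch of one common way to present nucleotide sequences to a neural network: a one-hot encoding. The entry does not specify an encoding, so the function name `one_hot`, the four-letter alphabet, and the choice to map unknown bases to an all-zero row are all assumptions made for this example.

```python
# Hypothetical helper for illustration; not taken from the entry.
# Assumption: sequences use the alphabet A, C, G, T, with RNA's U treated as T.
ALPHABET = "ACGT"


def one_hot(sequence: str) -> list[list[int]]:
    """Encode a nucleotide string as one row of four 0/1 values per base.

    Bases outside ALPHABET (for example 'N' for an unknown base) become an
    all-zero row, so missing data is represented rather than dropped.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in sequence.upper().replace("U", "T"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the encoding of a short toy sequence, one base per line.
    for row in one_hot("ACGN"):
        print(row)
```

Keeping unknown bases as all-zero rows preserves sequence length and alignment position, which is one simple way to carry sparse or incomplete sampling into the model input instead of discarding it.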

3.2. Future Directions

These approaches are better at reconstructing the past, as expected, but their predictability of future viral variants is measurable and testable. They also may incorporate elements of pathogenicity and virulence for a better understanding of the viral types in the past and for new variants in the future. This applies to both the world of Nature and an imaginary world of viral populations.
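One way to make the claim of testability concrete is a temporal holdout: fit a model on sequences sampled before a cutoff date, then compare its predicted variants against sequences sampled afterward. The sketch below assumes aligned sequences of equal length and uses Hamming distance as the comparison; the record layout, the function names, and the choice of distance are assumptions for illustration, and the toy data are invented.

```python
from datetime import date


def split_by_date(records, cutoff):
    """Split (sampling_date, sequence) records into past and future sets."""
    past = [r for r in records if r[0] < cutoff]
    future = [r for r in records if r[0] >= cutoff]
    return past, future


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_distance(predicted, observed):
    """Smallest Hamming distance from a predicted sequence to any observed one."""
    return min(hamming(predicted, seq) for seq in observed)


if __name__ == "__main__":
    # Toy records, invented for this example.
    records = [
        (date(2019, 1, 1), "ACGTAC"),
        (date(2020, 1, 1), "ACGTAT"),
        (date(2021, 1, 1), "ACTTAT"),
    ]
    past, future = split_by_date(records, date(2020, 6, 1))
    predicted = "ACTTAC"  # stands in for a model's output
    print(nearest_distance(predicted, [seq for _, seq in future]))
```

A low nearest distance on held-out future samples would count in favor of a model's predictions, while a consistently high one would count against them.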

3.2.1. Mutability of a Viral Type

Furthermore, the use of deep learning to generate new viral variants provides a measure of the mutability of a virus, particularly where compared against known mechanisms of change in viral populations. It may also capture some of these mechanisms, such as the constraint on the genetic composition of a virus. An example is the Influenza type A virus and its genome of RNA segments[8]. This viral type is known to undergo a process of reassortment of these segments that leads to the generation of new variants[9]. However, there is also a physical constraint on the evolution of this virus and its protein shell, and this creates a restrictive upper bound on the number of segments[10].
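The combinatorial reach of reassortment can be shown with a short sketch. It assumes the standard eight-segment influenza A genome and a co-infection by exactly two parental viruses, with each segment inherited whole from one parent; the parent labels and the function name are hypothetical. Under those assumptions there are 2^8 = 256 possible segment combinations, two of which are the parental genotypes themselves.

```python
from itertools import product

# The eight genome segments of influenza A virus.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortants(parent_a="A", parent_b="B"):
    """Yield every assignment of the eight segments to one of two parents.

    Assumption: each segment is inherited whole from exactly one parent.
    """
    for origin in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, origin))


if __name__ == "__main__":
    genotypes = list(reassortants())
    # 2**8 combinations, including the two unchanged parental genotypes.
    print(len(genotypes))
```

This count is only an upper bound on what may be observed, since constraints such as those described above are expected to make many combinations nonviable.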

There are other constraints as enforced by natural selection in the world of viruses and their populations. Another example is the potential for the overlapping of genes[11] in the viral genome, particularly where the upper bound on genome size is overly restrictive on necessary functions for virus survival. Along with the method of viral replication, these constraints lead to favoring particular kinds of mutation for nature's experiments that lead to new viral variants.

3.2.2. Interactions between Host and Virus

The life cycle of a virus is intertwined with the host and its cellular defenses. An example is in jawed vertebrates, in which host cells learn of an intracellular pathogen by processing its proteins into smaller peptides. Specific peptide sequences are then bound to receptor proteins on the surface of the host cell, and this protein interaction creates a structure that is detected by specific immune cells which are continually surveilling host cells for pathogenic peptides[12]. This is not a process without error, but it is a major pathway of vertebrate immunity to thwart the replication of intracellular pathogens.
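For modeling purposes, the candidate peptides from a viral protein are often approximated by enumerating every window of a fixed length along the sequence. The sketch below does this; it is a simplification and not a model of actual cellular processing, which does not cut proteins at every position. The default length of 9 residues, the function name, and the toy sequence are assumptions for illustration.

```python
def peptide_windows(protein: str, length: int = 9) -> list[str]:
    """Return every contiguous peptide of the given length from a protein.

    Assumption: a fixed window length stands in for real peptide processing.
    A protein shorter than the window yields an empty list.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


if __name__ == "__main__":
    # Arbitrary string of amino-acid letters, invented for this example.
    for peptide in peptide_windows("MKTAYIAKQRQ"):
        print(peptide)
```

Each window could then be scored by a separate predictor of receptor binding, which is outside the scope of this sketch.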

The mutability of a virus is the key component for its ability to survive in a host population. The parameters and constraints on mutability are expected to be generalizable, and so these biological processes are expected to be captured by a big data model and a robust deep learning approach.

3.2.3. Pathogenicity and Virulence

A pathogenic viral type may be thought of as consisting of parts that are necessary for replication of its genes and other parts that are in effect pathogenic. The classification of these functions into these convenient categories allows for prediction of recombinational events. This is observed in the Influenza viral group, where the segments are reassorted among individuals in the larger viral population. This may lead to changes in virulence and pathogenicity.

Viruses may also acquire genetic pieces from unrelated viral types, but this is expected to be rarer in occurrence. Further, there are genetic interactions in a virus beyond reductive descriptions of single gene function, so it is difficult to predict the function of a virus from its genes alone.

The above examples illustrate the difficulty in predicting viral function from genetic sequence. This is another area where a deep learning approach has potential for predicting and modeling the pathogenicity and virulence of a new variant. Lastly, the interactions of a pathogen with the immune system provide additional information, such as the pathogenic peptides processed by the host, for making better predictions about a new viral variant.

References

  1. Holmes, E.C., 2011. What does virus evolution tell us about virus origins?. Journal of virology, 85(11), pp.5247-5251.
  2. Lawrence, J.G., Hatfull, G.F. and Hendrix, R.W., 2002. Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. Journal of bacteriology, 184(17), pp.4891-4905.
  3. LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp.436-444.
  4. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A. et al., 2021. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), pp.583-589.
  5. Fauquet, C.M., 1999. Taxonomy, classification and nomenclature of viruses. Encyclopedia of virology, p.1730.
  6. van Dijk, J.G., Verhagen, J.H., Wille, M. and Waldenström, J., 2018. Host and virus ecology as determinants of influenza A virus transmission in wild birds. Current opinion in virology, 28, pp.26-36.
  7. Qin, Y., Frosst, N., Sabour, S., Raffel, C., Cottrell, G. and Hinton, G., 2019. Detecting and diagnosing adversarial images with class-conditional capsule reconstructions. arXiv:1907.02957.
  8. Enami, M., Sharma, G., Benham, C. and Palese, P., 1991. An influenza virus containing nine different RNA segments. Virology, 185(1), pp.291-298.
  9. Steel, J. and Lowen, A.C., 2014. Influenza A virus reassortment. Influenza Pathogenesis and Control-Volume I, pp.377-401.
  10. Brandes, N. and Linial, M., 2016. Gene overlapping and size constraints in the viral world. Biology direct, 11(1), pp.1-15.
  11. Chirico, N., Vianelli, A. and Belshaw, R., 2010. Why genes overlap in viruses. Proceedings of the Royal Society B: Biological Sciences, 277(1701), pp.3809-3817.
  12. Swain, S.L., 1983. T cell subsets and the recognition of MHC class. Immunological reviews, 74, pp.129-142.