Artificial Intelligence at the Service of Protein Structure

Contributors: Imad Boulos, Joy Jabbour, Serena Khoury, Nehme Mikhael, Victoria Tishkova, Nadine Candoni, Hilda E. Ghadieh, Stéphane Veesler, Youssef Bassim, Sami Azar, Frédéric Harb

Artificial Intelligence (AI) is a field of computer science that aims to create intelligent machines capable of performing tasks that normally require human intelligence, such as learning, problem-solving, and decision-making. AI systems can be designed to carry out a wide range of tasks, from simple ones like recognizing patterns or sorting data to more complex ones like language translation. Numerous industries, including healthcare, science, banking, transportation, and entertainment, have adopted AI in a variety of procedures. By automating processes, increasing productivity, and opening up new possibilities, AI has the potential to transform many facets of our lives. The responsible and beneficial use of AI technology, however, depends on a number of crucial elements, including ethical considerations, transparency, and responsible development.

biology
AI
algorithms

1. Application of Artificial Intelligence

There are various methods for creating artificial intelligence (AI) systems, and a few of them are discussed here. Rule-based systems explicitly encode the rules that the system must obey, whereas decision tree systems offer a visual representation of decisions and their possible outcomes in the form of a tree structure ^[1]. Machine learning algorithms rely on statistical models and data, making them particularly well suited to tasks that require adaptability and the ability to learn from the provided data ^[2]. These algorithms can be trained on large datasets and subsequently used to generate predictions or take actions ^[1]. AI has outperformed human intelligence in many fields, including strategic games such as chess and Go, as well as other decision-making domains ^[3]. It is currently evolving at an unprecedented pace, affecting many aspects of everyday life, including labor and daily activities ^[4]. The enormous leap in the development of self-driving vehicles exemplifies the potential of AI ^[5]. Among these fields, AI has particularly proven its value in biology and healthcare. Automated learning techniques are being applied in molecular biology to analyze tremendous amounts of data and build databases ^[6]. The pharmaceutical industry has taken advantage of the analytic and predictive capabilities of machine learning to accelerate drug development by markedly increasing the efficiency of clinical trials through better design, conduct, and analysis ^[7]. Genomics studies have also implemented deep learning algorithms to process and analyze large, intricate datasets ^[8].
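The contrast between a hand-written rule and a rule learned from data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature (a hydrophobic fraction), the labels, the toy samples, and the function names are invented and do not come from the cited works. The learned model is a one-level decision tree (a "decision stump").

```python
# Toy data, invented for illustration: (hydrophobic_fraction, is_membrane_protein)
SAMPLES = [
    (0.20, False), (0.25, False), (0.30, False), (0.35, False),
    (0.45, True), (0.50, True), (0.55, True), (0.60, True),
]


def rule_based(hydrophobic_fraction):
    """Rule-based system: the threshold is fixed in advance by a person."""
    return hydrophobic_fraction > 0.5  # hand-picked, hypothetical rule


def fit_stump(samples):
    """Learn a one-level decision tree: choose the threshold with the fewest errors."""
    values = sorted(x for x, _ in samples)
    # Candidate thresholds are midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != label for x, label in samples)

    return min(candidates, key=errors)


def accuracy(predict, samples):
    """Fraction of samples for which the prediction matches the label."""
    return sum(predict(x) == label for x, label in samples) / len(samples)


threshold = fit_stump(SAMPLES)


def learned(hydrophobic_fraction):
    """Decision stump using the threshold learned from SAMPLES."""
    return hydrophobic_fraction > threshold


print("hand-written rule accuracy:", accuracy(rule_based, SAMPLES))
print("learned stump accuracy:", accuracy(learned, SAMPLES))
```

On this toy set the learned threshold adapts to the data, while the fixed rule misclassifies the samples that fall just below its hand-picked cutoff. Accuracy is measured on the training samples only, so the comparison says nothing about generalization.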

2. Artificial Intelligence Methods in Biology

One of the most promising applications of AI in biology is the emergence and rapid development of neural-network-based AI systems that accurately predict a protein's structure from its amino acid sequence, such as AlphaFold, RoseTTAFold and ESMFold ^[9]^[10]. After AlphaFold's success at the 13th Critical Assessment of Structure Prediction (CASP13) in 2018 ^[11] and its dominance at CASP14 in 2020, DeepMind released the AlphaFold2 source code to the public ^[12]. These advances prompted numerous researchers to develop their own neural networks, which has sped up the development of protein structure prediction tools, as seen in CASP15. These AI systems have also supported hundreds of research publications ^[13].

2.1. AlphaFold2

AlphaFold2 is trained on a large dataset of experimentally determined protein structures, taking into consideration geometric, physical, and evolutionary constraints affecting protein folding. It runs on a complex system that generates a prediction in several steps. One of these steps is the generation of multiple sequence alignments (MSAs) between an unknown sequence and similar sequences from other organisms. In addition, it employs transformers, attention-based neural network components that recognize patterns, enabling the system to take into consideration interactions between distant amino acids. No single key step was identified experimentally; rather, every step in the system contributes a little to producing an accurate prediction ^[12]. AlphaFold2 predictions cover 98.5% of the human proteome, with 58% of residues predicted confidently and 36% predicted with very high confidence. This is a remarkable step forward in the field, since experimentally determined structures cover only 17% of the human proteome ^[12].
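How attention lets distant amino acids interact can be sketched with single-head scaled dot-product self-attention. This is a minimal illustration, not AlphaFold2's architecture: the 2-D residue embeddings are invented, and the learned query, key, and value projections used by real transformers are replaced by the identity to keep the sketch short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Single-head scaled dot-product self-attention with identity projections.

    Real transformers learn separate query/key/value projections; they are
    omitted here (an assumption made to keep the sketch short).
    """
    dim = len(embeddings[0])
    weights = []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights.append(softmax(scores))
    # Each output is a weighted average of all embeddings in the sequence.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, embeddings)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical embeddings for a 6-residue sequence: residues 0 and 5 are far
# apart in the sequence but have similar embeddings.
residues = [
    [3.0, 0.0], [0.0, 1.0], [0.0, 1.0],
    [0.0, 1.0], [0.0, 1.0], [3.0, 0.0],
]
weights, outputs = self_attention(residues)

print("attention from residue 0 to its neighbor (1):", round(weights[0][1], 4))
print("attention from residue 0 to distant residue 5:", round(weights[0][5], 4))
```

Because the attention weights depend on embedding similarity rather than on sequence separation, residue 0 attends more strongly to residue 5 than to its immediate neighbor in this toy example.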

2.2. RoseTTAFold

RoseTTAFold built on ideas from AlphaFold2, resulting in a neural network that takes into account three aspects simultaneously: the patterns present in protein sequences, the interactions between amino acids within a protein, and the potential three-dimensional structure of the protein. Although RoseTTAFold is accurate, AlphaFold2 made more accurate predictions. One advantage of RoseTTAFold, however, was its capacity to recognize and model multi-protein complexes ^[14]. This led DeepMind to release its own system specifically trained to predict multimeric protein structures, AlphaFold-Multimer, which successfully predicted 72% of homomeric interactions (36% with high accuracy) and 70% of heteromeric interactions (26% with high accuracy), with room for improvement in the future ^[15].
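The two kinds of percentage quoted for AlphaFold-Multimer (successful predictions and highly accurate predictions) can be computed from per-complex interface scores, as in the sketch below. It assumes the DockQ cutoffs used in the AlphaFold-Multimer evaluation (DockQ ≥ 0.23 for a successful interface, DockQ ≥ 0.8 for high accuracy). The scores themselves are invented, and `summarize` is a hypothetical helper name.

```python
# Cutoffs assumed from the AlphaFold-Multimer evaluation:
# DockQ >= 0.23 counts as a successful interface, DockQ >= 0.8 as high accuracy.
SUCCESS_CUTOFF = 0.23
HIGH_ACCURACY_CUTOFF = 0.80


def summarize(dockq_scores):
    """Return success and high-accuracy rates as fractions of ALL cases."""
    total = len(dockq_scores)
    if total == 0:
        raise ValueError("at least one score is required")
    successes = sum(score >= SUCCESS_CUTOFF for score in dockq_scores)
    high_accuracy = sum(score >= HIGH_ACCURACY_CUTOFF for score in dockq_scores)
    return {
        "success_rate": successes / total,
        "high_accuracy_rate": high_accuracy / total,
    }


# Invented DockQ scores for ten hypothetical complexes.
scores = [0.05, 0.10, 0.30, 0.45, 0.60, 0.82, 0.85, 0.91, 0.15, 0.50]
summary = summarize(scores)
print(summary)
```

Both rates share the same denominator, so the high-accuracy cases are a subset of the successful ones rather than a fraction of them.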

2.3. ESMFold

The ESMFold model, which also took inspiration from AlphaFold2, takes a different approach: it relies on a large language model and dispenses with MSA generation. As a result, the required processing resources are drastically reduced, and prediction for short sequences is almost 60 times faster. However, this gain comes at the cost of precision. The enhanced prediction speed was used to carry out a large-scale structural analysis of metagenomic proteins. In total, 617 million structure predictions were made for proteins from countless microorganisms, of which 225 million were predicted with high confidence, including proteins distinct from any empirically determined structure, giving biologists insight into some of the least characterized proteins ^[9].

2.4. Improvements

Although these AI systems have made great strides, they still need to be improved. GPU memory constraints limit the size of the protein complexes that can be predicted ^[16], which may prevent broad use. Additionally, as the number of chains in the complex rises, accuracy tends to decline ^[16]. A major disadvantage of AlphaFold2 is its weakness in taking into consideration the effects exerted by the protein environment on its structure, especially the lipid bilayer. Although it excels at predicting isolated soluble proteins, it struggles to predict membrane proteins ^[17]. AlphaFold2 also struggles with some predictions; for example, it cannot foresee uncommon conformations. Its limitations further include ligand binding and the conformational changes it induces, the effects of post-translational modifications (PTMs) on protein folding, intrinsically disordered proteins (IDPs) containing partly structured sequences, and the effects of mutations. In addition, it is unable to offer insight into protein dynamics and stability ^[18]. However, combining experimental techniques such as NMR with AlphaFold2 would be especially valuable, since the two approaches are complementary: each enhances the other's strengths and compensates for the other's weaknesses ^[18].
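One plausible contributor to the GPU memory limit is that AlphaFold2-style networks keep a representation for every pair of residues, so memory grows quadratically with the total length of the complex. The back-of-envelope sketch below estimates the size of a single pair tensor. The 128 channels and 32-bit floats are assumptions, and real memory use is considerably higher because many intermediate activations are held at once.

```python
def pair_tensor_bytes(num_residues, channels=128, bytes_per_value=4):
    """Bytes needed for one residue-pair tensor of shape (N, N, channels).

    channels=128 and 4-byte floats are assumptions for illustration; this
    counts a single tensor, not the full activation memory of a model.
    """
    return num_residues * num_residues * channels * bytes_per_value


# Doubling the number of residues quadruples the size of the pair tensor.
for n in (500, 1000, 2000, 4000):
    gib = pair_tensor_bytes(n) / 2**30
    print(f"{n:>5} residues: {gib:6.2f} GiB for one pair tensor")
```

The quadratic growth means that adding chains to a complex increases memory demand much faster than it increases sequence length.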

To enhance clarity and facilitate understanding, the information from this section has been consolidated into Table 1 below.

Table 1. Summary of AI techniques for protein structure prediction.

Artificial Intelligence Technique	Description	Advantages	Limitations	References
RoseTTAFold	“Three-track” neural network developed by the Baker lab to predict the 3D structure of proteins from their amino acid sequences	Accurate predictions; capacity to recognize and simulate multi-protein complexes	Limited ability to predict uncommon conformations; struggles with membrane proteins; high computational power required	^[14]
AlphaFold2	Deep learning-based AI system developed by DeepMind that accurately predicts the 3D structure of proteins from their amino acid sequences	Highly accurate predictions	Weak in considering the protein’s environment; unable to predict uncommon conformations; limited insights into protein dynamics and stability; high computational power required; poor ability to recognize and simulate multi-protein complexes	^[12]^[18]
AlphaFold-Multimer	An AlphaFold model trained to predict protein-protein complexes	Predicts multimeric protein structures accurately	Room for improvement; limited insights into protein dynamics and stability	^[15]
ESMFold	AI system developed by Meta that predicts protein structures using a large language model trained on a massive dataset of protein sequences	Faster prediction speed; enables large-scale analysis; lower computational power required	Lower precision; struggles with membrane proteins; limited insights into protein dynamics and stability	^[9]

This entry is adapted from the peer-reviewed paper 10.3390/molecules28207176

References

1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
2. Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Pearson: Upper Saddle River, NJ, USA, 2016.
3. Crandall, J.W.; Oudah, M.; Chenlinangjia, T.; Ishowo-Oloko, F.; Abdallah, S.; Bonnefon, J.F.; Cebrian, M.; Shariff, A.F.; Goodrich, M.A.; Rahwan, I. Cooperating with machines. Nat. Commun. 2018, 9, 233.
4. Feijóo, C.; Kwon, Y.; Bauer, J.M.; Bohlin, E.; Howell, B.; Jain, R.; Potgieter, P.; Vu, K.; Whalley, J.; Xia, J. Harnessing artificial intelligence (AI) to increase wellbeing for all: The case for a new technology diplomacy. Telecomm. Policy 2020, 44, 101988.
5. Biggi, G.; Stilgoe, J. Artificial Intelligence in Self-Driving Cars Research and Innovation: A Scientometric and Bibliometric Analysis. Soc. Sci. Res. Netw. 2021, 28.
6. Rawlings, C.J.; Fox, J.P. Artificial intelligence in molecular biology: A review and assessment. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1994, 344, 353–362; discussion 362–363.
7. Kolluri, S.; Lin, J.; Liu, R.; Zhang, Y.; Zhang, W. Machine Learning and Artificial Intelligence in Pharmaceutical Research and Development: A Review. AAPS J. 2022, 24, 19.
8. Dias, R.; Torkamani, A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019, 11, 70.
9. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
10. Lee, C.; Su, B.H.; Tseng, Y.J. Comparative studies of AlphaFold, RoseTTAFold and Modeller: A case study involving the use of G-protein-coupled receptors. Brief. Bioinform. 2022, 23, bbac308.
11. AlQuraishi, M. AlphaFold at CASP13. Bioinformatics 2019, 35, 4862–4865.
12. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
13. Elofsson, A. Progress at protein structure prediction, as seen in CASP15. Curr. Opin. Struct. Biol. 2023, 80, 102594.
14. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876.
15. Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2022, preprint.
16. Bryant, P.; Pozzati, G.; Zhu, W.; Shenoy, A.; Kundrotas, P.; Elofsson, A. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nat. Commun. 2022, 13, 6028.
17. Azzaz, F.; Yahi, N.; Chahinian, H.; Fantini, J. The Epigenetic Dimension of Protein Structure Is an Intrinsic Weakness of the AlphaFold Program. Biomolecules 2022, 12, 1527.
18. Laurents, D.V. AlphaFold 2 and NMR Spectroscopy: Partners to Understand Protein Structure, Dynamics and Function. Front. Mol. Biosci. 2022, 9, 906437.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license.