AlphaFold uses a deep-learning-based approach to predict protein structure, a problem of profound significance in biology. The model was trained on data from the Protein Data Bank, which holds a large number of experimentally determined protein structures. It uses this training to predict the spatial arrangement of the amino acids in a protein, producing a three-dimensional model of the protein’s conformation.
2. Historical Deep Learning Methods for Protein–Protein Interaction Analysis
Earlier deep learning methods for PPI analysis have contributed substantially to the understanding of complex cellular processes by enabling the systematic investigation and prediction of these interactions. This section discusses two representative frameworks, PIPR and DPPI, and their limitations.
The PIPR framework introduces an approach to PPI prediction based on amino acid sequences. The method is built on a Siamese architecture that uses a deep residual recurrent convolutional neural network (RCNN). Combining recurrent and convolutional layers allows PIPR to capture both local and sequential features of protein sequences. To strengthen feature extraction, PIPR employs an automatic multi-granular feature selection mechanism, which helps it identify and prioritize the most informative and discriminative features within the sequences. In addition, PIPR combines diverse aspects of PPI data, including sequence similarity, evolutionary conservation, and domain–domain interactions, to build a comprehensive predictive model. The DPPI model addresses both homodimeric and heterodimeric protein interactions. It can also replicate binding affinities. The RCNN was built with bidirectional gated recurrent units (bidirectional GRUs), yet GRUs have demonstrated limited learning efficiency and slow convergence.
The DPPI method introduces a distinct deep learning approach to PPI prediction. Deep Siamese-like CNNs, combined with random projection and data augmentation, allow DPPI to deliver accurate sequence-based PPI predictions. The method focuses on capturing critical aspects of a protein pair’s composition, including the amino acid sequence and the co-occurrence of overlapping sequence motifs. To extract pertinent features, DPPI uses PSI-BLAST to generate a probabilistic sequence profile for each protein. The convolutional module, made up of multiple layers, identifies sequence patterns within each protein’s profile. DPPI then applies random projection to the representations produced by the convolutional module, projecting them into two distinct spaces. The Siamese-based learning architecture captures the mutual influence of the paired proteins, allowing the model to generalize across diverse PPI prediction problems without predefined features. However, based on 5-fold cross-validation, DPPI’s PPI prediction accuracy on the S. cerevisiae core dataset was found to be lower than that of PIPR.
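The Siamese design shared by both frameworks can be illustrated with a minimal NumPy sketch. It is not the PIPR or DPPI implementation: the weights are random and untrained, the "profiles" are random stand-ins for real sequence profiles, and the names `encode` and `interaction_score` are hypothetical. The sketch only shows that one set of weights encodes both proteins and that a symmetric combination makes the score independent of input order.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(profile, conv_w, proj_w):
    """Shared encoder: 1D convolution over a (length x 20) profile, ReLU,
    global max pooling over positions, then a linear projection."""
    width, n_cols, n_filters = conv_w.shape
    n_windows = profile.shape[0] - width + 1
    # Flatten each sliding window so the convolution becomes a matrix product.
    windows = np.stack([profile[i:i + width].ravel() for i in range(n_windows)])
    feature_maps = np.maximum(windows @ conv_w.reshape(width * n_cols, n_filters), 0.0)
    pooled = feature_maps.max(axis=0)   # fixed-size vector for any sequence length
    return pooled @ proj_w

def interaction_score(profile_a, profile_b, conv_w, proj_w, out_w):
    """Siamese scoring: the same weights encode both proteins, and the
    element-wise product makes the score symmetric in its two inputs."""
    z_a = encode(profile_a, conv_w, proj_w)
    z_b = encode(profile_b, conv_w, proj_w)
    logit = (z_a * z_b) @ out_w
    return 1.0 / (1.0 + np.exp(-logit))  # score in (0, 1)

# Hypothetical, untrained weights (small scale keeps the logit moderate).
conv_w = 0.1 * rng.normal(size=(5, 20, 8))  # window 5, 20 profile columns, 8 filters
proj_w = 0.1 * rng.normal(size=(8, 4))
out_w = 0.1 * rng.normal(size=4)

# Random stand-ins for two protein profiles of different lengths.
protein_a = rng.random((30, 20))
protein_b = rng.random((45, 20))

score_ab = interaction_score(protein_a, protein_b, conv_w, proj_w, out_w)
score_ba = interaction_score(protein_b, protein_a, conv_w, proj_w, out_w)
```

Because both inputs pass through the same encoder and are combined by an element-wise product, `score_ab` and `score_ba` agree, which is the property a Siamese model needs for unordered protein pairs.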
3. Graph Neural Networks for Protein–Protein Interactions
Graph Neural Networks (GNNs) have emerged as a versatile and powerful class of methods for the computational prediction of PPIs. They are a form of deep learning architecture designed for data structured as graphs. Because biomolecular data such as proteins can be naturally represented as graphs, GNNs offer an opportunity to capture intricate patterns and relationships within these datasets.
In essence, a graph can be seen as a collection of nodes and edges, where nodes represent entities (e.g., proteins), and edges denote relationships or interactions (e.g., PPIs). GNNs take advantage of this structured data format by applying various forms of convolutions directly on the graph, enabling them to learn from both local node features and the broader network topology. This ability is particularly useful in the study of PPIs, where the biological significance of an interaction often depends not only on the properties of the interacting proteins but also on their position and role within the larger protein network.
The unique capacity of GNNs to exploit the underlying structure of graph data is achieved through several key mechanisms. Firstly, GNNs use neighborhood aggregation or message-passing frameworks, wherein each node in the graph gathers information from its local neighbors to update its state. This allows GNNs to incorporate local context into node representations, thereby capturing the immediate interaction dynamics in PPIs. Secondly, through multiple rounds of these aggregations, GNNs can learn increasingly abstract representations of nodes, thereby modeling higher-order interaction effects and uncovering complex interaction patterns.
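The effect of repeated neighborhood aggregation can be sketched on a toy graph. The example below is a simplified illustration, assuming mean aggregation without learned weights; the four-protein chain and the function name `aggregate` are hypothetical.

```python
import numpy as np

# Toy undirected PPI graph: a chain of 4 proteins (hypothetical edges).
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# One scalar feature per protein: only protein 0 carries a signal.
h0 = np.array([[1.0], [0.0], [0.0], [0.0]])

def aggregate(adj, h):
    """One round of mean aggregation over each node's neighbors and itself."""
    adj_self = adj + np.eye(adj.shape[0])         # include the node's own state
    degree = adj_self.sum(axis=1, keepdims=True)  # neighborhood size per node
    return (adj_self @ h) / degree

h1 = aggregate(adj, h0)  # the signal from protein 0 reaches protein 1
h2 = aggregate(adj, h1)  # after a second round it reaches protein 2
```

After one round only the direct neighbor of protein 0 has a nonzero state; after two rounds the signal has reached protein 2 but not protein 3, which is three hops away. Each additional round widens the part of the network a node's representation can reflect.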
Various types of GNNs have been employed in the study of PPIs, each offering distinct advantages. Graph Convolutional Networks (GCNs), for instance, are particularly adept at learning from homophily in networks, wherein nodes that are connected or nearby in the graph have similar features. Graph Attention Networks (GATs) add another level of sophistication by introducing attention mechanisms that allow different weights to be assigned to different neighbors during the aggregation process. These and other GNN variants provide a flexible and robust toolset for tackling the challenging task of PPI prediction.
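The difference between the two layer types can be made concrete with a small NumPy sketch. It follows the commonly cited formulations (symmetric degree normalization for GCN, softmax attention over neighbors for GAT) but is a simplified illustration: it uses a single attention head, ReLU as the output nonlinearity in both layers, and random untrained weights. The graph and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(adj, h, w):
    """GCN-style layer: fixed, degree-based neighbor weights."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

def gat_attention(adj, h, w, a):
    """GAT-style attention: a learned weight for every (node, neighbor) pair."""
    z = h @ w
    d_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into the two halves of a.
    e = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)                # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)          # stabilize the softmax
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)

def gat_layer(adj, h, w, a):
    """Aggregate transformed neighbor features with the attention weights."""
    alpha = gat_attention(adj, h, w, a)
    return np.maximum(alpha @ (h @ w), 0.0)

# Toy chain graph of 4 proteins with 3 input features each (random stand-ins).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 3))
w = rng.normal(size=(3, 2))
a = rng.normal(size=4)                            # attention vector of size 2 * d_out

gcn_out = gcn_layer(adj, h, w)
alpha = gat_attention(adj, h, w, a)
gat_out = gat_layer(adj, h, w, a)
```

In the GCN layer the contribution of each neighbor is determined only by node degrees, whereas in the GAT layer the rows of `alpha` depend on the node features, so two neighbors of the same protein can receive different weights.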
Research leveraging GNNs for PPI prediction spans a wide range of applications, from identifying specific interaction sites on proteins and predicting the existence of interactions between protein pairs to classifying proteins based on their interaction profiles. These studies typically formulate the PPI problem as a graph-based learning task, such as node classification, link prediction, or graph classification, and employ a suitable GNN architecture to solve it.
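As an illustration of the link prediction formulation, the sketch below scores candidate protein pairs from node embeddings. It assumes a simple inner-product decoder, which is one common choice and not necessarily the one used in the studies reviewed here; the embeddings are hand-picked hypothetical values standing in for the output of a GNN encoder.

```python
import numpy as np

def link_scores(z):
    """Score every protein pair by the sigmoid of the embedding inner product."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Hypothetical 2-dimensional embeddings for four proteins.
z = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [-1.0, 0.2],
              [0.0, 1.0]])

scores = link_scores(z)

# Rank all unordered pairs from most to least likely to interact.
candidates = [(i, j) for i in range(4) for j in range(i + 1, 4)]
ranked = sorted(candidates, key=lambda pair: scores[pair], reverse=True)
```

Proteins 0 and 1 have nearly parallel embeddings and therefore rank first, while proteins 0 and 2 point in opposite directions and rank last. Node classification and graph classification differ only in the decoder: a per-node classifier or a pooled, per-graph classifier replaces the pairwise score.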
Recent studies show a prominent trend toward using GNNs for PPI prediction. These studies have explored various models and techniques aimed at improving the accuracy and efficiency of PPI prediction tasks. Notably, researchers have used GNNs, such as augmented GATs and GCNs, to capture structural invariance, learn graph representations, and improve prediction performance. The integration of multimodal data sources, biological features, and prior knowledge has also emerged as a significant aspect of recent research. These studies report notable advances in predicting PPIs and in using PPI information for various predictive tasks, reinforcing the role of deep learning methods, particularly GNNs, in advancing our understanding of PPIs and their implications in biological systems. Continued research and methodological advances are expected to drive further progress in this field. Table 1 summarizes recent studies.
Table 1. Summary of Contributions in Studies on Graph Neural Networks for Protein–Protein Interactions. Note that each study employed varied datasets, cross-validation methods, and simulation settings for evaluation, making direct comparisons potentially inconclusive. The highest reported accuracy is presented when models were assessed using multiple datasets.
3.1. Pairwise PPI Prediction
Albu et al. presented MM-StackEns, a deep multimodal stacked generalization approach for predicting PPIs that employs a Siamese neural network and graph attention networks, and reported superior performance on Yeast and Human datasets. Similarly, Jha et al. used a Graph Convolutional Network (GCN) and a Graph Attention Network (GAT) for PPI prediction, reporting superior results on Human and S. cerevisiae datasets.
3.2. PPI Network Prediction
Baranwal et al. offered Struct2Graph, a graph attention network for structure-based PPI predictions that can potentially identify residues contributing to protein–protein complex formation. Gao et al. designed the Substructure Assembling Graph Attention Network (SA-GAT) for graph classification tasks, including potential applications in PPI networks. Zaki et al. proposed a method for detecting protein complexes in PPI data using GCNs, formulating protein complex detection as a node classification problem and implementing the Neural Overlapping Community Detection (NOCD) model.
3.3. PPI Site Prediction
Quadrini et al. used Graph Convolutional Networks for PPI site prediction, exploring a novel abstraction of protein structure termed hierarchical representations. Mahbub and Bayzid introduced EGRET, an edge aggregated graph attention network for PPI site prediction, reporting significant improvements in performance. Yuan et al. proposed GraphPPIS, a deep graph-based framework for PPI site prediction that delivered significantly improved performance over other methods.
Williams et al. developed DockNet, a high-throughput protein–protein interface contact prediction model utilizing a Siamese graph-based neural network. Reau et al. developed DeepRank-GNN, a graph neural network framework that converts protein–protein interfaces into graphs to learn interaction patterns.
3.5. Auxiliary PPI Prediction Tasks
Azadifar and Ahmadi introduced a semi-supervised learning method based on GCNs for prioritizing candidate disease genes. Dai et al. formulated PIKE-R2P, a graph neural network method incorporating PPIs for predicting protein abundance from scRNA-seq data. Hinnerichs and Hoehndorf developed DTI-Voodoo, a method combining molecular features and PPI networks to predict drug-target interactions. Kim et al. proposed DrugGCN for drug response prediction using gene expression data. Wang et al. developed SIPGCN, a GCN-based model for predicting self-interacting proteins (SIPs) from sequence information.
The range and depth of these studies underscore the crucial role deep learning methods, particularly GNNs and GCNs, continue to play in advancing PPI predictions. With ongoing research and methodological enhancements, the future promises continued progress in understanding and predicting PPIs and their influence on biological systems.
4. Multi-task or Multi-modal Deep Learning Models for Protein–Protein Interactions
Multi-task and multi-modal deep learning models have been increasingly recognized as an efficient way to deal with the complexity and heterogeneity of PPI prediction problems. These models are designed to leverage multiple related tasks or multiple sources of information to improve predictive performance, offering a promising direction for the exploration and prediction of PPIs.
Multi-task learning models are designed to improve learning efficiency and predictive performance by learning multiple related tasks concurrently. The fundamental concept behind multi-task learning is the sharing of representations among tasks, which can improve generalization by leveraging the commonalities and differences across tasks. In a standard multi-task learning framework, some layers (shared layers) are shared among all tasks, while each task also has its own layers (task-specific layers). During training, each task has its own loss function, and the overall objective is typically a weighted sum of these individual losses. The shared layers learn a representation that captures the features common to all tasks, while the task-specific layers learn the features unique to each task.
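The shared-layer and weighted-loss structure described above can be sketched as follows. This is a minimal forward-pass illustration under stated assumptions: two binary classification tasks, one shared layer, random untrained weights, and hypothetical task weights `w_a` and `w_b`. It does not include a training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, params):
    """Shared layer followed by one task-specific head per task."""
    shared = np.tanh(x @ params["shared"])  # representation used by both tasks
    logits_a = shared @ params["head_a"]    # task A head
    logits_b = shared @ params["head_b"]    # task B head
    return logits_a, logits_b

def bce(logits, y):
    """Binary cross-entropy on logits, in a numerically stable form."""
    return np.mean(np.maximum(logits, 0.0) - logits * y
                   + np.log1p(np.exp(-np.abs(logits))))

def multitask_loss(x, y_a, y_b, params, w_a=1.0, w_b=0.5):
    """Overall objective: weighted sum of the per-task losses."""
    logits_a, logits_b = forward(x, params)
    loss_a, loss_b = bce(logits_a, y_a), bce(logits_b, y_b)
    return w_a * loss_a + w_b * loss_b, (loss_a, loss_b)

# Random stand-in data: 8 samples, 6 input features, two binary label sets.
x = rng.normal(size=(8, 6))
y_a = rng.integers(0, 2, size=8).astype(float)
y_b = rng.integers(0, 2, size=8).astype(float)
params = {
    "shared": 0.1 * rng.normal(size=(6, 4)),
    "head_a": 0.1 * rng.normal(size=4),
    "head_b": 0.1 * rng.normal(size=4),
}

total, (loss_a, loss_b) = multitask_loss(x, y_a, y_b, params)
```

Gradients of `total` with respect to `params["shared"]` would combine signals from both tasks, while each head receives a gradient only from its own loss; the weights control how strongly each task shapes the shared representation.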
Multi-modal deep learning models, on the other hand, aim to integrate information from multiple sources or modes. The basic principle of multi-modal learning is to construct a joint representation that leverages the complementarity and correlation among different modalities to improve prediction performance. In a standard multi-modal learning framework, the model first learns a representation for each modality using modality-specific layers and then integrates these representations using shared layers. The modalities can be different types of data (e.g., sequence data, structure data), each of which provides a unique perspective on the problem.
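A minimal sketch of this two-stage structure is given below, assuming two modalities (sequence-derived and structure-derived feature vectors) fused by concatenation. Concatenation is only one of several possible fusion strategies; the feature dimensions, weights, and function names are hypothetical and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w):
    """A single dense layer with ReLU activation."""
    return np.maximum(x @ w, 0.0)

def multimodal_forward(seq_feats, struct_feats, params):
    """Modality-specific encoders, then shared layers on the joint representation."""
    z_seq = dense_relu(seq_feats, params["seq"])          # sequence encoder
    z_struct = dense_relu(struct_feats, params["struct"])  # structure encoder
    joint = np.concatenate([z_seq, z_struct], axis=1)      # fusion by concatenation
    fused = dense_relu(joint, params["fusion"])            # shared layer
    return fused @ params["out"]                           # one logit per sample

# Random stand-in features: 5 samples, 10 sequence and 7 structure features.
seq_feats = rng.normal(size=(5, 10))
struct_feats = rng.normal(size=(5, 7))
params = {
    "seq": 0.1 * rng.normal(size=(10, 4)),
    "struct": 0.1 * rng.normal(size=(7, 4)),
    "fusion": 0.1 * rng.normal(size=(8, 6)),
    "out": 0.1 * rng.normal(size=6),
}

logits = multimodal_forward(seq_feats, struct_feats, params)
```

Each modality is mapped to a representation of the same size before fusion, so neither input dominates the joint vector simply because it has more raw features.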
In the context of PPI prediction, these methodologies offer several advantages. Multi-task learning models can learn from multiple related tasks (e.g., predicting different types of protein interactions), thereby leveraging the shared information among tasks to improve prediction performance. Similarly, multi-modal models can integrate information from multiple sources (e.g., sequence data, structural data, functional data), thereby leveraging the complementarity among different types of data to obtain a more comprehensive understanding of the protein interaction mechanisms.
Given their potential for handling complex and heterogeneous PPI prediction problems, multi-task and multi-modal deep learning models have found broad application in the PPI field.
Recent studies have focused on the development of multi-task or multi-modal deep learning models to enhance the prediction of PPIs. These models aim to leverage multiple sources of information, such as protein sequences, structural annotations, gene features, multiomics data, and GO information, to improve the accuracy and robustness of PPI predictions. By incorporating various tasks or modalities into the learning process, these models have demonstrated superior performance compared to single-task methods. Additionally, efforts have been made to enhance the interpretability of deep learning models by incorporating explainable features or methodologies. These advancements in multi-task and multi-modal deep learning approaches have opened up new possibilities for predicting PPIs and expanding our understanding of complex biological interactions in diverse areas, including disease research and infectious disease studies. Table 2 outlines the main points from recent research.
Table 2. Summary of Contributions in Studies on Multi-task or Multi-modal Models for Protein–Protein Interactions. Note that each study employed varied datasets, cross-validation methods, and simulation settings for evaluation, making direct comparisons potentially inconclusive. The highest reported accuracy is presented when models were assessed using multiple datasets.
4.1. Pairwise PPI Prediction
A range of models has been proposed to predict pairwise PPIs. For instance, Capel et al. proposed a multi-task learning strategy to predict residues in PPI interfaces from protein sequences. Similarly, Li et al. developed EP-EDL, an ensemble deep learning model, to predict human essential proteins using protein sequence information. Thi Ngan Dong et al. employed a multitask transfer learning approach for predicting PPIs between viruses and human cells, showing the effectiveness of this method across multiple PPI prediction tasks.
4.2. PPI Network Prediction
Several models have been developed to predict PPI networks. Peng et al. introduced MTGCN, a multi-task learning method based on the Graph Convolutional Network, to identify cancer driver genes using gene features from the PPI network. Schulte-Sasse et al. developed EMOGI, which utilizes graph convolutional networks to integrate multiomics pan-cancer data with PPI networks for cancer gene prediction. Finally, Pan et al. proposed DWPPI, a network embedding-based approach that integrates deep neural networks for PPI prediction in plants, demonstrating superior performance across multiple datasets.
4.3. PPI Site Prediction
In PPI site prediction, Capel et al. demonstrated a promising approach that uses a multi-task learning strategy to predict residues in PPI interfaces from protein sequences, significantly outperforming single-task methods.
4.4. Auxiliary PPI Prediction Tasks
A variety of models have been proposed for auxiliary PPI prediction tasks. Linder et al. introduced scrambler networks, a feature attribution method designed for discrete sequence inputs, to improve the interpretability of neural networks for biological sequences. These networks have been used to interpret the effects of genetic variants, interactions between cis-regulatory elements, and PPI binding specificity. Lastly, Zheng et al. developed DeepAraPPI, an integrative deep learning framework for predicting PPIs in Arabidopsis thaliana, demonstrating excellent performance and promising cross-species predictive ability.