Deep Learning Architectures from a Genomic Perspective

The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing.

  • deep learning
  • genomics
  • large language model
  • computer vision
  • multi-modal machine learning

1. Introduction

Various deep learning algorithms have their own advantages for resolving particular types of problems in genomic applications (see a comprehensive list in Table 1). For example, convolutional neural networks (CNNs), famous for capturing features in image classification tasks, have been widely adopted to automatically learn local and global characterizations of genomic data. Recurrent neural networks (RNNs), which succeed in speech recognition problems, are skillful at handling sequence data and have thus mostly been used to deal with DNA sequences. Autoencoders are popular both for pre-training models and for denoising or pre-processing the input data. Large language models (LLMs) are known for their emergent capability of dealing with extremely long-range interactions in sequences. When designing deep learning models, researchers can take advantage of these merits to efficiently extract reliable features and reasonably model the biological process. For example, with sufficient labeled data, traditional CNNs and RNNs can serve as solid baselines; when robust representations are needed for various downstream tasks, variational autoencoders (VAEs) are a good starting point; and when the ability to cope with long input sequences is required, LLMs should come into play.

2. Convolutional Neural Networks

Convolutional neural networks (CNNs) are one of the most successful deep learning models for image processing owing to their outstanding capacity to analyze spatial information. Early applications of CNNs in genomics relied on the fundamental building blocks of CNNs in computer vision [79][49] to extract features. Zeng et al. [34][4] described the adaptation of CNNs from computer vision to genomics as treating a window of genome sequence as an image.
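A minimal sketch (not taken from the cited papers) of how a genomic window can be treated like an image: each base becomes a one-hot "channel", giving a 4 x L matrix that a CNN can scan much as it scans pixel rows.

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}  # channel index per nucleotide

def one_hot_encode(sequence: str) -> np.ndarray:
    """Encode a DNA string as a 4 x L one-hot matrix (unknown bases -> all zeros)."""
    encoding = np.zeros((4, len(sequence)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        channel = BASES.get(base)
        if channel is not None:
            encoding[channel, position] = 1.0
    return encoding

window = one_hot_encode("ACGTTGCANNACGT")
print(window.shape)  # (4, 14): 4 "colour" channels, 14 positions
```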
The highlight of CNNs is their ability to perform adaptive feature extraction automatically during training. For instance, CNNs can be applied to discover meaningful recurring patterns with small variances, such as genomic sequence motifs. This makes CNNs suitable for motif identification and therefore for binding classification [35][5].
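A hedged PyTorch sketch (an illustration with invented sizes, not DeepBind or DeepSEA themselves) of why convolution suits motif detection: each Conv1d filter acts like a learned position weight matrix slid along the one-hot sequence, and global max pooling records whether the motif occurs anywhere in the window.

```python
import torch
import torch.nn as nn

class MotifCNN(nn.Module):
    def __init__(self, num_filters: int = 16, motif_width: int = 8):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=4, out_channels=num_filters,
                              kernel_size=motif_width)
        self.classifier = nn.Linear(num_filters, 1)  # binding vs. non-binding logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 4, L) one-hot DNA windows
        scores = torch.relu(self.conv(x))    # (batch, filters, L - width + 1)
        pooled = scores.max(dim=-1).values   # best match per filter anywhere in the window
        return self.classifier(pooled)

logits = MotifCNN()(torch.randn(2, 4, 100))
print(logits.shape)  # torch.Size([2, 1])
```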
Recently, CNNs have taken the lead among current algorithms for several sequence-based problems. Alipanahi et al. [31][1] (DeepBind) and Zeng et al. [34][4] successfully applied CNNs to model the sequence specificity of protein binding. Zhou and Troyanskaya [32][2] (DeepSEA) developed a conventional three-layer CNN model to predict the effects of noncoding variants from genomic sequence alone. Kelley et al. [36][6] (Basset) adopted a similar architecture to study the functional activities of DNA sequences.
Although multiple researchers have demonstrated the superiority of CNNs over other existing methods, an inappropriate structure design can still result in poorer performance than conventional models. For example, Zeng et al. [34][4] conducted a comprehensive analysis of CNN networks of various architectures on the tasks of motif discovery and motif occupancy in genomic sequences, and they showed that although an increasing number of convolutional kernels generally improves model performance, performance may be indifferent or even negatively impacted by an increasing number of convolutional layers and inappropriate pooling methods. What remains, therefore, is for researchers to master CNN architectures well enough to skillfully match an architecture to each particular task. To achieve this, researchers should have an in-depth understanding of CNN architectures as well as take the biological background into consideration. Zeng et al. [34][4] developed a parameterized convolutional neural network to conduct a systematic exploration of CNNs on two classification tasks, motif discovery and motif occupancy. They performed a hyper-parameter search using Mri (https://github.com/Mri-monitoring/Mri-docs/blob/master/mriapp.rst (accessed on 14 September 2023)) and mainly examined the performance of nine variants of CNNs, concluding that CNNs do not need to be deep for motif discovery tasks as long as the structure is appropriately designed. When applying CNNs in genomics, simply changing the network depth does not account for much improvement in model performance, because deep learning models are usually over-parameterized, meaning the neural network has more parameters than are actually required to complete the task [80][50]. In this direction, Xuan et al. [44][14] designed a dual CNN with attention mechanisms to extract deeper and more complex feature representations of lncRNAs (long noncoding RNA genes), while Kelley et al. [43,45][13][15] took a different path, using dilated convolution instead of classical convolution to share information across long distances without adding depth indefinitely, as sketched below.
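A sketch of the dilated-convolution idea (layer widths and counts here are illustrative, not the published architectures): stacking a few layers whose dilation doubles each time grows the receptive field exponentially while the parameter count grows only linearly.

```python
import torch
import torch.nn as nn

layers, receptive_field, kernel_size = [], 1, 3
for dilation in (1, 2, 4, 8):
    layers += [nn.Conv1d(32, 32, kernel_size, padding=dilation, dilation=dilation),
               nn.ReLU()]
    receptive_field += (kernel_size - 1) * dilation  # grows by 2, 4, 8, 16 positions

dilated_stack = nn.Sequential(*layers)
out = dilated_stack(torch.randn(1, 32, 1000))        # sequence length is preserved
print(out.shape, "receptive field:", receptive_field)  # (1, 32, 1000), 31 positions
```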

3. Recurrent Neural Networks

Recurrent neural networks (RNNs) raised a surge of interest owing to their impressive performance on sequential prediction problems such as language translation, summarization, and speech recognition. RNNs outperform CNNs and other early deep neural networks (DNNs) on sequential data thanks to their capability of processing long ordered sequences and memorizing long-range information through recurrent loops. Specifically, RNNs scan the input sequence sequentially and feed both the previous hidden state and the current input segment into the model, so that the final output implicitly integrates both current and previous information in the sequence. Schuster and Paliwal [81][51] later proposed the bidirectional RNN (BRNN) for use cases where both past and future contexts of the input matter.
The cyclic structure makes a seemingly shallow RNN very deep once unrolled in time for long-range prediction. To resolve the resulting vanishing gradient problem, Hochreiter and Schmidhuber [20][52] substituted the hidden units in RNNs with long short-term memory (LSTM) units to truncate the gradient propagation. Cho et al. [82][53] introduced Gated Recurrent Units (GRUs) with a similar motivation.
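A minimal sketch (with illustrative sizes) of a bidirectional LSTM reading a one-hot DNA window: the forward and backward passes give each position a hidden state that summarizes both upstream and downstream context.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=4, hidden_size=32, batch_first=True, bidirectional=True)
window = torch.randn(2, 200, 4)      # (batch, positions, one-hot channels)
outputs, (h_n, c_n) = bilstm(window)
print(outputs.shape)                 # (2, 200, 64): 32 forward + 32 backward units
```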
Genomic data are typically sequential and are often regarded as a biological language, so recurrent models are applicable in many scenarios. For example, Cao et al. [50][20] (ProLanGO) built an LSTM-based neural machine translation model that converts protein function prediction into a language translation problem by interpreting protein sequences as the language of Gene Ontology terms. Boža et al. [52][22] developed DeepNano for base calling, Quang and Xie [49][19] proposed DanQ to quantify the function of noncoding DNA, Sønderby et al. [48][18] devised a convolutional LSTM to predict protein subcellular localization from protein sequences, Busia et al. [73][43] applied the idea of sequence-to-sequence learning to protein secondary structure prediction conditioned on previously predicted labels, and Wang et al. [83][54] used a bidirectional LSTM (Bi-LSTM) in their prPred-DRLF predictor for plant resistance protein detection, demonstrating effective crossovers between natural language processing (NLP) and genomics [84][55]. Furthermore, sequence-to-sequence learning for genomics is boosted by attention mechanisms: Singh et al. [53][23] introduced an attention-based approach in which a hierarchy of LSTM modules encodes input signals and models how various chromatin marks cooperate, and Shen et al. [85][56] used an LSTM as a feature extractor with attention modules as importance scoring functions to identify regions of an RNA sequence that bind to proteins, as sketched below.
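A hedged sketch of the attention-as-importance-scoring idea described above (a generic illustration, not the cited models): a small scoring network weights each RNN hidden state, and the weighted sum both pools the sequence and exposes which positions the model relied on.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)   # one relevance score per position

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, positions, hidden_dim) from an RNN encoder
        weights = torch.softmax(self.scorer(hidden_states), dim=1)
        pooled = (weights * hidden_states).sum(dim=1)
        return pooled, weights                   # weights highlight important regions

pooled, weights = AttentionPooling()(torch.randn(2, 200, 64))
print(pooled.shape, weights.shape)               # (2, 64) (2, 200, 1)
```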

4. Autoencoders

Autoencoders, conventionally used as pre-processing tools to initialize network weights, have been extended to stacked autoencoders (SAEs; [86][57]), denoising autoencoders (DAs; [87][58]), contractive autoencoders (CAEs; [88][59]), etc. They have since proved successful in feature extraction because they can learn a compact representation of the input through the encode–decode procedure. For example, Gupta et al. [89][60] applied stacked denoising autoencoders (SDAs) to gene clustering tasks; they extracted features by forcing the learned representation to be resistant to partial corruption of the raw input. More examples can be found in Section 4.1.1. Autoencoders are also used for dimension reduction in gene expression, e.g., [90,91,92][61][62][63]. When applying autoencoders, one should be aware that better reconstruction accuracy does not necessarily lead to model improvement [93][64].
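A minimal denoising-autoencoder sketch (illustrative sizes and corruption rate, not the cited SDA): the input expression profile is partially corrupted, and the network is trained to reconstruct the clean profile, which forces a representation robust to noise.

```python
import torch
import torch.nn as nn

n_genes, latent_dim = 2000, 64
encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_genes))

profiles = torch.randn(8, n_genes)                        # a toy expression batch
corrupted = profiles * (torch.rand_like(profiles) > 0.2)  # randomly mask ~20% of inputs
reconstruction = decoder(encoder(corrupted))
loss = nn.functional.mse_loss(reconstruction, profiles)   # reconstruct the clean data
loss.backward()
```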
Variational autoencoders (VAEs), though named “autoencoders”, were developed rather as an approximate-inference method for modeling latent variables. Building on the autoencoder structure, Kingma and Welling [94][65] added stochasticity to the encoded units together with a penalty term encouraging the latent variables to produce a valid decoding. VAEs address problems in which each datum has a corresponding latent representation and are thus useful for genomic data, which exhibit complex interdependencies. Rampasek and Goldenberg [93][64] presented a two-step VAE-based model for drug response prediction, which first predicts the post-treatment state from the pre-treatment state in an unsupervised manner and then extends this to a final semi-supervised prediction. The model was based on data from Genomics of Drug Sensitivity in Cancer (GDSC; [95][66]) and the Cancer Cell Line Encyclopedia (CCLE; [96][67]). VAEs can also be used in many other genomic applications, including cancer gene expression prediction [54,97][24][68], single-cell feature extraction for unmasking tumor heterogeneity [56][26], metagenomic binning [57][27], DNA methylome dataset construction [55][25], etc.
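A compact sketch of the VAE recipe described above (dimensions are illustrative): the encoder outputs a mean and log-variance, stochasticity enters through the reparameterization trick, and a KL penalty keeps the latent code decodable.

```python
import torch
import torch.nn as nn

n_genes, latent_dim = 2000, 32
encoder = nn.Linear(n_genes, 2 * latent_dim)   # predicts mean and log-variance
decoder = nn.Linear(latent_dim, n_genes)

x = torch.randn(8, n_genes)
mu, log_var = encoder(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterization trick
recon = decoder(z)

recon_loss = nn.functional.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())  # penalty on the latent code
loss = recon_loss + kl
loss.backward()
```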

5. Emergent Deep Architectures

As deep learning continues to show success in genomics, researchers expect it to do more than simply outperform statistical or machine learning methods. To this end, the vast majority of recent work approaches genomic problems with architectures that go beyond the classic deep models or with hybrid models. Here, researchers review some recently proposed deep architectures that skillfully modify or combine classical deep learning models.

5.1. Beyond Classic Models

Most of these emergent architectures are natural designs modified from classic deep learning models. Researchers have begun to leverage more genomic intuition to fit each particular problem with a more advanced and suitable model.
Motivated by the fact that protein folding is a progressive refinement [98][69] rather than an instantaneous process, Lena et al. [99][70] designed DST-NNs for residue–residue contact prediction. The model consists of a 3D stack of neural networks whose topological structures (input, hidden, and output layer sizes) are identical in each stack. Each level of this stacked network can be regarded as a distinct contact predictor and can be trained in a supervised manner to refine the predictions of the previous level, hence addressing the typical problem of vanishing gradients in deep architectures. The spatial features in this deep spatiotemporal architecture refer to the original model inputs, while the temporal features are gradually altered as information progresses to the upper layers. Angermueller et al. [100][71] (DeepCpG) combined two CNN sub-models and a fusion module to predict DNA methylation states. The two CNN sub-models take different inputs and thus serve disparate purposes: the CpG module accounts for correlations between CpG sites within and across cells, while the DNA module detects informative sequence patterns (motifs). The fusion module then integrates the higher-level features derived from the two low-level modules to make predictions. Instead of subtle modifications or combinations, some works focus on depth, trying to improve model performance by designing even deeper architectures. Wang et al. [101][72] developed an ultra-deep neural network consisting of two deep residual networks to predict protein contacts from a sequence of amino acids. Each of the two residual nets has its particular function. A series of 1D convolutional transformations extracts sequential features (e.g., sequence profile, predicted secondary structure, and solvent accessibility). The 1D output is converted to a 2D matrix by an operation similar to an outer product and merged with pairwise features (e.g., pairwise contact, co-evolution information, and distance potential); together, these are fed into the second residual network, which consists of a series of 2D convolutional transformations (a sketch of this 1D-to-2D conversion is given below). The combination of these two disparate residual nets creates a novel approach that integrates sequential and pairwise features in one model.
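A hedged sketch of the 1D-to-2D conversion described above: per-residue feature vectors are combined for every pair (i, j) so that 2D convolutions can then operate on pairwise, contact-map-like features. Concatenation is used here for simplicity; the cited model uses an outer-product-like operation, and the sizes are invented.

```python
import torch

L, d = 50, 16
residue_features = torch.randn(L, d)                 # output of the 1D network

h_i = residue_features.unsqueeze(1).expand(L, L, d)  # feature of residue i, broadcast over j
h_j = residue_features.unsqueeze(0).expand(L, L, d)  # feature of residue j, broadcast over i
pairwise = torch.cat([h_i, h_j], dim=-1)             # (L, L, 2d) pairwise feature map

# Move channels first so a 2D CNN can consume it: (batch, channels, L, L)
pairwise_input = pairwise.permute(2, 0, 1).unsqueeze(0)
print(pairwise_input.shape)                          # torch.Size([1, 32, 50, 50])
```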

5.2. Hybrid Architectures

The fact that each type of deep neural network (DNN) has its own strengths inspires researchers to develop hybrid architectures that exploit the potential of multiple deep learning architectures. DanQ [49][19] is a hybrid convolutional and recurrent DNN for predicting the function of noncoding DNA directly from sequence alone. A DNA sequence is input as the one-hot representation of the four bases to a simple convolutional neural network whose purpose is to scan for motif sites. Motivated by the fact that motifs can be determined to some extent by the spatial arrangements and frequencies of combinations of DNA sequences [49][19], the purported motifs learned by the CNN are then fed into a Bi-LSTM (a sketch of such a CNN–RNN hybrid is given below). Similar convolutional-recurrent designs were further discussed by Lanchantin et al. [58][28] (Deep GDashboard). They demonstrated how to understand three deep architectures (convolutional, recurrent, and convolutional-recurrent networks) and verified the validity of the features generated automatically by the models through visualization techniques. They argued that a CNN–RNN architecture outperforms a CNN or RNN alone based on their experimental results on a transcription factor binding site (TFBS) classification task. The feature visualization achieved by Deep GDashboard indicated that a CNN–RNN architecture is able to model both motifs and the dependencies among them. Sønderby et al. [48][18] added a convolutional layer between the raw data and the LSTM input to address the problem of protein sorting, or subcellular localization. In total, three types of models are proposed and compared in their paper: a vanilla LSTM, an LSTM with an attention model applied to a hidden layer, and an ensemble of ten vanilla LSTMs. They achieved higher accuracy than previous benchmark models in predicting the subcellular location of proteins from sequences while involving no human-engineered features. Almagro Armenteros et al. [60][30] proposed a hybrid integration of an RNN, a Bi-LSTM, an attention mechanism, and a fully connected layer for protein subcellular localization prediction, with each of the four modules designed for a specific purpose. Such hybrid models are increasingly favored by recent research, e.g., [59][29].
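A hedged sketch in the spirit of the CNN–RNN hybrids above (layer sizes are illustrative, not the published hyper-parameters of DanQ): convolution scans for motifs, pooling downsamples, and a Bi-LSTM models dependencies among the detected motifs.

```python
import torch
import torch.nn as nn

class HybridCNNRNN(nn.Module):
    def __init__(self, n_filters=64, n_targets=10):
        super().__init__()
        self.conv = nn.Conv1d(4, n_filters, kernel_size=19, padding=9)  # motif scanner
        self.pool = nn.MaxPool1d(kernel_size=4)                         # downsample positions
        self.bilstm = nn.LSTM(n_filters, 32, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 32, n_targets)

    def forward(self, x):                                # x: (batch, 4, L) one-hot DNA
        motifs = self.pool(torch.relu(self.conv(x)))     # (batch, filters, L / 4)
        states, _ = self.bilstm(motifs.transpose(1, 2))  # model dependencies among motifs
        return self.head(states.mean(dim=1))             # (batch, n_targets) logits

print(HybridCNNRNN()(torch.randn(2, 4, 1000)).shape)     # torch.Size([2, 10])
```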
Hybrid architectures allow flexible network design by selecting components with proven success in representing different types of information in genomic sequences. For example, in both [49,62][19][32], CNN layers are included to generate representations of local patterns such as regulatory motifs in DNA sequences, while RNN and attention modules encode information on long-range dependence. Although hybrid architectures built on existing successful models have been shown to improve performance over single architectures, there is still no systematic principle or algorithm for designing, let alone optimizing, network architectures for deep learning models in genomics.

6. Transformer-Based Large Language Models

As mentioned, many prior deep learning works utilized CNNs and RNNs to solve genomics tasks. However, these two architectures have several intrinsic limitations. (1) CNNs might fail to capture a global understanding of a long DNA sequence due to their limited receptive field. (2) RNNs can struggle to capture useful long-term dependencies because of vanishing gradients and suffer from low efficiency due to their non-parallel sequence processing. (3) Both architectures need extensive high-quality labeled data for training. These limitations hinder them from coping with harder genomics problems, since such tasks usually require the model to (1) understand long-range interactions, (2) process very long sequences efficiently, and (3) perform well even with few training labels.
Transformer-based [21][73] language models such as BERT [102][74] and the GPT family [22,23,24][75][76][77] are therefore a natural fit to overcome these limitations. Their built-in attention mechanism provides larger receptive fields and learns representations that generalize to data-scarce tasks. Ref. [103][78] found that a pre-trained large DNA language model is able to make accurate zero-shot predictions of noncoding variant effects. Similarly, according to [104][79], these language model architectures generate robust contextualized embeddings on top of nucleotide sequences and achieve accurate molecular phenotype prediction even in low-data settings.
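A minimal sketch (not any specific published genomic LLM; positional encodings are omitted for brevity) of how self-attention gives every token a full-sequence receptive field: a small transformer encoder contextualizes embeddings of DNA tokens, here single nucleotides.

```python
import torch
import torch.nn as nn

vocab = {"A": 0, "C": 1, "G": 2, "T": 3}
tokens = torch.tensor([[vocab[b] for b in "ACGTACGTGGTA"]])   # (1, 12) token ids

embed = nn.Embedding(len(vocab), 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
contextual = encoder(embed(tokens))   # every position attends to every other position
print(contextual.shape)               # torch.Size([1, 12, 64])
```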
Instead of processing input tokens one by one as RNNs do, transformers process all input tokens in parallel and thus more efficiently. However, simply increasing the input context window indefinitely is infeasible, since the computation time and memory of the attention layers scale quadratically with context length. Several improvements have been made from different perspectives: Nguyen et al. [70][40] use the Hyena architecture [105][80], which scales sub-quadratically in context length, while Zhou et al. [68][38] replace the k-mer tokenization used by Ji et al. [63][33] with Byte Pair Encoding (BPE) to achieve a 3× efficiency improvement (the tokenization trade-off is illustrated below).
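A toy illustration of the tokenization trade-off mentioned above: overlapping k-mers yield roughly one token per base, whereas a subword scheme such as BPE merges frequent substrings into fewer, longer tokens, which matters because attention cost grows quadratically with the number of tokens. The function below is a generic sketch, not the tokenizer of any cited model.

```python
def overlapping_kmers(sequence: str, k: int = 6) -> list:
    """Return the overlapping k-mer tokens of a DNA sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

seq = "ACGTACGTGGTACCA"
kmer_tokens = overlapping_kmers(seq)
print(len(seq), len(kmer_tokens))  # 15 bases -> 10 overlapping 6-mers (about one per base)
# A BPE vocabulary learned on genomes would instead cover these 15 bases with a
# handful of longer subword tokens, shrinking the quadratic attention cost.
```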
To deal with extremely long-range interactions in DNA sequences, the Enformer model [66][36] employs transformer modules whose receptive field is five times larger than that of previous CNN-based approaches [43,45,106][13][15][81], and it is capable of detecting sequence elements 100 kb away. Moreover, the recent success of ChatGPT [107][82] and GPT-4 [108][83] further illustrates the emergent capability of large language models (LLMs) to deal with such long DNA sequences. A typical transformer-based genomics foundation model can only take 512 to 4k tokens as input context, which is less than 0.001% of the human genome. Nguyen et al. [70][40] proposed an LLM-based genomic model that expands the input context length to 1 million tokens at single-nucleotide resolution, up to a 500× increase over previous dense attention-based models.
Even with all these advancements in efficiency, the significant training and serving costs remain a challenging problem for LLMs [109][84], especially for the long input contexts required by genomics problems. Furthermore, due to privacy concerns and legal regulations, generating and collecting large-scale, high-quality genomics data usually requires complex procedures, which can slow down the iteration of model development.

References

  1. Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015, 33, 831–838.
  2. Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 2015, 12, 931–934.
  3. Min, X.; Chen, N.; Chen, T.; Jiang, R. DeepEnhancer: Predicting enhancers by convolutional neural networks. In Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, 15–18 December 2016; pp. 637–644.
  4. Zeng, H.; Edwards, M.D.; Liu, G.; Gifford, D.K. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016, 32, i121–i127.
  5. Lanchantin, J.; Singh, R.; Lin, Z.; Qi, Y. Deep motif: Visualizing genomic sequence classifications. arXiv 2016, arXiv:1605.01133.
  6. Kelley, D.R.; Snoek, J.; Rinn, J.L. Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016, 26, 990–999.
  7. Chen, D.; Jacob, L.; Mairal, J. Predicting Transcription Factor Binding Sites with Convolutional Kernel Networks. bioRxiv 2017, 217257.
  8. Hou, J.; Adhikari, B.; Cheng, J. DeepSF: Deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 2017, 34, 1295–1303.
  9. Pan, X.; Shen, H.B. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinform. 2017, 18, 136.
  10. Schreiber, J.; Libbrecht, M.; Bilmes, J.; Noble, W. Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture. bioRxiv 2017, 103614.
  11. Zhang, Y.; An, L.; Hu, M.; Tang, J.; Yue, F. HiCPlus: Resolution Enhancement of Hi-C interaction heatmap. bioRxiv 2017, 112631.
  12. Adhikari, B.; Hou, J.; Cheng, J. DNCON2: Improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2018, 34, 1466–1472.
  13. Kelley, D.R.; Reshef, Y.A.; Bileschi, M.; Belanger, D.; McLean, C.Y.; Snoek, J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 2018, 28, 739–750.
  14. Xuan, P.; Cao, Y.; Zhang, T.; Kong, R.; Zhang, Z. Dual convolutional neural networks with attention mechanisms based method for predicting disease-related lncRNA genes. Front. Genet. 2019, 10, 416.
  15. Kelley, D.R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 2020, 16, e1008050.
  16. Yang, J.; Anishchenko, I.; Park, H.; Peng, Z.; Ovchinnikov, S.; Baker, D. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 2020, 117, 1496–1503.
  17. Wu, T.; Guo, Z.; Hou, J.; Cheng, J. DeepDist: Real-value inter-residue distance prediction with deep residual convolutional network. BMC Bioinform. 2021, 22, 30.
  18. Sønderby, S.K.; Sønderby, C.K.; Nielsen, H.; Winther, O. Convolutional LSTM networks for subcellular localization of proteins. In Proceedings of the International Conference on Algorithms for Computational Biology, Mexico City, Mexico, 4–5 August 2015; pp. 68–80.
  19. Quang, D.; Xie, X. DanQ: A hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016, 44, e107.
  20. Cao, R.; Freitas, C.; Chan, L.; Sun, M.; Jiang, H.; Chen, Z. ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules 2017, 22, 1732.
  21. Liu, B.; Chen, J.; Li, S. Protein remote homology detection based on bidirectional long short-term memory. BMC Bioinform. 2017, 18, 443.
  22. Boža, V.; Brejová, B.; Vinař, T. DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads. PLoS ONE 2017, 12, e0178751.
  23. Singh, R.; Lanchantin, J.; Sekhon, A.; Qi, Y. Attend and predict: Understanding gene regulation by selective attention on chromatin. Adv. Neural Inf. Process. Syst. 2017, 30, 6785–6795.
  24. Way, G.P.; Greene, C.S. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. bioRxiv 2017, 174474.
  25. Choi, J.; Chae, H. methCancer-gen: A DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder. BMC Bioinform. 2020, 21, 181.
  26. Rashid, S.; Shah, S.; Bar-Joseph, Z.; Pandya, R. Dhaka: Variational autoencoder for unmasking tumor heterogeneity from single cell genomic data. Bioinformatics 2021, 37, 1535–1543.
  27. Nissen, J.N.; Johansen, J.; Allesøe, R.L.; Sønderby, C.K.; Armenteros, J.J.A.; Grønbech, C.H.; Jensen, L.J.; Nielsen, H.B.; Petersen, T.N.; Winther, O.; et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 2021, 39, 555–560.
  28. Lanchantin, J.; Singh, R.; Wang, B.; Qi, Y. Deep GDashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks. arXiv 2016, arXiv:1608.03644.
  29. Singh, S.; Yang, Y.; Poczos, B.; Ma, J. Predicting Enhancer-Promoter Interaction from Genomic Sequence with Deep Neural Networks. bioRxiv 2016, 085241.
  30. Almagro Armenteros, J.J.; Sønderby, C.K.; Sønderby, S.K.; Nielsen, H.; Winther, O. DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics 2017, 33, 3387–3395.
  31. Yang, B.; Liu, F.; Ren, C.; Ouyang, Z.; Xie, Z.; Bo, X.; Shu, W. BiRen: Predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 2017, 33, 1930–1936.
  32. Li, J.; Pu, Y.; Tang, J.; Zou, Q.; Guo, F. DeepATT: A hybrid category attention neural network for identifying functional effects of DNA sequences. Brief. Bioinform. 2021, 22, bbaa159.
  33. Ji, Y.; Zhou, Z.; Liu, H.; Davuluri, R.V. DNABERT: Pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 2021, 37, 2112–2120.
  34. Rives, A.; Goyal, S.; Meier, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J.; Fergus, R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2019, 118, e2016239118.
  35. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Fehér, T.B.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing. bioRxiv 2020.
  36. Avsec, Ž.; Agarwal, V.; Visentin, D.; Ledsam, J.R.; Grabska-Barwinska, A.; Taylor, K.R.; Assael, Y.; Jumper, J.; Kohli, P.; Kelley, D.R. Effective gene expression prediction from sequence by integrating long-range interactions. bioRxiv 2021.
  37. Wu, R.; Ding, F.; Wang, R.; Shen, R.; Zhang, X.; Luo, S.; Su, C.; Wu, Z.; Xie, Q.; Berger, B.; et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022.
  38. Zhou, Z.; Ji, Y.; Li, W.; Dutta, P.; Davuluri, R.; Liu, H. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome. arXiv 2023, arXiv:2306.15006.
  39. Weissenow, K.; Heinzinger, M.; Steinegger, M.; Rost, B. Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies. bioRxiv 2022.
  40. Nguyen, E.; Poli, M.; Faizi, M.; Thomas, A.; Birch-Sykes, C.; Wornow, M.; Patel, A.; Rabideau, C.; Massaroli, S.; Bengio, Y.; et al. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. arXiv 2023, arXiv:2306.15794.
  41. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
  42. Chen, B.; Cheng, X.; Geng, Y.A.; Li, S.; Zeng, X.; Wang, B.; Gong, J.; Liu, C.; Zeng, A.; Dong, Y.; et al. xtrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein. bioRxiv 2023.
  43. Busia, A.; Collins, J.; Jaitly, N. Protein Secondary Structure Prediction Using Deep Multi-scale Convolutional Neural Networks and Next-Step Conditioning. arXiv 2016, arXiv:1611.01503.
  44. Hou, J.; Wu, T.; Cao, R.; Cheng, J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins Struct. Funct. Bioinform. 2019, 87, 1165–1178.
  45. Senior, A.W.; Evans, R.; Jumper, J.M.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Zídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
  46. Zhang, H.; Shen, Y. Template-based prediction of protein structure with deep learning. BMC Genom. 2020, 21, 878.
  47. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  48. Liu, J.; Wu, T.; Guo, Z.; Hou, J.; Cheng, J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins Struct. Funct. Bioinform. 2022, 90, 58–72.
  49. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
  50. Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In Proceedings of the 2019 International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
  51. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
  52. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  53. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
  54. Wang, Y.; Xu, L.; Zou, Q.; Lin, C. prPred-DRLF: Plant R protein predictor using deep representation learning features. Proteomics 2022, 22, 2100161.
  55. Le, N.Q.K. Potential of deep representative learning features to interpret the sequence information in proteomics. Proteomics 2022, 22, 2100232.
  56. Shen, Z.; Zhang, Q.; Han, K.; Huang, D.S. A Deep Learning Model for RNA-Protein Binding Preference Prediction Based on Hierarchical LSTM and Attention Network. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 19, 753–762.
  57. Bengio, Y.; Lamblin, P.; Popovici, D.; Larochelle, H. Greedy layer-wise training of deep networks. Adv. Neural Inf. Process. Syst. 2007, 19, 153–160.
  58. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
  59. Rifai, S.; Vincent, P.; Muller, X.; Glorot, X.; Bengio, Y. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 833–840.
  60. Gupta, A.; Wang, H.; Ganapathiraju, M. Learning structure in gene expression data using deep architectures, with an application to gene clustering. In Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA, 9–12 November 2015; pp. 1328–1335.
  61. Tan, J.; Ung, M.; Cheng, C.; Greene, C.S. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. In Proceedings of the Pacific Symposium on Biocomputing Co-Chairs 2014, Sydney, Australia, 31 July–2 August 2014; pp. 132–143.
  62. Tan, J.; Hammond, J.H.; Hogan, D.A.; Greene, C.S. Adage-based integration of publicly available pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 2016, 1, e00025-15.
  63. Tan, J.; Doing, G.; Lewis, K.A.; Price, C.E.; Chen, K.M.; Cady, K.C.; Perchuk, B.; Laub, M.T.; Hogan, D.A.; Greene, C.S. Unsupervised extraction of stable expression signatures from public compendia with eADAGE. bioRxiv 2017, 078659.
  64. Rampasek, L.; Goldenberg, A. Dr. VAE: Drug Response Variational Autoencoder. arXiv 2017, arXiv:1706.08203.
  65. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
  66. Yang, W.; Soares, J.; Greninger, P.; Edelman, E.J.; Lightfoot, H.; Forbes, S.; Bindal, N.; Beare, D.; Smith, J.A.; Thompson, I.R.; et al. Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013, 41, D955–D961.
  67. Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A.A.; Kim, S.; Wilson, C.J.; Lehár, J.; Kryukov, G.V.; Sonkin, D.; et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012, 483, 603–607.
  68. Way, G.P.; Greene, C.S. Evaluating deep variational autoencoders trained on pan-cancer gene expression. arXiv 2017, arXiv:1711.04828.
  69. Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 2017, 18, 851–869.
  70. Lena, P.D.; Nagata, K.; Baldi, P.F. Deep spatio-temporal architectures and learning for protein structure prediction. In Proceedings of the Advances in Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 512–520.
  71. Angermueller, C.; Lee, H.J.; Reik, W.; Stegle, O. DeepCpG: Accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 2017, 18, 67.
  72. Wang, S.; Sun, S.; Li, Z.; Zhang, R.; Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 2017, 13, e1005324.
  73. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6100.
  74. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  75. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training; OpenAI: San Francisco, CA, USA, 2018.
  76. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
  77. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
  78. Benegas, G.; Batra, S.; Song, Y. DNA language models are powerful zero-shot predictors of non-coding variant effects. bioRxiv 2022.
  79. Dalla-Torre, H.; Gonzalez, L.; Mendoza-Revilla, J.; Carranza, N.L.; Grzywaczewski, A.H.; Oteri, F.; Dallago, C.; Trop, E.; Sirelkhatim, H.; Richard, G.; et al. The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. bioRxiv 2023.
  80. Poli, M.; Massaroli, S.; Nguyen, E.; Fu, D.Y.; Dao, T.; Baccus, S.; Bengio, Y.; Ermon, S.; Ré, C. Hyena Hierarchy: Towards Larger Convolutional Language Models. arXiv 2023, arXiv:2302.10866.
  81. Zhou, J.; Theesfeld, C.; Yao, K.; Chen, K.; Wong, A.; Troyanskaya, O. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 2018, 50, 1171–1179.
  82. Schulman, J.; Zoph, B.; Kim, C.; Hilton, J.; Menick, J.; Weng, J.; Uribe, J.F.C.; Fedus, L.; Metz, L.; Pokorny, M.; et al. ChatGPT: Optimizing language models for dialogue. OpenAI blog 2022.
  83. OpenAI. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
  84. Howell, K.; Christian, G.; Fomitchov, P.; Kehat, G.; Marzulla, J.; Rolston, L.; Tredup, J.; Zimmerman, I.; Selfridge, E.; Bradley, J. The economic trade-offs of large language models: A case study. arXiv 2023, arXiv:2306.07402.