The rapid growth of e-commerce has significantly increased the demand for advanced techniques to address specific tasks in the e-commerce field.
1. Introduction
Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and models capable of automatically learning, identifying patterns, and making predictions or decisions from data ^{[1]}. The field of machine learning encompasses a broad array of methods and algorithms. Some prominent examples of supervised learning methods include linear regression, logistic regression, decision tree, random forest, support vector machine, and artificial neural network techniques ^{[2]}. On the other hand, unsupervised learning methods typically include K-means clustering, hierarchical clustering, principal component analysis, and matrix factorization ^{[3]}.
Deep learning is a specialized branch of machine learning that emphasizes training artificial neural networks with multiple hidden layers, enabling them to acquire hierarchical representations of data ^{[4]}. Exceptional accomplishments in diverse domains, including image classification, object detection, speech recognition, and language translation, have been achieved through the use of deep learning approaches ^{[5]}. The ability to automatically learn intricate features from raw data has positioned deep learning as a pivotal component in modern AI systems ^{[6]}.
E-commerce refers to the buying and selling of goods and services over the Internet, which involves online transactions, electronic payments, and digital interactions between businesses and customers ^{[7]}. E-commerce has become increasingly popular due to its convenience, wide product range, and global accessibility. It also provides a favorable environment for the application of machine learning and deep learning techniques, due to the availability of vast data sets, the need for personalized experiences, the challenges of fraud detection and security, the potential for supply chain optimization, and the importance of customer sentiment analysis ^{[8]}. By leveraging these techniques, e-commerce businesses can enhance customer satisfaction, improve operational efficiency, drive sales, and gain a competitive edge in the digital marketplace ^{[9]}.
2. The Utilized Machine Learning and Deep Learning Techniques
2.1. Machine Learning Techniques

Support vector machine (SVM) ^{[10]} is a machine learning model used for classification and regression. An SVM operates by identifying an optimal hyperplane that maximizes the margin between distinct classes, which is determined by critical data points known as support vectors. It can handle both linearly separable and non-linearly separable data through the kernel trick, using kernels such as the linear, polynomial, radial basis function, and sigmoid kernels. It is particularly effective for binary and even multiclass classification problems ^{[11]}.
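
As a toy illustration (independent of the cited works), a linear SVM can be trained with sub-gradient descent on the regularized hinge loss. The data set, learning rate, and epoch count below are invented for the sketch:

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=500):
    """Sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:           # only the regularization term contributes
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

# Two linearly separable clusters with labels -1 / +1 (invented data).
points = [(1, 1), (1.5, 1.2), (2, 1.8), (6, 6), (6.5, 6.2), (7, 5.8)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1
```

A kernelized SVM would replace the inner products above with kernel evaluations; this sketch covers only the linear case.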

Decision Tree ^{[12]} is a model used for prediction tasks, functioning by segmenting the predictor space into simple regions for analysis. It uses a tree-like structure to make decisions based on feature values. At each internal node of the tree, a decision or splitting criterion is applied to determine the best feature and threshold for splitting the data ^{[13]}. In classification tasks, each leaf node represents a class label, while in regression tasks, the leaf nodes contain the predicted continuous value for that subset.

Random Forest ^{[14]}^{[15]} is an ensemble learning method that combines multiple decision trees to make predictions. It enhances classification and regression tasks by training multiple trees on various subsamples of the data set and aggregating the predictions of individual trees to improve accuracy and prevent overfitting ^{[16]}.

Naïve Bayes ^{[17]} is based on the "naïve" assumption that features are conditionally independent of one another given the class. It utilizes the Bayes theorem to calculate the posterior probabilities of classes based on observed feature values. Depending on the assumed distribution of the features, there are Gaussian, Multinomial, and Bernoulli Naïve Bayes algorithms. Naïve Bayes is widely recognized for its simplicity and efficiency in training and prediction tasks, making it popular for various applications ^{[18]}.
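
The following sketch (with made-up data) fits a Gaussian Naïve Bayes model in plain Python: per-class priors plus a mean and variance per feature, combined through Bayes' theorem in log space:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature Gaussian parameters."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for cls, rows in groups.items():
        n = len(rows)
        prior = n / len(X)
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / n
            var = sum((v - mu) ** 2 for v in col) / n + 1e-9  # avoid zero variance
            stats.append((mu, var))
        model[cls] = (prior, stats)
    return model

def predict_nb(model, x):
    """Pick the class maximizing log prior + sum of log Gaussian likelihoods."""
    best, best_score = None, float("-inf")
    for cls, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = cls, score
    return best

# Invented two-class, two-feature data set.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 8.0], [5.5, 7.5], [4.8, 8.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
```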

Logistic regression ^{[19]} utilizes the logistic (sigmoid) function to estimate the probabilities of inputs belonging to different classes. This method can be extended to softmax regression or multinomial logistic regression by replacing the sigmoid function with the softmax function. Logistic and softmax regression provide straightforward and interpretable approaches to classification problems, allowing for accurate and probabilistic predictions ^{[20]}.
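
A minimal sketch of binary logistic regression trained by batch gradient descent, on invented one-dimensional data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the negative log-likelihood."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Invented one-feature data: class 0 at small values, class 1 at large values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(X, y)
prob = lambda x: sigmoid(w[0] * x + b)
```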

Principal component analysis (PCA) ^{[21]} is a linear modeling technique used to map high-dimensional input features to a lower-dimensional space, typically referred to as latent factors or principal components. PCA aims to transform the original data into a set of orthogonal components that explain the maximum variance in the data ^{[22]}.
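
For intuition, the leading principal component of centered 2-D data can be found by power iteration on the covariance matrix; the points below are invented so that the component lies near the diagonal:

```python
import math

def first_principal_component(X, iters=100):
    """Power iteration on the 2x2 covariance matrix of centered 2-D data."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    centered = [(p[0] - mx, p[1] - my) for p in X]
    # Covariance matrix entries.
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)  # arbitrary starting direction
    for _ in range(iters):
        vx = cxx * v[0] + cxy * v[1]
        vy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
    return v

# Points scattered along y = x, so the first component should be near (1,1)/sqrt(2).
X = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
v = first_principal_component(X)
```

A full PCA would compute all eigenvectors (e.g., via singular value decomposition); power iteration recovers only the dominant one.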

Matrix factorization algorithms ^{[23]}^{[24]} work by decomposing the original matrix into two or more lower-dimensional matrices that represent latent factors. These algorithms aim to find lower-rank representations of the data by uncovering the underlying structure or patterns within the matrix ^{[25]}.
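
A hedged sketch of matrix factorization via stochastic gradient descent on the observed entries of a small, invented rating matrix (R ≈ PQᵀ with rank k = 2); missing entries are marked None and can be predicted afterwards:

```python
import random

def factorize(R, k=2, lr=0.01, lam=0.02, epochs=2000, seed=0):
    """SGD on the observed entries of R, factorizing R ~ P @ Q^T."""
    rng = random.Random(seed)
    rows, cols = len(R), len(R[0])
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(rows)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(cols)]
    observed = [(i, j, R[i][j]) for i in range(rows) for j in range(cols)
                if R[i][j] is not None]
    for _ in range(epochs):
        for i, j, r in observed:
            pred = sum(P[i][f] * Q[j][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pif, qjf = P[i][f], Q[j][f]
                P[i][f] += lr * (err * qjf - lam * pif)  # regularized update
                Q[j][f] += lr * (err * pif - lam * qjf)
    return P, Q

# A tiny invented user-item rating matrix with one missing entry (None).
R = [[5, 3, None],
     [4, 3, 1],
     [1, 1, 5]]
P, Q = factorize(R)
predict = lambda i, j: sum(P[i][f] * Q[j][f] for f in range(2))
```

`predict(0, 2)` then estimates the missing rating from the learned latent factors.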

K-nearest neighbors (KNN) ^{[26]} is a nonparametric algorithm that predicts the class label (for classification) or the target value (for regression) of a test instance based on its similarity to its K nearest neighbors in the training data. In classification, the majority vote among the neighbors determines the class label, while in regression, the average (or weighted average) of the target values is taken ^{[27]}.
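
KNN requires no training beyond storing the data; the following sketch (toy data invented for the example) classifies a point by majority vote among its three nearest neighbors under Euclidean distance:

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented training set: (point, label) pairs forming two clusters.
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((6, 7), "blue"), ((7, 6), "blue")]
```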
2.2. Deep Learning Techniques
Deep learning approaches continue to evolve rapidly, with new architectures, algorithms, and techniques having been developed to address various challenges in different domains. Their ability to learn complex representations from data has significantly advanced the field of artificial intelligence and contributed to various groundbreaking applications ^{[28]}.

An Artificial Neural Network (ANN) ^{[29]} is a computational model inspired by the structure and functionality of biological neural networks in the human brain. It is composed of interconnected artificial neurons or nodes, organized into layers including the input layer, hidden layers, and output layer. The connections between neurons have associated weights, which are adjusted iteratively by propagating the error from the output layer back to the input layer, guided by a defined objective or loss function ^{[30]}.

A Convolutional Neural Network (CNN) ^{[31]}^{[32]} consists of convolutional layers that apply filters to extract features from input data, followed by pooling layers to reduce the spatial dimensions. CNNs have demonstrated exceptional performance in image classification, object detection, and image segmentation ^{[33]}.

The Visual Geometry Group network (VGG) ^{[34]} is a deep convolutional neural network architecture (e.g., with 16–19 weight layers) developed by the Visual Geometry Group. It showcases the effectiveness of deep convolutional neural networks in capturing complex image features and hierarchies ^{[35]}.

A Temporal Convolutional Network (TCN) ^{[36]} utilizes dilated convolutional layers to capture temporal patterns and dependencies in the input data. These dilated convolutions enable an expanded receptive field without significantly increasing the number of parameters or computational complexity.

Recurrent Neural Networks (RNNs) ^{[29]} are designed to process sequential data. Their key characteristic is their recurrent connections, which create a loop-like structure that carries information across time steps, enabling the network to maintain a form of memory or context to process and remember information from previous steps ^{[37]}.

Long Short-Term Memory (LSTM) ^{[38]} is a type of RNN architecture that excels at capturing long-term dependencies and processing sequential data. It utilizes a memory cell and a set of gates that regulate the flow of information; in particular, the memory cell retains information over time, the input gate determines which values to update in the memory cell, the forget gate decides what information to discard from the memory cell, and the output gate selects the relevant information to be output at each time step ^{[37]}.

Bidirectional Long Short-Term Memory (BiLSTM) ^{[39]} combines two LSTMs that process the input sequence in opposite directions: one LSTM processes the sequence in the forward direction, while the other processes it in the backward direction. This bidirectional processing allows the model to capture information from both past and future contexts, providing a more comprehensive understanding of the input sequence. It has demonstrated strong performance in various natural language processing tasks.

The Gated Recurrent Unit (GRU) ^{[40]} is a simplified alternative to the LSTM network, offering comparable performance with fewer parameters and less computation. In GRU, the update gate determines the amount of the previous hidden state to retain and the extent to which the new input is incorporated. The reset gate controls how much of the previous hidden state is ignored and whether the hidden state should be reset, based on the current input ^{[41]}.

The BiGRU ^{[41]}^{[42]} is an extension of the standard GRU, which processes the input sequence in both forward and backward directions simultaneously, resulting in a more comprehensive understanding of the sequence.

The attention-based BiGRU ^{[42]}^{[43]} adopts attention mechanisms to dynamically assign different weights to different time steps of the sequence, allowing the model to attend to more informative or salient parts of the input. It has demonstrated superior performance in various natural language processing tasks ^{[44]}.

Reinforcement Learning (RL) ^{[45]}^{[46]} involves an agent learning through interactions with an environment, receiving feedback in the form of rewards or punishments based on its actions, and learning a mapping from states to actions that maximize the expected cumulative reward over time ^{[47]}.
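
The reward-driven update at the heart of RL can be illustrated with tabular Q-learning (a simple value-based RL method) on an invented corridor environment: the agent moves left or right over five states and receives a reward of 1 on reaching the right end:

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning: Q[state][action], action 0 = left, 1 = right."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length for safety
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Update toward the Bellman target r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == n_states - 1:
                break
    return Q

Q = q_learning()
# Greedy action in each non-terminal state; all should be 1 ("go right").
greedy_policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(4)]
```

A DQN (next paragraph) replaces the table `Q` with a neural network trained on the same Bellman target.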

Deep Q-Networks (DQN) ^{[46]} combine reinforcement learning and deep learning, utilizing the deep neural network to approximate the Q-function and then learn optimal policies in complex environments. The Q-function—also known as the action-value or quality function—represents the expected cumulative reward an agent can achieve by taking a specific action in a given state and following a certain policy. In recent years, Deep RL has gained substantial attention and success in various domains, including robotics, game playing, and autonomous systems ^{[48]}.

A Generative Adversarial Network (GAN) ^{[49]} is composed of a generator network and a discriminator network, which engage in a competitive game. The generator aims to produce synthetic data samples, while the discriminator tries to discern between real and fake samples. Through iterative training in this adversarial process, GANs have exhibited remarkable capabilities in tasks such as image generation, image-to-image translation, and text generation ^{[50]}^{[51]}.

Transformers ^{[52]}^{[53]} are neural networks that use self-attention to capture relationships between words or tokens in a sequence. Self-attention involves calculating attention scores based on the relevance of each element to others, obtaining attention weights through the softmax function, and computing weighted sums using these attention weights. In transformers, the encoder computes representations for each element using self-attention, capturing dependencies and relationships, while the decoder uses this information to generate an output sequence ^{[54]}.
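
The scaled dot-product self-attention described above can be sketched directly (a single head, with identity projections so that queries, keys, and values all equal the invented input vectors):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three invented 2-D token vectors; each output row is a convex
# combination of the value vectors, weighted by attention.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(X, X, X)
```

A real transformer learns separate projection matrices for Q, K, and V and runs several such heads in parallel; this sketch shows only the core attention computation.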

Bidirectional Encoder Representations from Transformers (BERT) ^{[55]} is a powerful pretrained language model introduced by Google in 2018. BERT is trained in a bidirectional manner, learning to predict missing words by considering both the preceding and succeeding context, resulting in a better understanding of the overall sentence or document. BERT’s ability to capture contextual information and leverage pretraining has paved the way for advancements in understanding and generating human language ^{[56]}.

Autoencoders ^{[57]}^{[58]} are neural networks that learn to reconstruct their input data. They consist of an encoder network that maps input data to a compressed latent space and a decoder network that reconstructs the original data from the latent representation. They can be employed for tasks such as dimensionality reduction, anomaly detection, and generative modeling ^{[59]}.

A Stacked Denoising Autoencoder (SDAE) ^{[60]} is a deep neural network composed of multiple layers of denoising autoencoders. These autoencoders are designed to reconstruct the input data from a corrupted or noisy version, enabling the model to learn robust and informative representations ^{[61]}.

A Deep Belief Network (DBN) ^{[62]}^{[63]} is a type of generative deep learning model that consists of multiple stacked restricted Boltzmann machines (RBMs), trained in an unsupervised manner. An RBM is a two-layer stochastic neural network with binary nodes that learns representations by minimizing the energy between visible and hidden nodes ^{[64]}.

Graph Neural Networks (GNNs) ^{[65]}^{[66]}^{[67]} are a class of deep learning models designed to learn node representations by aggregating information from neighboring nodes in a graph. They capture and propagate information through the graph structure, enabling effective learning and prediction tasks on graph-structured data ^{[68]}.

A Directed Acyclic Graph Neural Network (DAGNN) ^{[69]} is an architecture specifically designed for directed acyclic graphs, where the nodes represent entities or features, and edges denote dependencies or relationships. DAGNNs can effectively capture complex dependencies and facilitate learning and inference in domains with intricate relationships among variables.
2.3. Optimization Techniques for Machine and Deep Learning
Optimization techniques play a crucial role in machine learning and deep learning algorithms, helping to find the optimal set of parameters that minimize a loss function or maximize a performance metric with the aim of improving the model’s accuracy and generalization ability. Some popular optimization techniques are detailed below ^{[70]}.

Gradient Descent is an iterative algorithm that updates the model’s parameters by moving in the direction of steepest descent of the loss function.
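
A minimal sketch: gradient descent on the invented one-dimensional loss f(x) = (x − 3)², whose gradient is 2(x − 3) and whose minimum is x = 3:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2 starting from x = 0.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```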

Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is particularly suitable for large-scale data sets. It is widely used in deep learning, where it updates the network parameters based on a randomly selected subset of training examples, called a mini-batch.

Adaptive Moment Estimation (Adam) is an extension of gradient descent that incorporates adaptive learning rates for different parameters. It dynamically adjusts the learning rate based on the first and second moments of the gradients.
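
A hedged sketch of the Adam update rule on the same kind of one-dimensional quadratic loss; the decay rates are the commonly cited defaults, while the learning rate, step count, and loss are invented for the example:

```python
import math

def adam(grad, x0, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    """Adam: gradient steps scaled by bias-corrected moment estimates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g    # second moment (mean of squared gradients)
        m_hat = m / (1 - b1 ** t)        # bias correction for zero initialization
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2 starting from x = 0.
x_min = adam(lambda x: 2 * (x - 3), x0=0.0)
```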

Root Mean Square Propagation (RMSProp) is an optimization algorithm that adapts the learning rate individually for each parameter based on a moving average of past squared gradients.

Adagrad adapts the learning rate for each parameter based on its historical gradients. It places more emphasis on infrequent features by reducing the learning rate for frequently occurring features.
Researchers and practitioners often experiment with different optimization algorithms to achieve better training outcomes.
2.4. Ensemble Techniques for Machine and Deep Learning
Ensemble techniques for machine and deep learning approaches involve combining multiple individual models to create a more powerful and accurate predictive model. By leveraging the strengths and diversity of different models, ensemble techniques often present improved performance and robustness when compared to using a single model ^{[71]}.
Some common ensemble techniques for machine and deep learning are as follows.

Bagging (Bootstrap Aggregating) ^{[72]} involves training multiple models independently on different subsets of the training data, typically using the same learning algorithm. The final prediction is obtained by averaging or voting the predictions of the individual models. Random Forest is an example of a popular ensemble method that utilizes bagging ^{[73]}.

AdaBoost (Adaptive Boosting) ^{[74]} sequentially trains multiple homogeneous weak models and adjusts the weights of the training examples to emphasize misclassified instances. The final prediction is a weighted combination of the predictions from the individual models, with more weight given to more accurate models ^{[75]}.

Gradient Boosting ^{[76]} is an advanced boosting methodology that incorporates the principles of gradient descent for optimization purposes. It assembles an ensemble of weak learners in a sequential manner. The primary objective during this iterative process is for each subsequent model to specifically address and minimize the residual errors—also referred to as gradients—with respect to a predetermined loss function ^{[77]}.
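
The residual-fitting loop can be sketched with depth-one regression trees (stumps) on invented one-dimensional data; each round fits a stump to the current residuals, which are the negative gradients of the squared loss:

```python
def fit_stump(X, residuals):
    """Find the threshold split on a 1-D feature minimizing squared error."""
    best = None
    for t in sorted(set(X)):
        left = [r for x, r in zip(X, residuals) if x <= t]
        right = [r for x, r in zip(X, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    """Each round fits a stump to the residuals and adds it with shrinkage lr."""
    base = sum(y) / len(y)          # start from the mean prediction
    pred = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, X)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Invented step-shaped regression data.
X = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]
model = gradient_boost(X, y)
```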

XGBoost (Extreme Gradient Boosting) ^{[78]} is an optimized and highly efficient implementation of gradient boosting. It introduces regularization techniques to control model complexity and prevent overfitting and uses a more advanced construction to provide parallel processing capabilities to accelerate training on large data sets. It also offers builtin functionality for handling missing values, feature importance analysis, and early stopping ^{[79]}.

Stacking ^{[80]}^{[81]} enhances the predictive accuracy by integrating heterogeneous weak learners. These base models are trained in parallel to provide a range of predictions, upon which a metamodel is subsequently trained, synthesizing them into a unified final output. This not only leverages the strengths of individual models, but also reduces the risk of overfitting.
Ensemble techniques can enhance model performance by reducing overfitting, increasing model stability, and capturing diverse aspects of the data. They are widely used in various domains and have been shown to improve performance in tasks such as classification, regression, and anomaly detection.
2.5. Techniques to Prevent Overfitting and Improve Generalization
To prevent overfitting and improve the generalization capability of individual or ensemble models, besides the above-mentioned ensemble methods, several other techniques can be employed ^{[82]}, as detailed below.

Cross-validation ^{[83]}^{[84]} is a widely used technique to estimate the performance of a model on unseen data. It involves partitioning the available data into multiple subsets, training the model on some subsets, and evaluating its performance on the remaining subset; the results can guide the selection of hyperparameters and model architecture.
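
A sketch of generating k-fold index splits; when n is not divisible by k, the fold sizes differ by at most one:

```python
def k_fold_splits(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        val = folds[i]                      # one fold held out for validation
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val

# Three folds over ten examples: fold sizes 4, 3, 3.
splits = list(k_fold_splits(10, 3))
```

In practice the indices are usually shuffled first; the model is then trained and scored once per split, and the k scores are averaged.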

Regularization methods ^{[85]}^{[86]}, such as L1 and L2 regularization, add a penalty term to the loss function during training. This discourages the model from fitting the training data too closely and encourages simpler and more robust models ^{[87]}.

Dropout ^{[88]} is a technique commonly used in deep learning models. It randomly deactivates a fraction of the neurons during training, effectively creating an ensemble of smaller subnetworks. This encourages the network to learn more robust and less dependent representations, reducing overfitting and improving generalization.
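
An illustrative implementation of inverted dropout, where surviving activations are rescaled by 1/(1 − p) during training so that no rescaling is needed at inference time (the function name and data are invented):

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: zero each unit with probability p, scale survivors."""
    if not training:
        return list(activations)  # no-op at inference time
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
dropped = dropout(acts, p=0.5, seed=0)
```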

Early stopping ^{[89]}^{[90]} involves monitoring the model’s performance on a validation set during training and stopping the training process when the performance on the validation set starts to degrade. This prevents the model from over-optimizing on the training data and helps to find an optimal point that balances training accuracy and generalization.
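
The patience-based loop can be sketched as follows; the validation-loss sequence is invented, standing in for one evaluation per epoch:

```python
def train_with_early_stopping(val_losses, patience=3):
    """Stop when validation loss has not improved for `patience` epochs.

    Returns (best_epoch, stop_epoch)."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch  # restore the best checkpoint here
    return best_epoch, len(val_losses) - 1

# Validation loss improves until epoch 3, then degrades; with patience 3,
# training stops at epoch 6 and the epoch-3 checkpoint is kept.
losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0]
best_epoch, stop_epoch = train_with_early_stopping(losses, patience=3)
```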

Data augmentation ^{[91]} involves artificially increasing the size of the training set by applying various transformations to the existing data. This introduces diversity into the training data, reducing the risk of overfitting and helping the model to better generalize to unseen examples.
These techniques, either used individually or in combination, can help to mitigate overfitting and improve the generalization ability of machine learning and deep learning models, leading to better performance on unseen data.