Machine Learning Algorithms for Depression: Comparison
Please note this is a comparison between Version 1 by Rashid Amin and Version 2 by Jason Zhu.

Over the years, stress, anxiety, and fast-paced modern lifestyles have had immense psychological effects on people worldwide. Global technological development in healthcare has digitized copious amounts of data, enabling various aspects of human biology to be mapped more accurately than with traditional measuring techniques. Machine learning (ML) has been recognized as an efficient approach for analyzing the massive amount of data in the healthcare domain. ML methodologies are being utilized in mental health to predict the probabilities of mental disorders and, therefore, to support potential treatment outcomes. The ML-based depression detection algorithms are categorized into three classes: classification, deep learning, and ensemble. A general model for depression diagnosis involving data extraction, pre-processing, training the ML classifier, detection classification, and performance evaluation is presented.

  • depression
  • machine learning (ML)
  • deep learning (DL)

1. Introduction

The modern lifestyle has a psychological impact on people’s minds that causes emotional distress and depression [1]. Depression is a prevalent mental disorder affecting an individual’s thinking and mental development. According to WHO, approximately 1 billion people have mental disorders [2] and over 300 million people suffer from depression worldwide [3]. Depression can lead to suicidal thoughts in an individual. Around 800,000 people die by suicide annually. Therefore, a comprehensive response is required to deal with the burden of mental health issues [4,5]. Depression may harm the socio-economic status of an individual. People suffering from depression are more reluctant to socialize. Counseling and psychological therapies can help fight depression. Machine learning (ML) aims at creating algorithms that are equipped with the ability to train themselves to perceive complex patterns. This ability helps to find solutions to new problems by using previous data and solutions. ML algorithms implement processes with regulated and standardized outcomes [6,7]. Broadly, ML algorithms are categorized into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning algorithms. The supervised ML algorithms [8] use labeled input data to predict known target values, whereas the unsupervised ML algorithms [9] reveal previously unidentified patterns and clusters within the given data. Semi-supervised learning [10] combines labeled and unlabeled data, and it lies between supervised and unsupervised learning. Reinforcement learning [11] is concerned with interpreting the environment to take desired actions and learning from the outcomes through trial and error. The applications of ML techniques in healthcare have proven to be pragmatic as they can process a huge amount of heterogeneous data and provide efficient clinical insights.
ML-based approaches provide an efficient understanding of mental conditions and assist mental health specialists in predictive decision making [12]. ML techniques benefit prediction and diagnosis in the healthcare domain by generating information from unstructured medical data. The prediction outcomes help to identify high-risk medical conditions in patients for early treatment [13]. In mental disorders, ML techniques help identify potential behavioral biomarkers [14] to assist healthcare specialists in predicting the likelihood of mental disorders and administering effective treatments. The techniques also help in the visualization and interpretation of complex healthcare data, and the visualization helps develop effective hypotheses regarding the diagnosis of mental disorders. The traditional clinical diagnostic approach for depression does not accurately capture the complexity of depression. The composition of the symptoms related to mental disorders such as depression can easily be detected and anticipated by utilizing ML methods. Therefore, the ML-based diagnostic approach seems to be an efficient choice for predictive analysis. In the healthcare sector, the major domains used for extracting observations associated with mental disorders through ML can be classified as sensors, text, structured data, and multimodal technology interactions [14]. Sensor data can be obtained from mobile phones and audio signals. Text sources include social media platforms, text messages, and clinical records. Structured data comprise the data extracted from standard screening scales, questionnaires, and medical health records. Multimodal technology interactions include data from human interactions with everyday technological equipment, robots, and virtual agents. The ML approaches can be used to assist in diagnosing mental health conditions.
The majority of the studies analyze Twitter data [15,16,17] and sensor data from mobile devices [18,19] for identifying mood disorders. Analyzing textual data can help extract diagnostic information from the individual’s psychiatric records [20]. ML approaches can help to predict risk factors in patients with mental disorders. The analysis of sensor data [20], clinical health records [21,22], and text message data [23] can help predict the severity of mental disorders and suicidal behaviors. Various studies have been put forward to aid medical specialists in identifying depression and multiple other mental disorders. The domain of mental disorders comprises a diverse range of mental illnesses.

2. Classification Models

This section highlights the supervised classification models used in several studies for diagnosing depression. A mobile application, Mood Assessment Capable Framework (Moodable), has been presented in [24][45] to interpret voice samples, data from smartphone and social media handles, and Patient Health Questionnaire (PHQ-9) data for assessing an individual’s mood and mental health and inferring symptoms of depression by using the ML classifiers SVM, KNN, and RF. The framework achieved 76.6% precision for depression assessment. The authors used six ML classifiers, KNN, Weighted Voting classifier, AdaBoost, Bagging, GB, and XGBoost, in [25][46], to predict depression. The SelectKBest, mRMR, and Boruta techniques were used for feature selection. SMOTE was applied to reduce class imbalance. They used a dataset of 604 individuals, including the sociodemographic and psychosocial data and the Burns Depression Checklist (BDC) data, among which 65.73% depression prevalence was identified. The analysis indicated that the AdaBoost classifier achieved the highest classification accuracy of 92.56% when used with the SelectKBest algorithm.
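The class-balancing step mentioned above can be illustrated with a minimal sketch of the core SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration in pure Python, not the implementation used in [25][46]; the function name `smote`, the toy data, and the parameter values are assumptions made for the example.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    Simplified sketch: requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class (e.g. the "depressed" group).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region already occupied by the minority class.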
An ML model using the RF algorithm has been implemented for the prognosis of depression among Korean adults in [26][47]. SMOTE was applied for class balancing between two classes: depression and non-depression. CES-D-11 was used as the depression screening scale, and 10-fold cross-validation was utilized to tune the hyperparameters. Data from a total of 6588 Korean citizens were included in the study; the AUROC value was 0.870 and the model achieved an accuracy of 86.20%. However, in this study, biomarkers were not included in the dataset. The authors used three ML algorithms, KNN, RF, and SVM, in [27][48], to diagnose depression among Bangladeshi students. The study aimed at predicting depression at early stages using related features to avoid drastic incidents. The analysis performed over 577 students’ data indicated that the Random Forest algorithm detected the symptoms of depression in the students with 75% accuracy and 60% f-measure.
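The k-fold cross-validation used for hyperparameter tuning can be sketched as follows. The example only shows how the sample indices are partitioned so that each sample is used for testing exactly once; it assumes an unshuffled, unstratified split, which may differ from the procedure in [26][47]. The function name `k_fold_indices` and the sample count are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 25 hypothetical samples split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

In practice the indices are usually shuffled, and often stratified by class, before splitting; each candidate hyperparameter setting is then scored by averaging the metric over the k test folds.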
In [28][49], ensemble learning and DL approaches have been applied to electroencephalography (EEG) features for detecting depression. Deep Forest (DF) and SVM classifiers were used for feature transformation. Image conversion and CNN were used for feature recognition from the EEG spatial information. The ensemble model with DF and SVM obtained 89.02% classification accuracy and the DL approach achieved 84.75% accuracy. In [29][50], the ML algorithms DT, RF, Naïve Bayes, SVM, and KNN were used to predict stress, anxiety, and depression. The Depression, Anxiety, and Stress Scale questionnaire (DASS 21) was used to analyze data from 348 individuals. The analysis indicated that Naïve Bayes achieved the highest accuracy of 85.50% for predicting depression. Based on F1 scores, the RF algorithm was more efficient in the case of imbalanced classes. In [30][51], the authors used sentiment and linguistic analysis with ML to discriminate between depressive and non-depressive social content. RF with the RELIEFF feature extractor, the LIWC text-analysis tool, the Hierarchical Hidden Markov Model (HMM), and the ANEW scale were used to analyze 4026 social media posts, achieving accuracies of 90% for depressive post classification, 92% for depression degree classification, and 95% for depressive community classification. However, this study takes all depression categories as a single class. Sharma et al. [31][52] used the XGBoost algorithm on data samples to diagnose mental disorders in the given data. Different sampling techniques were applied to the dataset. The dataset used in this study had imbalanced classes. The study achieved more than 0.90 values for accuracy, precision, recall, and F1 score.
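The difference between accuracy and F1 score on imbalanced classes, noted above, can be made concrete with a small sketch. The labels below are invented for illustration and do not come from any of the reviewed studies; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced data: 90 non-depressed (0) and 10 depressed (1).
y_true = [0] * 90 + [1] * 10
always_negative = [0] * 100  # a classifier that never predicts depression
metrics = classification_metrics(y_true, always_negative)
```

Here the trivial classifier reaches 90% accuracy while its F1 score for the depressed class is zero, which is why F1 is the more informative measure when one class is rare.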
Generalized Anxiety Disorder (GAD) is difficult to perceive and distinguish from major depression (MD) in a clinical framework. In [32][53], a multi-model ML algorithm was presented to distinguish GAD from MD using structural MRI data and clinical and hormonal information. Conclusively, MRI data provided accumulative data to the GAD classification. However, the sample size and accuracy needed to be increased, and the groups were unbalanced. Xiang et al. [33][54] used a multikernel SVM with minimum spanning tree (MST) and Kolmogorov–Smirnov test for feature selection. The proposed approach provided a conducive network analysis. A total of 38 MDD patients and 28 healthy controls were included in the dataset. The presented approach achieved 97.54% accuracy.

Discussion of Classification Models

The multikernel SVM proposed in [33][54] with a high-order MST achieved the highest MDD classification accuracy, 97.54%, among the reviewed studies. The multikernel SVM model captures dynamic changes in the functional association between brain regions. The integration of multiple kernels can enhance classification. Another model with a high classification accuracy was presented in [25][46], which achieved 92.56% classification accuracy using AdaBoost with the SelectKBest feature selection method and SMOTE for balancing the classes. AdaBoost falls under the category of DT Ensemble. Comparing the two studies [25][33][46,54], it can be concluded that in [25][46], no biomarker was included in the dataset, while in [33][54], the dataset used was limited and no depression screening scale was identified. Considering the studies [24][27][28][29][32][33][45,48,49,50,53,54], SVM has been the most used classifier for the detection of depression as it works well on unstructured and high-dimensional data. SVM is also resistant to overfitting. For data with an unknown and irregular distribution, SVM can prove to be an efficient algorithm.
Random Forest (RF) is the second most used classifier in the reviewed studies [24][26][27][29][30][45,47,48,50,51] as it is a computationally efficient algorithm. In [30][51], the RF model achieved 90, 95, and 92% accuracy for classifying depressive posts, depressive communities, and depression degrees, respectively. RF enhances the classification accuracy on continuous data by reducing the overfitting in decision trees. As RF is based on ensemble learning, it can model both complex and straightforward functions more accurately.

3. Deep Learning Models

This section highlights the deep learning models presented in multiple studies to detect depression. An artificial intelligence mental evaluation (AiME) framework [34][55] has been presented in a study for detecting symptoms of depression using multimodal deep networks-based human–computer interactive evaluation. The framework was applied to audio, video, and speech responses of 671 participants and PHQ-9 data. The authors of [35][56] discuss multimodal stress detection using a fusion of machine learning algorithms. In [35][56], a DL framework based on EEG data has been suggested for the automatic analysis of depression. The framework includes two DL models: a one-dimensional convolutional neural network (1DCNN) and a combination of the 1DCNN and an LSTM model. The dataset used in the study contained EEG data and quantitative information from 30 healthy controls and 33 MDD patients. BDI-II and HADS were used as the assessment scales. The framework achieved an overall classification accuracy of 98.32%. Erguzel, Sayar et al. [36][57] presented a hybridized methodology using PSO and ANN to distinguish between unipolar and bipolar depression based on EEG recordings. The presented ANN–PSO approach discriminated 31 bipolar and 58 unipolar subjects with 89.89% accuracy. SCID-I, HDRS 17-item version, YMRS, DSM-IV, and HADS were used as the assessment scales. However, this study used limited datasets.
Feng et al. [37][58] presented the X-A-BiLSTM model for diagnosing depression from social media data. The XGBoost component helped reduce imbalanced classes, and the Attention-BiLSTM neural network component enhanced the classification capacity. The RSDD dataset with approximately 9000 depressed users and 107,000 control users was used in the study. However, no standard screening scale for depression was used in their work. In [38][59], a novel approach was presented to optimize word embedding for classification. The proposed approach outperformed the previous state-of-the-art models on the RSDD dataset. The comparative evaluation was performed on some DL models for diagnosing depression from tweets on the user level. The experiments were performed on two publicly available datasets, CLPsych 2015 and Bell Let’s Talk. Results showed that CNN-based models performed better than RNN-based models. However, the word embedding models did not perform efficiently with larger datasets.
Zogan et al. [38][59] presented interpretive Multimodal Depression Detection with Hierarchical Attention Network (MDHAN) to detect depressed people on social media. User posts along with Twitter-based multimodal features were considered. The semantic sequence features were captured from the individuals’ profiles. MDHAN outperformed other baseline methods. The study determined that combining DL with multimodal features can be effective. MDHAN achieved excellent performance and provided adequate evidence to explain the prediction, with an accuracy of 89.5%. However, this study needs to use a standard dataset of Twitter users because social media data may be vague and can distort the experimental outcome. In [39][60], deep convolutional neural networks (DCNN) are first designed to learn deep features from spectrograms and raw voice waveforms. To improve the depression recognition performance, the authors suggest using joint fine-tuning layers to merge the raw and spectrogram DCNN.
He and Cao [39][60] used DCNN to enhance depression classification. DCNN with LLD and MRELBP texture descriptors were applied on 100 training, 100 development, and 100 testing samples. The AVEC2013 and AVEC2014 datasets were combined. The results were an MAE of 8.1901 and an RMSE of 9.8874 for the combined dataset. In [40][61], the authors presented a model for diagnosing mild depression by processing EEG signals using CNN. The model used four functional connectivity metrics (coherence, correlation, PLV, and PLI). The model obtained a classification accuracy of 80.74%. Only functional connectivity matrices are used in the research, and other metrics need to be used for evaluation. Ahmed et al. [41][62] discussed early depression diagnosis by analyzing posts of Reddit users using a DL-based hybrid model. BiLSTM with GloVe, Word2Vec, and FastText embedding techniques, metadata features, and LIWC were applied on 401 (for testing) and 486 (for training) users with 531,453 posts for depression detection. The Beck Depression Inventory (BDI) was used as an assessment scale. The proposed model obtained an F1 score, precision, and recall of 81, 78, and 86%, respectively.
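The MAE and RMSE reported above are regression error measures on predicted depression-scale scores. A short sketch of how the two are computed is given below; the score values are invented for illustration and are not taken from the AVEC datasets.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical true and predicted depression-scale scores.
true_scores = [10, 20, 5, 30]
predicted_scores = [12, 17, 9, 25]

mae_value = mae(true_scores, predicted_scores)
rmse_value = rmse(true_scores, predicted_scores)
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few predictions are far off, which is consistent with the reported RMSE (9.8874) exceeding the MAE (8.1901).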

Discussion of Deep Learning Models

The studies reviewed in this section used various DL models with different feature extraction and word embedding techniques. The different DL models presented in [35][56] showed efficient discrimination between depressed and healthy controls. The 1DCNN achieved the highest classification accuracy of 98.32% and the one-dimensional DCNN with LSTM achieved an accuracy of 95.97%. The DL models automatically discriminate EEG signal patterns.
In the majority of the studies [35][36][40][56,57,61], EEG data have been utilized to diagnose the symptoms of depression in the participants. EEG patterns can help to indicate abnormalities in brain functions and irregular emotional alterations. The EEG signals resemble waves with peaks and valleys, with the help of which irregularities can be identified. In [35][56], a variant of CNN, namely DCNN, was applied over EEG signals to diagnose unipolar depression. In [36][57], a hybrid model of ANN with the PSO algorithm was used to discriminate unipolar and bipolar disorders based on EEG recordings, thereby achieving 89.89% accuracy. In [40][61], a CNN classification model for diagnosing mild depression by processing the EEG signals was used, and the model achieved 80.74% accuracy using the coherence functional connectivity metric. It can be concluded that EEG-based diagnosis is an efficient and cost-effective method for understanding brain activity and the neural activity that correlates with social anxiety.
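Of the functional connectivity metrics mentioned for EEG-based models, correlation is the simplest to illustrate. The sketch below builds a channel-by-channel Pearson correlation matrix from toy sinusoidal signals; the signals, sampling rate, and function names are assumptions for the example and do not reproduce the preprocessing of [40][61].

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def connectivity_matrix(channels):
    """Symmetric matrix of pairwise correlations between channels."""
    n = len(channels)
    return [[pearson(channels[i], channels[j]) for j in range(n)] for i in range(n)]

# Toy "EEG" channels: 2 seconds sampled at 50 Hz (hypothetical values).
t = [i / 50 for i in range(100)]
ch1 = [math.sin(2 * math.pi * 5 * x) for x in t]
ch2 = [math.sin(2 * math.pi * 5 * x + 0.3) for x in t]  # same rhythm, small phase lag
ch3 = [math.cos(2 * math.pi * 11 * x) for x in t]       # unrelated rhythm
matrix = connectivity_matrix([ch1, ch2, ch3])
```

Channels sharing the same rhythm are strongly correlated, while the unrelated channel shows near-zero correlation. A matrix of this kind can be treated as an image-like input to a CNN.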

4. Ensemble Models

This section briefly highlights different ensemble models presented in the reviewed studies for the diagnosis of depression. In [42][64], ML and statistical models were used to predict clinical depression and MDD among individuals suffering from immune-mediated inflammatory disease (IMID) by identifying patient-reported outcome measures (PROMs). LR, NN, and RF algorithms were used to analyze a dataset of 637 IMID patients. In [43][65], long short-term memory (LSTM) and six ML models including LR, logistic regression with lasso regularization, RF, gradient boosted decision tree (GBDT), SVM, and deep neural network (DNN) were used. LSTM has been applied to predict the level of different depression risk factors over the course of two years. The dataset contained 1538 data of elderly people in China using the Chinese Longitudinal Healthy Longevity Study (CLHLS). The results indicated that logistic regression with lasso regularization achieved a higher AUC value than other ML algorithms.
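The AUC used above to compare models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are invented and the function name `auc_score` is hypothetical.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = depressed) and predicted risk scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = auc_score(y_true, scores)
```

In this toy example eight of the nine positive-negative pairs are ordered correctly. The pairwise loop is quadratic in the number of samples, so rank-based formulas are preferred for large datasets.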
Tao, Chi et al. [44][66] proposed an ensemble binary classifier to analyze health survey data against ground truth from the SF-20 Quality of Life scales. With an ensemble model (DT, AAN, KNN, SVM) applied on the NHANES dataset, the classifier demonstrated an F1 score of 0.976 in the prediction, without any incorrectly identified depression instances. This study has some limitations: it needs to use rich online social media sources for feature extraction, and the dataset range is not defined. Karoly and Ruehlman [45][67] proposed an algorithm to distinguish between MDD and BD patients based on clinical variables. LR with Elastic Net and XGBoost were applied on 103 MDD and 52 BD patients, and the LR with Elastic Net model achieved an accuracy of 78%. There are some limitations in this paper, such as the small and unbalanced sample, lack of external sample validation, some misclassifications of classes, and a limited range of evaluation features.
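One common way to combine several base classifiers into an ensemble binary classifier is hard (majority) voting. The source does not state how the ensemble in [44][66] combines its members, so the sketch below is only an assumed illustration of the general idea, with three hypothetical models and invented predictions.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote.

    predictions_per_model holds one list of labels per base model, all of
    the same length. With an even number of models, ties are resolved in
    favour of the label seen first.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base models (1 = depressed).
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
ensemble_prediction = majority_vote([model_a, model_b, model_c])
```

Using an odd number of base models avoids ties in the binary case; weighted or probability-based (soft) voting are alternatives when the members differ in reliability.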
Zhao, Feng et al. [68] evaluated the depression status of Chinese recruits using ML algorithms. NN, SVM, and DT were applied on 1000 participants and achieved 86, 86, and 73% accuracy, respectively. BD-II was used as an assessment scale. This study needs to include complex socio-demographic and career variables in the model. Ji et al. [69] diagnosed bipolar disorder among Chinese individuals by developing a BDCC using ML algorithms. SVR, RF, LASSO, LR, and LDA were applied on 255 MDD, 360 BPD, and 228 healthy sample data. The experiments obtained an accuracy of 92% for MDD and 92% for BPD detection. However, this model requires large datasets and needs to address the limitations of its cross-sectional nature.