1. The Rationale for Using Machine Learning to Identify Mental Disorders
Because the term “mental disorder” covers many highly heterogeneous categories, the researchers use the most common category of mental disorders, depressive disorders, as the main example in the present research. Previous studies have found differences between depressed patients and the healthy population. These differences include biological indicators such as blood oxygen consumption in the brain
[1], neurotransmitters
[2], and EEG
[3], peripheral physiological signals
[4] such as heart rate, skin conductance, etc., and non-verbal behaviors
[5] such as facial expressions and voice features, as well as the language used (verbal or textual). The features that differ between the two groups (those with and without a depressive disorder) serve as the basis for AI classifiers. Therefore, by using these discriminant metrics as feature inputs for machine learning, it should be possible to train good predictive models for automated depressive disorder diagnosis.
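As a minimal sketch of this rationale, the following example feeds two discriminant features into a very simple classifier. Everything in it is an assumption made for illustration: the feature names (smile duration and mean heart rate), the values, and the choice of a nearest-centroid rule are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean

# Hypothetical, made-up feature vectors: [smile_duration_s, mean_heart_rate_bpm].
# These numbers are illustrative only and do not come from any study.
TRAIN = {
    "control": [[2.1, 68.0], [1.8, 72.0], [2.4, 70.0]],
    "depressed": [[0.6, 80.0], [0.9, 77.0], [0.5, 83.0]],
}

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def fit(train):
    """Return per-feature scale factors and one centroid per class."""
    all_rows = [r for rows in train.values() for r in rows]
    # Scale each feature by its range so that no single unit dominates.
    scales = [(max(c) - min(c)) or 1.0 for c in zip(*all_rows)]
    cents = {label: centroid(rows) for label, rows in train.items()}
    return scales, cents

def predict(x, scales, cents):
    """Assign x to the class whose centroid is nearest after scaling."""
    def dist(c):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(x, c, scales))
    return min(cents, key=lambda label: dist(cents[label]))

scales, cents = fit(TRAIN)
print(predict([0.7, 79.0], scales, cents))  # nearest to the "depressed" centroid
print(predict([2.0, 69.0], scales, cents))  # nearest to the "control" centroid
```

A rule of this kind works only to the extent that the two groups really do differ on the chosen features, which is exactly the assumption examined in the sections that follow.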
Many studies have used nonverbal behaviors to predict depression, especially facial expressions
[6][7], which are the most salient of behaviors and are considered to accurately display mood (as depressive disorders are mood disorders). The researchers chose facial cues as an example for the present research, but the application of other cues can be found in certain survey articles
[1][8][9]. Facial expressions, usually categorized as expressing anger, sadness, joy, surprise, disgust, fear, etc., are regarded as discriminative cues for depressive disorder detection. Those diagnosed with a depressive disorder often demonstrate little expressiveness in their facial expressions
[10]. Gavrilescu et al. proposed the determination of depression levels by analyzing facial expressions via the Facial Action Coding System. The experiment obtained 87.2% accuracy for depression identification
[11]. Furthermore, the duration of spontaneous smiles
[12], smile intensity
[12][13], mouth animation
[14], and lack of smile
[15] have also been considered to offer valuable patterns for depressive disorder detection. In recent years, the use of facial expressions as cues for depression recognition has made great progress. Effective facial features can now even include pupil changes. For example, a recent study treated faster pupillary responses as a marker of healthy controls
[16]. Depressed subjects demonstrate slower pupil dilation responses in certain conditions
[17]. One study found that pupil bias and diameter were important for assessing the symptoms of depression
[18]. Other eye-related features include reduced eye contact
[15], gaze direction
[13], eyelid activity, and eye movement and blinking
[19].
Single-modal depressive disorder recognition (for example, using facial cues only) has been found to yield positive results. In theory, multimodal data should improve performance further, for instance when voice and visual cues are combined as feature input, and adding physiological information should improve the accuracy of automatic diagnosis further still. Multimodality is a prominent direction in both algorithm development and database development.
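One common way to combine modalities is feature-level (early) fusion, in which each modality is normalized and the per-subject vectors are concatenated before classification. The sketch below illustrates only that step. The function names `zscore_columns` and `early_fusion`, the feature choices, and all values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def early_fusion(*modalities):
    """Concatenate per-subject feature vectors from several modalities."""
    normed = [zscore_columns(m) for m in modalities]
    return [sum(parts, []) for parts in zip(*normed)]

# Hypothetical values for three subjects (one row each), illustrative only.
visual = [[0.2, 1.5], [0.8, 0.9], [0.5, 1.2]]  # e.g. smile intensity, gaze shift
audio = [[110.0], [180.0], [145.0]]            # e.g. mean pitch in Hz

fused = early_fusion(visual, audio)
print(len(fused), len(fused[0]))  # 3 subjects, 3 fused features each
```

Normalizing before concatenation keeps a modality measured in large units (such as pitch in Hz) from dominating one measured on a small scale. Decision-level fusion, which combines the outputs of separate per-modality models, is an alternative design that is not shown here.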
Researchers use visual, acoustic, verbal, and physiological signals to make predictions, so what exactly are the characteristics of these diseases they are attempting to predict? What is the diagnostic process followed by clinicians? Do these issues pose challenges to AI that are different from those seen with other tasks?
The researchers use the example of depressive disorders to illustrate the challenges of using AI to diagnose mental illness. The DSM-5
[20] and ICD-11
[21] are the most authoritative diagnostic manuals available. However, even the DSM, the most widely used standard, is highly controversial. Next, the researchers will discuss the features of depressive disorders that may pose a barrier to using AI diagnosis.
2. Challenges from Diagnostic Criteria
First, many diagnostic indicators are based on subjective experiences or qualitative descriptions or are difficult to objectively quantify and standardize. Diagnostic criteria for depressive disorders are based on symptomatology, such as a depressed mood or sleep problems. Although many scholars pursue a physiological basis or biomarkers, there are no clinically useful diagnostic biomarkers that are able to absolutely confirm a diagnosis of major depressive disorder.
Second, individuals vary greatly in their presentation of symptoms (see
Figure 1). According to the DSM-5, the two core symptoms of major depressive disorder are (1) a depressed mood most of the day and (2) markedly diminished interest or pleasure, at least one of which must be present. In addition, at least three or four of the other seven symptoms must be met (three if both core symptoms are present, four if only one is), so that at least five symptoms are present in total. As a result, depressive disorders do not have a consistent set of symptoms and vary greatly among individuals. The PHQ-9 (Patient Health Questionnaire-9) is an assessment of the nine criteria in the DSM-5. Another widely used scale, the HAMD, does not focus on typological symptoms (i.e., insomnia, low mood, agitation, anxiety, and reduced weight). In addition, symptoms differ across developmental stages. For example, depressive symptoms in adolescents tend to manifest as irritability and not necessarily as a constant low mood. There exist at least 1497 unique profiles for depression
[22]. In some cases, patients with the same diagnosis may not share any identical symptoms
[23].
Figure 1. Depressive disorders vary greatly in their presentation of symptoms and are co-morbid with other disorders (such as generalized anxiety disorder and schizophrenia).
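The heterogeneity described in the second point can be made concrete with a short count. The sketch below is an illustration under simplifying assumptions, not a reproduction of the DSM-5 text: it treats the nine criteria as indivisible items, assumes that indices 0 and 1 stand for the two core symptoms, and enumerates every combination of five or more criteria that contains at least one core symptom. The names `CORE`, `MIN_SYMPTOMS`, and `meets_symptom_count` are hypothetical.

```python
from itertools import combinations

N_CRITERIA = 9    # nine symptom criteria, indexed 0..8
CORE = {0, 1}     # 0 = depressed mood, 1 = diminished interest or pleasure
MIN_SYMPTOMS = 5  # at least five criteria must be present in total

def meets_symptom_count(present):
    """True if a set of criterion indices satisfies the symptom-count rule."""
    return len(present) >= MIN_SYMPTOMS and bool(CORE & present)

# Enumerate every qualifying combination of criteria.
profiles = [
    set(combo)
    for k in range(MIN_SYMPTOMS, N_CRITERIA + 1)
    for combo in combinations(range(N_CRITERIA), k)
    if meets_symptom_count(set(combo))
]
print(len(profiles))  # 227

# Smallest overlap between any two qualifying profiles.
min_shared = min(len(a & b) for a, b in combinations(profiles, 2))
print(min_shared)  # 1
```

Under these assumptions there are 227 qualifying combinations, and two diagnosed individuals may overlap on only a single criterion. Larger counts such as the one cited above, and cases with no shared symptoms at all, presumably arise when compound criteria (for example, insomnia or hypersomnia) are split into separate symptoms.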
Third, depressive disorders comprise a collection of ailments with many subcategories and variants, such as disruptive mood dysregulation disorder, major depressive disorder, and persistent depressive disorder.
Fourth, co-morbidities are very common in mental disorders (see
Figure 1). Depressive disorders may be accompanied by anxiety and personality disorders and are often confused with bipolar or other mental disorders. Such illnesses may be very similar or identical to particular depressive disorders in terms of symptoms (e.g., sleep or appetite problems) but require differential diagnoses by clinicians. This issue often results in subjective bias
[24][25][26][27][28].
Fifth, according to the DSM, the symptoms must lead to impaired functioning. They are expected to cause clinically significant distress or impairment in social, occupational, or other important areas of functioning, all of which are culturally dependent.
Sixth, the symptoms are not static and are not always displayed. Major depressive disorder is episodic rather than continuous. For example, some people's symptoms are more severe in the morning on a given day, and an episode may leave them depressed for several weeks.
Seventh, depressive disorders arise from interactions between genetic, environmental, physiological, and sociocultural factors. There is not yet a consensus on the pathogenesis of depressive disorders. Depression is not just a neurophysiological problem. It depends on the interaction between a genetic predisposition and environmental factors
[29]. The combination of biological elements, family and environmental stressors, and personal vulnerabilities plays a vital role in affecting the onset of major depressive disorder
[30]. As a result, the subjective experience of depression and the behavioral and speech characteristics of depressed individuals vary widely from person to person.
3. Challenges from Standard Diagnostic Approaches
How, then, do clinicians diagnose depressive disorders, given this variability?
As the characteristics and diagnostic criteria described above show, there is no single, efficient set of clinical indicators, which makes the diagnosis of depressive disorders time-consuming and inherently subjective
[31]. Routine assessments include self-rating scales and clinician-based interviews. Both kinds of assessment are mainly based on the DSM and ICD. Self-rating scales are a simple and convenient way to assess depressive disorders; examples include the PHQ-9, Zung’s Self-rating Depression Scale, and the Beck Depression Self-Rating Scale. The results are most often used for screening and as a reference for physicians’ diagnoses. Self-rating scales have been used widely in various studies, with specificity and sensitivity reaching 80% to 90%, although they have certain problems
[32]. In addition to self-rating assessments, scales rated by others (typically clinicians), such as the Hamilton Rating Scale for Depression
[33] are often also used to assist clinicians’ diagnosis.
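The sensitivity and specificity figures quoted above do not by themselves say how often a positive screen is correct, because that also depends on how common the disorder is in the screened population. As a rough worked example, the sketch below applies Bayes' rule to compute the positive predictive value. The 10% prevalence is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disorder | positive screen), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sensitivity and specificity span the 80% to 90% range quoted in the text.
# The 10% prevalence is a hypothetical value, not a figure from the source.
for sens_spec in (0.80, 0.90):
    ppv = positive_predictive_value(sens_spec, sens_spec, prevalence=0.10)
    print(f"sens = spec = {sens_spec:.2f}: PPV = {ppv:.2f}")
```

Under this assumed prevalence, between roughly one third and one half of positive screens would correspond to true cases, which is consistent with using such scales for screening and as a reference rather than as a final diagnosis.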
Clinical interviews are more professional and accurate but also more time-consuming and laborious. Doctors’ interview-based assessments comprise the final decision stage for diagnosis. Diagnosing depressive disorders can be complicated, depending not only on the educational background, cognitive ability, and honesty of the subject describing their symptoms but also on the experience and motivation of the clinician. Comprehensive information and thorough clinical training are needed to accurately diagnose the severity of depression
[34]. Some biological markers such as low serotonin levels
[35], neurotransmitter dysfunction
[36], and brain structure
[37] have been considered to be indicators of depression.
Depressive disorders are so complex that the diagnostic process must be considered holistically. Because depressive disorders are not just mood problems but also sociocultural in nature, they are often accompanied by a serious impairment of social functioning. This may explain why the misdiagnosis rate among clinicians is high. It also calls for rethinking current depression datasets: Are the samples representative and qualified? Can the objective features recorded predict depressive disorders? Are the annotations valid?
4. Challenges from the Logical Fallacy of Mental Disorder Diagnosis
When a clinician diagnoses a person as having a depressive disorder, they rely on the symptoms reported by the client, such as a persistent low mood for two weeks and frequent suicidal thoughts. What is the cause of the persistent low mood? The usual answer is that the individual is suffering from a depressive disorder. The claim that depressive disorders cause the corresponding symptoms is the premise of a conditional: if p (the disorder), then q (the symptoms). Diagnosing a disorder from its symptoms runs this conditional backwards: it observes q and concludes p, a form of reasoning known as affirming the consequent. This is a logical fallacy (Table 1).
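The difference between the valid and the fallacious argument form can be checked mechanically with a truth table. The following sketch is only an illustration of the logic. The helper names `implies` and `is_valid` are hypothetical, p stands for "the person has a depressive disorder", and q stands for "the symptoms are present".

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a then b'."""
    return (not a) or b

def is_valid(premises, conclusion):
    """An argument form is valid if no truth assignment makes every
    premise true while the conclusion is false."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(prem(p, q) for prem in premises)
    )

# Shared premise: if p (disorder) then q (symptoms).
rule = lambda p, q: implies(p, q)

# Modus ponens: from "if p then q" and p, conclude q.
modus_ponens = is_valid([rule, lambda p, q: p], lambda p, q: q)
# Affirming the consequent: from "if p then q" and q, conclude p.
affirming_consequent = is_valid([rule, lambda p, q: q], lambda p, q: p)

print(modus_ponens)          # True
print(affirming_consequent)  # False
```

The counterexample is the assignment in which q is true and p is false: the symptoms are present, but they stem from something other than a depressive disorder.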
Depressive disorders are labels for sets of symptoms. This means that symptoms do not explain why a person has a depressive disorder, nor does the disorder explain why the symptoms occur. Therefore, it is implausible to identify whether a person has a depressive disorder from the symptoms they present. A person may be depressed because they have been experiencing negative stimuli for the past several weeks. If the negative stimuli disappear, so might the negative mood. On the other hand, they may be depressed because of another psychological disorder, such as a personality disorder.
The researchers are not yet sure of the relationship between depressive disorders and the set of symptoms, which causes significant problems with diagnosis. If AI tries to imitate clinicians, it will face the same challenge.