Alzheimer’s disease (AD) is a type of dementia that becomes more likely as people age, and it currently has no known cure. As the world’s population ages rapidly, early screening for AD has become increasingly important. Traditional screening methods, such as brain scans or psychiatric tests, are stressful and costly, so patients may be reluctant to undergo them and consequently fail to receive timely intervention. While researchers have explored the use of language in dementia detection, face-related features have received far less attention.
1. Introduction
Dementia is a debilitating and progressive neurodegenerative disorder that affects an individual’s cognitive abilities and daily functioning. It is a common condition among older adults, with an estimated 55 million people living with dementia worldwide and 10 million new cases every year [1]. Dementia is characterized by cognitive decline, including memory loss, language impairment, disorientation, and difficulties with executive functions (such as planning, organizing, and decision-making). As the disease progresses, individuals with dementia may also experience behavioral and psychological symptoms such as depression, anxiety, agitation, and aggression [2]. Dementia is a contested term that refers to a group of diseases rather than a single, specific disease [3]. More than 200 subtypes have been described to date, and they do not share the same pathway or disease process [4]. The most common cause of dementia is Alzheimer’s disease (AD), which accounts for approximately 60–70% of cases [1]. Other types include vascular dementia, Lewy body dementia, frontotemporal dementia, and mixed dementia, among others.
AD typically progresses through three stages: early, middle, and late, sometimes also referred to as mild, moderate, and severe. In the early stage, people can live independently but may experience symptoms such as subjective memory loss and forgetting familiar words or locations. The middle stage usually lasts for years and may be the longest stage of the disease; during it, patients require more intensive care from others. They may forget events or their personal history and become confused about what day it is or where they are. In the late stage, symptoms become severe, and patients gradually lose the ability to respond to their environment, hold a conversation, and eventually control their movements [5].
Age is widely recognized as the most significant risk factor for dementia, and the world population is growing older at an unprecedented rate. According to the United Nations, the global population aged 65 years or over is projected to reach 1.5 billion by 2050, up from 703 million in 2019. This represents a significant demographic shift, with about one in six people worldwide (16%) expected to be aged 65 or over by mid-century [6]. In some countries, such as Japan and Italy, population aging is even more pronounced, with over a quarter of the population aged 65 or over.
According to the Alzheimer’s Association, 1 in 9 people aged 65 or older has Alzheimer’s dementia. As of 2023, an estimated 6.7 million Americans aged 65 or older are living with Alzheimer’s dementia, 73% of whom are aged 75 or older. As the elderly population in the United States continues to grow, both the number and the percentage of individuals with Alzheimer’s or other types of dementia will rise correspondingly. Unless medical advances that prevent or treat AD are made, the number of people aged 65 and older with AD is estimated to reach 12.7 million by 2050 [7]. In a study [8] that tracked the status of subjects over several years, the World Health Organization (WHO) reported an annual incidence of dementia of 10 to 15 cases per thousand people. Patients who develop dementia have an average life expectancy of 7 years, and no more than 3% of them live longer than 14 years. The problem is compounded by the fact that there is currently no effective treatment that cures AD [9]. Nevertheless, certain lifestyle changes and treatments can slow the progression of AD and improve quality of life for those living with it [10]. Early diagnosis allows patients to receive appropriate treatment and advice to slow, or even halt, the progression of the disease. An efficient early detection system for dementia could therefore significantly improve the quality of life of many patients and even save lives. These facts have prompted institutions and researchers to focus on developing methods for the prevention and early detection of dementia.
Traditional methods for diagnosing dementia, such as brain scans and psychiatric tests, face significant limitations that prevent their mass deployment. First, these methods are resource-intensive and costly, requiring specialized equipment, trained personnel, and extensive time commitments, which makes it difficult to scale them up to reach the larger population in need. Second, the invasive nature of brain scans and the comprehensive nature of psychiatric evaluations can deter individuals from participating, leading to low uptake and potential delays in diagnosis. Third, the expertise required to interpret the results of these tests is often concentrated in specialized healthcare settings, limiting access to diagnosis for individuals in remote or underserved areas. Together, these factors make widespread adoption of traditional diagnostic methods impractical and underscore the need for more accessible and efficient alternatives.
A cost-effective and scalable screening method is needed to detect AD and its subtle early indicators, such as subjective memory loss and Mild Cognitive Impairment (MCI). MCI is a condition characterized by cognitive changes that are noticeable and measurable but not severe enough to meet the diagnostic criteria for dementia; it is often considered an intermediate stage between normal aging and dementia. Language is an important indicator in dementia detection because AD significantly impairs patients’ language abilities. These impairments are readily revealed by certain tasks, such as picture description, in which dementia patients and healthy individuals describe the same image differently: dementia patients make more grammatical mistakes, use shorter sentences, and struggle with word finding and sentence organization. They may also have difficulty understanding the connections between events in the picture, which indicates that an accurate description draws on both visual and language abilities. As a result, many speech-based dementia detection methods have been proposed [11].
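As a minimal illustration of one such marker, the Python sketch below computes mean sentence length (words per sentence) from a transcript. The naive punctuation-based sentence splitter, the function name `mean_sentence_length`, and both example transcripts are hypothetical and are not taken from the cited studies; practical systems work from ASR or manual transcripts with proper tokenization and much richer feature sets.

```python
import re


def mean_sentence_length(transcript: str) -> float:
    """Return the average number of words per sentence.

    Assumption (illustration only): sentences end with '.', '!' or '?',
    and words are whitespace-separated; abbreviations and fillers are ignored.
    """
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(sentences)


# Hypothetical descriptions of the same picture.
fluent = "The boy is standing on a stool to reach the cookie jar. The stool is tipping over."
hesitant = "A boy. Cookies. He is falling. The water."

print(mean_sentence_length(fluent))    # 8.5
print(mean_sentence_length(hesitant))  # 2.0
```

On these toy inputs the fragmented description has far shorter sentences, mirroring the pattern described above; a single feature like this is only one input among many in a real classifier.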
Compared with the extensive research on speech- and language-based dementia detection, facial features have received much less attention, even though they have been shown to be effective indicators for dementia detection [12]. Previous works [12,13] have explored dementia detection using facial features or facial images. However, these works experimented only with raw facial images or a limited set of facial features, and the datasets they used were relatively small and limited, which could also introduce bias.
2. The Potential of Facial Features in Dementia Detection
In dementia, brain regions such as the hippocampus (involved in memory), language centers, frontal lobes, and communication pathways can be damaged, leading to language and memory problems. Many researchers have therefore focused on speech data and developed methods to distinguish dementia patients from healthy subjects [14,15,16,17,18,19]. These methods mainly use natural language processing (NLP) techniques: various features are extracted, and different types of machine learning models are trained to classify text samples, each labeled according to whether it came from a dementia patient or a healthy subject. Weiner et al. [14] compared two feature extraction pipelines for dementia detection: the first uses manual transcriptions, while the second uses transcriptions produced by automatic speech recognition (ASR). According to the study, the transcription quality of the ASR system is, on its own, a dependable feature for detecting dementia. Moreover, the features extracted from the automatic transcriptions perform comparably to, or slightly better than, those derived from manual transcriptions. Mirheidari et al. [15] investigated how word vector representation models perform in detecting signs of dementia. Motivated by the fact that dementia impairs patients’ ability to express themselves accurately, they analyzed conversations designed to test examinees’ long-term and short-term memory and proposed three methods to demonstrate the potential of word vectors in a classification task. Their study concluded that signs of dementia can be detected with a speech recognizer even when the recognition output has a high error rate. Zhu et al. [16] found that transfer learning models using text data outperform those using audio data, likely because of the high similarity between the pre-training text dataset and the Cookie Theft picture description dataset. Multi-modal transfer learning yields a slight improvement in accuracy, indicating that audio and text data provide limited complementary information, whereas multi-task transfer learning brings limited improvement in classification and has a negative impact on regression. They also identified that inconsistencies between AD/non-AD labels and Mini-Mental State Examination (MMSE) scores can limit the performance of multi-task learning. Mahajan et al. [17] re-implemented NLP methods that used Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures together with specific conversational features to diagnose AD, and examined why these models achieved lower accuracy on the ADReSS [20] dataset than on the DementiaBank [21] dataset. Additionally, they developed a deep learning solution based on Recurrent Neural Networks (RNNs) that performs binary classification of AD from natural speech. Their method incorporated acoustic features using a Speech-GRU, improving accuracy by 2% over the acoustic baseline; when enriched with targeted features, it outperformed the acoustic baselines by 6.25%. They proposed a bi-modal approach to AD classification and discussed its potential benefits. Shibata et al. [18] adapted idea density, a measure of how many ideas (propositions) are expressed relative to the number of words, to the Japanese language. They defined several rules for counting idea density in Japanese speech, proposed a method that estimates idea density using machine translation, and compared the performance of several estimation pipelines and calculation methods. They concluded that these findings demonstrate the feasibility of applying the method to dementia detection. Santander-Cruz et al. [19] extracted 17 features representing syntactic, semantic, lexical, and demographic information from speech samples in the DementiaBank Pitt Corpus. The relevance of each feature was quantified by a mutual information score measuring the dependency between that feature and the MMSE score. They concluded that their methodology outperforms methods based on syntactic information alone and a BERT-based method using only linguistic features. Farzana et al. [22] evaluated how removing disfluencies from annotated ASR output affects dementia detection performance. Using an existing tool that detects and tags disfluencies in speech transcriptions, they found through a set of experiments that removing disfluencies degrades dementia detection, reducing accuracy. Ilias et al. [23] proposed a method that combines two modalities, speech and transcripts, in a single model based on the vision transformer. They used a gated multimodal unit to control how much each modality influences the final classification, together with crossmodal attention to learn the relationships between the modalities effectively. They evaluated their model on the ADReSS Challenge dataset and showed that it outperforms existing methods.
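The gated multimodal unit used by Ilias et al. [23] can be illustrated with the following minimal plain-Python sketch. It assumes the commonly used GMU formulation, in which each modality is projected through tanh and a sigmoid gate computed from both inputs mixes the two projections element-wise; biases are omitted, the weights are hand-picked toy values rather than trained parameters, and all function names are hypothetical. It is not the authors’ implementation, which fuses transformer representations of speech and transcripts.

```python
import math


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_multimodal_unit(x_audio, x_text, W_audio, W_text, W_gate):
    """Fuse an audio and a text feature vector with a gated multimodal unit.

    h_a = tanh(W_audio x_audio)
    h_t = tanh(W_text x_text)
    z   = sigmoid(W_gate [x_audio; x_text])   # per-dimension gate in (0, 1)
    h   = z * h_a + (1 - z) * h_t             # element-wise mix
    Biases are omitted for brevity (an assumption of this sketch).
    """
    h_audio = [math.tanh(v) for v in matvec(W_audio, x_audio)]
    h_text = [math.tanh(v) for v in matvec(W_text, x_text)]
    z = [sigmoid(v) for v in matvec(W_gate, list(x_audio) + list(x_text))]
    fused = [zi * ha + (1.0 - zi) * ht for zi, ha, ht in zip(z, h_audio, h_text)]
    return fused, z


# Toy example with 2-d inputs and a 2-d fused representation (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
fused, gate = gated_multimodal_unit(
    x_audio=[1.0, 0.0],
    x_text=[0.0, 1.0],
    W_audio=identity,
    W_text=identity,
    W_gate=[[0.0] * 4, [0.0] * 4],
)
print(gate)  # [0.5, 0.5]: zero gate weights weight both modalities equally
print(fused)
```

Because the gate is computed from both inputs, a trained model can lean on whichever modality is more informative for a given sample; with the zero gate weights above, each modality contributes exactly half.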
The recent studies mentioned above concentrate on speech and text data for dementia detection. However, earlier studies that analyzed the facial expressions of dementia patients [24,25] reported that these expressions deviate from those of healthy individuals, and such differences have the potential to serve as a basis for dementia detection. Asplund et al. [24] showed that patients with some types of dementia tend to show fewer facial expressions; they analyzed the ability to produce expressions under pleasant and unpleasant stimulus conditions using the Facial Action Coding System (FACS). In a later study [25], they compared two methods developed for interpreting the facial expressions of people with dementia and reported disagreement between the two methods, with people with dementia displaying less clear or fewer facial cues. These findings suggest that the facial expressions of people with dementia deviate from typical patterns and may therefore serve as indicators in dementia detection.
In recent years, some works [12,13,26] have collected facial expression data from dementia patients and explored the potential of using facial expressions to detect dementia. Tanaka et al. [12] collected spoken dialogue data from human-agent interactions with 24 participants. They extracted facial features from these data and used L1-regularized logistic regression to classify dementia patients and healthy subjects. Their research identified several features, including Action Units (AUs), eye gaze, and lip activity, as contributing to the classification, and they assessed the importance of each feature in their L1-regularized logistic regression model; most of the important features were derived from AUs. However, their method was not applied to natural, free conversation, nor was it evaluated on a large dataset. Umeda et al. [13] examined whether artificial intelligence could distinguish between the faces of people with and without cognitive impairment and achieved a classification accuracy of 92.56%. Their study showed that deep learning can distinguish the faces of dementia patients from those of people without dementia. Although the study achieved high accuracy, it relied on facial images, and institutional bias was reported [13]. Institutional bias can mislead a machine learning model into learning environmental information instead of genuine dementia indicators.
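To show why an L1 penalty makes per-feature importance readable, the sketch below fits L1-regularized logistic regression by proximal gradient descent (ISTA): each gradient step on the log-loss is followed by soft-thresholding, which can set weak coefficients exactly to zero. It is a plain-Python illustration only. The two features stand in for an AU-derived score and a weakly informative cue, and the toy data, penalty `lam`, learning rate, and function names are hypothetical; they are not taken from Tanaka et al. [12], whose solver and settings are not described here.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward 0, returning exactly 0 if |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logreg(X, y, lam=0.05, lr=0.5, epochs=2000):
    """L1-regularized logistic regression via proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum_j |w_j|; the intercept b is not penalized.
    X: list of feature rows; y: list of 0/1 labels (1 = dementia in this toy setup).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            logit = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-logit)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then soft-threshold each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: [AU-derived score, weakly informative cue]; label 1 = dementia.
X = [[1.0, 0.3], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1]]
y = [1, 1, 0, 0]
w, b = fit_l1_logreg(X, y)
for name, wj in zip(["au_score", "weak_cue"], w):
    print(name, round(wj, 3))
```

On this toy data the AU-derived coefficient settles at a clearly nonzero value (about ln 19 ≈ 2.944, where the log-loss gradient balances the penalty), while the weak cue’s coefficient is driven exactly to zero, so the surviving nonzero weights act as a feature-importance ranking. In practice `lam` would be tuned, for example by cross-validation.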
Some papers have analyzed the facial expressions of dementia patients under controlled conditions but did not apply their analyses to dementia detection. Liu et al. [26] investigated how different sound interventions affect the facial expressions of older people with dementia: they played different sound pieces to the participants and analyzed their emotions. Jiang et al. [27] used a computer vision-based deep learning model to analyze the facial emotions expressed during a passive viewing memory test by people with cognitive impairment due to AD and other causes. Their results provided evidence that facial emotions differ in patients with cognitive impairment; specifically, these patients showed more negative emotional expressions and greater facial expressiveness than healthy controls. They concluded that facial emotions could be a practical tool for screening patients with cognitive impairment.
This entry is adapted from the peer-reviewed paper 10.3390/bioengineering10070862