1. Background
Pain is a significant public health issue in the United States, yet its subjective nature makes effective pain management challenging
[1][2][3]. The International Association for the Study of Pain (IASP) defines pain as “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage”
[4][5][6]. The previous definition was revised in light of the many variables that influence the intensity of painful episodes, including past painful experiences, cultural and social context, individual pain tolerance, gender, age, and mental or emotional state
[2][7].
Despite a thorough understanding of the pathophysiological processes underlying the physical pain response and the technological advances made to date, pain is often poorly managed. Pain assessment as currently performed is prone to subjective bias, and the resulting misjudgment of pain levels can add unnecessary costs and risks. In addition, poor pain relief can cause emotional distress and is linked to several consequences, including chronic pain
[8][9][10].
Self-reports and observations form the basis of well-defined instruments for measuring pain and pain-related factors. Visual analog scales (VAS), the McGill Pain Questionnaire, and numeric rating scales (NRS) are examples of patient-reported outcome measures (PROs or PROMs), which are frequently considered the “gold standard” for measuring acute and chronic pain. Because these instruments depend on patients’ accounts, they are applicable only to individuals without verbal or cognitive impairments
[1][6][11][12]. Consequently, these established techniques are inapplicable to newborns; to delirious, sedated, or ventilated patients; and to individuals with dementia or developmental and intellectual disorders
[13]. Such patients are entirely dependent on others’ awareness of nonverbal pain cues. Observational pain scales are therefore recommended for critically ill adults when self-reported pain measurement is not feasible. Even so, the reliability and validity of these instruments remain limited, because even qualified raters cannot guarantee an unbiased judgment
[5][14][15][16][17][18][19].
Automatic pain recognition has transitioned from theory to a highly active area of study over the last decade
[10]. As a result, a few studies have concentrated on utilizing artificial intelligence (AI) to identify or classify pain levels using audio inputs.
2. Artificial Intelligence Techniques Used in Pain Detection
Artificial intelligence (AI) describes a system’s ability to mimic human behavior and exhibit intelligence, and it is now viewed as a branch of engineering. The adoption of low-cost, high-performance graphics processing units (GPUs) and tensor processing units (TPUs), faster cloud computing platforms with large digital storage capacity, and cost-effective infrastructure for model training have all contributed to AI’s unprecedented processing power
[20][21][22][23].
There are two main categories of AI applications in medicine: virtual and physical. The virtual subfield involves machine learning (ML), natural language processing (NLP), and deep learning (DL). The physical subfield involves medical equipment and care bots (intelligent robots), which assist in delivering medical care
[24][25][26][27].
Owing to their cutting-edge performance in tasks such as image classification, DL algorithms have become increasingly prominent over the past ten years. NLP describes a machine’s ability to comprehend text and speech; its many practical applications include speech recognition and sentiment analysis.
Machine learning algorithms may be classified into three categories: unsupervised learning (recognizing patterns in unlabeled data), supervised learning (classification and prediction based on previously labeled data), and reinforcement learning (using reward and punishment signals to build a practical plan for a specific problem space). ML has been applied in a variety of medical fields, including pain management
[25][27][28].
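As an illustration of the supervised category, the following minimal sketch trains a classifier on labeled examples and then predicts pain levels for held-out data. The features, labels, and model choice are hypothetical placeholders, not a method from the cited studies.

```python
# Minimal supervised-learning sketch: fit on labeled data, predict on new data.
# All feature values and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # 200 samples, 8 hypothetical acoustic features
y = rng.integers(0, 3, size=200)     # labels: 0 = no pain, 1 = mild, 2 = severe

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```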
To establish an automated pain evaluation system, the relevant input data channels must be documented. Modality is the term used for a behavioral or physiological source of information. The principal behavioral modalities are auditory signals, body language, touch, and facial expressions
[29][30].
The availability of a few databases with precise and representative data linked to pain has allowed for recent developments in the field of automatic pain assessment
[31], with most works focusing on the modeling of facial expressions
[7].
Numerous databases have been established by pooling data from diverse modalities acquired from distinct cohorts of healthy individuals and patients. Most studies use the following publicly available databases: UNBC-McMaster, BioVid Heat Pain, MIntPAIN, iCOPE, iCOPEvid, NPAD-I, APN-db, EmoPain, Emobase 2010, SenseEmotion, and X-ITE.
The BioVid Heat Pain Database is the second most frequently used dataset for pain detection, after the UNBC-McMaster Shoulder Pain Expression Archive Database. The former contains video recordings of 90 healthy people subjected to heat pain applied to the forearm; the latter comprises 200 videos of participants’ faces as they experience pain induced by physical manipulation of the shoulder
[29][32][33][34].
Voice, meanwhile, has thus far received little attention. Among the publicly available databases, BioVid Heat Pain, SenseEmotion, X-ITE, and Emobase include audio as one of their modalities. Fully leveraging these databases calls for deep learning algorithms. Deep-learning-based systems operate in two stages: training and inference. During training, the system is presented with a large dataset to teach it to recognize patterns and make predictions; the trained model is then used for inference, producing predictions on new data
[29][32][33][35].
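The following minimal sketch illustrates these two stages with a small feed-forward network; the input dimensions, the three pain classes, and the synthetic data are hypothetical.

```python
# Stage 1 (training) and Stage 2 (inference) for a toy deep-learning model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(128, 16)             # synthetic feature vectors
y = torch.randint(0, 3, (128,))      # synthetic pain-level labels

# Training: repeatedly adjust the weights to reduce prediction error.
model.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Inference: apply the frozen model to unseen data.
model.eval()
with torch.no_grad():
    new_sample = torch.randn(1, 16)
    predicted_level = model(new_sample).argmax(dim=1)
```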
3. Artificial Intelligence Models Used in Pain Detection from Voice
Multiple artificial neurons are combined to create an artificial neural network (ANN). These artificial neurons imitate the behavior and structure of biological neurons. To enhance the efficiency and precision of the ANN, the neurons are organized into layers, which simplifies manipulation and permits a precise mathematical representation.
The operation of artificial neurons is governed by three fundamental principles: multiplication, summation, and activation. First, each input value is multiplied by a distinctive weight. Next, a summation function aggregates all the weighted inputs. Finally, an activation (transfer) function is applied to the sum of the weighted inputs and a bias term to produce the neuron’s output
[36][37].
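The following minimal sketch implements these three principles for a single artificial neuron, assuming a sigmoid activation (the sources above do not prescribe a particular activation function).

```python
# A single artificial neuron: multiply, sum, activate.
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    weighted = inputs * weights           # 1) multiply each input by its weight
    total = weighted.sum() + bias         # 2) sum the weighted inputs plus a bias
    return 1.0 / (1.0 + np.exp(-total))   # 3) apply an activation (here, sigmoid)

# Example call with hypothetical inputs and weights.
print(neuron(np.array([0.5, -1.2, 0.3]), np.array([0.8, 0.1, -0.4]), bias=0.2))
```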
The topology of an ANN refers to the way in which different artificial neurons are coupled with one another
[36]. Different ANN topologies are appropriate for addressing different issues.
The most critical topologies are as follows (a minimal sketch of one of them is given after this list):
(1) Recurrent Artificial Neural Networks (RNNs).
(2) Feed-Forward Artificial Neural Networks (FNNs).
(3) Convolutional Neural Networks (CNNs): these use multiple layers to automatically learn features from the input data.
(4) Long Short-Term Memory networks (LSTMs): these can handle the vanishing and exploding gradients that are common problems in the training of RNNs.
(5) Multitask Neural Networks (MT-NNs): these share representations across associated tasks to yield a model with better generalization.
Other types include Bi-directional Artificial Neural Networks (Bi-ANNs) and Self-Organizing Maps (SOMs)
[36][37][38].
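As an illustration of topology (3) above, the following minimal sketch builds a small CNN that learns features from spectrogram-like input; all layer sizes and the three pain classes are hypothetical.

```python
# Toy CNN over a synthetic 64x64 "spectrogram"; sizes are illustrative only.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # learn local time-frequency features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 8 x 32 x 32
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 3),                   # map features to 3 pain classes
)
spectrogram = torch.randn(1, 1, 64, 64)          # one synthetic spectrogram
logits = cnn(spectrogram)
```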
Once its topology has been selected and its parameters have been tuned through learning, an artificial neural network can solve the problem it was designed for
[36][37].
Lately, artificial intelligence algorithms for sound event classification and voice recognition have attracted significant attention, as shown in Figure 1.
Figure 1. Artificial neural network mechanism of action on pain-induced vocalization.
Keyword spotting (KWS), wake-up word (WUW), and speech command recognition (SCR) are three essential techniques in speech processing that enable machines to recognize spoken words and respond accordingly.
Keyword spotting technology is an automated approach to recognizing specific keywords within an uninterrupted flow of spoken language and vocalization.
KWS systems are less reliant on high-quality audio inputs. They are designed to be inexpensive and flexible and to run accurately and reliably on low-resource hardware such as embedded edge devices
[39][40][41][42].
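In the spirit of the low-resource designs described above, the following is a minimal, hypothetical sketch of a KWS front end: MFCC features feeding a tiny classifier. It is not a reproduction of any cited system, and the two-class keyword set is an assumption.

```python
# Tiny KWS pipeline: MFCC features from raw audio, then a small classifier.
import torch
import torch.nn as nn
import torchaudio

mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=13)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(64),      # infers input size on first call; kept small for edge devices
    nn.ReLU(),
    nn.Linear(64, 2),       # hypothetical classes: "keyword" vs. "background"
)

waveform = torch.randn(1, 16000)     # one second of synthetic 16 kHz audio
features = mfcc(waveform)            # shape: (1, 13, time_frames)
logits = classifier(features)
```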
Owing to advances in signal processing and machine learning methods, researchers have begun to develop algorithms that automate pain-level assessment from speech. For example, Tsai et al.
[14] employed a bottleneck LSTM to detect pain from prosodic signals in a subset of a triage dataset. Later, Li et al.
[7] introduced age and gender factors into a variational acoustic model.
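For illustration only, the following sketches how an LSTM with a low-dimensional bottleneck layer might be structured over prosodic feature sequences. It is a generic outline, not Tsai et al.’s implementation, and all dimensions are hypothetical.

```python
# Generic LSTM with a low-dimensional bottleneck over prosodic sequences.
import torch
import torch.nn as nn

class BottleneckLSTM(nn.Module):
    def __init__(self, n_features=10, hidden=64, bottleneck=8, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.bottleneck = nn.Linear(hidden, bottleneck)   # compressed representation
        self.out = nn.Linear(bottleneck, n_classes)       # e.g., pain vs. no pain

    def forward(self, x):                 # x: (batch, time, n_features)
        _, (h, _) = self.lstm(x)          # final hidden state summarizes the sequence
        return self.out(torch.relu(self.bottleneck(h[-1])))

model = BottleneckLSTM()
prosody = torch.randn(4, 50, 10)          # 4 utterances, 50 frames, 10 prosodic features
logits = model(prosody)
```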