Autism spectrum disorder (ASD) is a neurological disorder that can severely impair the communication skills needed for everyday living. Most people with autism experience mild difficulties, but some experience severe ones that require specialized care. Because they have difficulty communicating with others, people with ASD often struggle in social situations. Medical professionals recognize most of the neurophysiological symptoms of ASD, but no definitive biosignature or pathological test can diagnose autism at any time
[1]. Despite the absence of a specific treatment protocol, a diagnosis at a very early age can improve outcomes significantly. Because the brain develops more flexibly in early childhood, children with ASD who receive proper intervention at this age may have a better chance of improving their social skills. Scientific evidence suggests that children who receive medical care before age four have a higher average IQ than those who receive it later
[2]. Nevertheless, a recent study estimates that only 34% of children with ASD in the United States are identified by the age of three, and the proportion is substantially lower in underdeveloped nations
[3]. Although there is currently no specific treatment protocol for ASD, specialists have explored several intervention techniques to minimize symptoms, enhance cognitive capacity, and improve daily living skills. Early and precise identification of ASD is essential for implementing these interventions successfully. The conventional interview-based diagnostic methods, the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R), are considered the gold standard in this regard
[4]. These methods depend primarily on the skill of the physician and on the accuracy of the information provided by the patient’s parents or caregivers. Although the methods are highly dependable, human bias may reduce their accuracy. Recent advances in artificial intelligence (AI) have prompted interest in applying it to medical diagnosis. AI can improve the accuracy and efficiency of medical diagnoses by providing doctors with information and insights that aid their decision-making
[5].
2. Deep Learning-Based Method for ASD Detection
Recent literature has shown that deep learning-based methods can play a significant role in the diagnosis of ASD. Neuroimaging data is among the most investigated inputs for diagnosing ASD in recent studies, as an alternative to the interview-based methods that are considered the gold standard. Structural MRI is one neuroimaging modality, while functional modalities include electroencephalography (EEG). Data from both kinds of modality are used to train various deep neural networks, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), autoencoders (AEs), and generative adversarial networks (GANs)
[6]. The fusion of neuroimaging data from both modalities with algorithmic deep features makes the detection of ASD more robust and accurate
[7]. While neuroimaging offers higher specificity and relevance for ASD detection, acquiring the necessary images is expensive and time-consuming for patients. A second family of methods for diagnosing ASD is based on behavioral data, including video recordings of particular activities
[8], eye gaze patterns
[9], speech patterns
[10], and handwriting
[11]. All of these behavior-based methods require a substantial amount of time and extensive pre-processing. A promising alternative is the analysis of facial features
[12] using deep learning. This approach spares children the discomfort of lengthy medical protocols, is free of human bias, and is inexpensive, so it could provide a more objective and efficient method than current diagnostic practices. However, the accuracy of this method is still under investigation, and more research is needed to validate its efficacy.
3. Deep Learning Methods for ASD Diagnosis from Facial Images
Early screening for ASD from facial images can benefit greatly from convolutional neural network (CNN) models trained with a transfer learning approach
[13]. The advantage of transfer learning is that it allows a machine learning model to leverage knowledge from a pre-trained model and apply it to a different but related problem. This can save time and resources and improve performance compared to training a model from scratch
[14]. This method aims to automate the diagnosis process by analyzing facial features in images or videos of individuals, although its accuracy is still under investigation and more research is needed to validate its efficacy. Recently, considerable progress has been made in screening for ASD from facial images. Mohammad-Parsa et al. (2022) reported the first notable result, using the MobileNet model to obtain 94.6% prediction accuracy in autism identification on a cleaned dataset
[15]. Later, Zayed A. T. Ahmed et al. (2022) addressed the same problem and obtained 95% accuracy with the MobileNet model
[16]. B. R. G. Elshoky et al. (2022) evaluated shallow ML models and deep neural networks before applying the AutoML tool TPOT to achieve a classification accuracy of 96.6%
[17]. Taher M. Ghazal et al. (2022) used a modified version of AlexNet to create their own ASD detection model, ASDDTLA, which achieved a lower accuracy of 87.7%
[18]. M. S. Alam et al. (2022) conducted a systematic ablation study to tune the optimizers and hyperparameters and, utilizing Xception and the optimal parameter set, reported a maximum accuracy of 95%
[12]. In 2023, Narinder Kaur et al. and M. Ikermane et al. conducted similar studies, reporting accuracies of 70% and 98%, respectively
[19][20]. In every study, CNN-based models were used to extract features from the photos in the Kaggle ASD
[21][22] dataset. These models were pre-trained on ImageNet, a dataset of about 14 million images whose commonly used pre-training subset is divided into 1000 categories.
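The transfer learning idea described above, reusing a fixed pre-trained feature extractor and training only a new classification head, can be illustrated with a minimal sketch. It is not the pipeline of any cited study: `frozen_features` is a hypothetical stand-in for a pre-trained CNN backbone, the tiny synthetic "images" are invented for illustration, and the head is a plain logistic regression written with the Python standard library only.

```python
import math
import random


def frozen_features(image):
    """Hypothetical stand-in for a pre-trained backbone; it is never updated."""
    pixels = [p for row in image for p in row]
    mean_intensity = sum(pixels) / len(pixels)
    # Mean absolute difference between horizontally adjacent pixels.
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    mean_contrast = sum(diffs) / len(diffs)
    return [mean_intensity, mean_contrast]


def train_head(features, labels, lr=0.5, epochs=300):
    """Fit only the new head (logistic regression) with plain SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            prob = 1.0 / (1.0 + math.exp(-z))
            grad = prob - y  # derivative of the log loss with respect to z
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 when the head's score is positive, else class 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def make_toy_dataset(n_per_class, rng, size=4):
    """Synthetic data (assumption): class 0 is darker, class 1 is brighter."""
    samples = []
    for label, (low, high) in enumerate([(0.0, 0.4), (0.6, 1.0)]):
        for _ in range(n_per_class):
            image = [[rng.uniform(low, high) for _ in range(size)] for _ in range(size)]
            samples.append((image, label))
    rng.shuffle(samples)
    return samples


def run_demo(seed=0):
    """Train the head on frozen features and return held-out accuracy."""
    rng = random.Random(seed)
    train = make_toy_dataset(20, rng)
    test = make_toy_dataset(10, rng)
    train_x = [frozen_features(img) for img, _ in train]
    train_y = [label for _, label in train]
    weights, bias = train_head(train_x, train_y)
    correct = sum(predict(weights, bias, frozen_features(img)) == label
                  for img, label in test)
    return correct / len(test)


print(run_demo())
```

In practice the frozen extractor would be a CNN such as MobileNet or Xception with ImageNet weights, and the head would be trained on the facial image dataset; the sketch only shows which parameters are updated and which are kept fixed.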
Table 1 summarizes recent research on the diagnosis of ASD using the transfer learning approach with the Kaggle ASD dataset, which consists of facial photos of children. All of these studies took a principally model-centric approach, focusing mostly on fine-tuning CNN models with an optimal set of hyperparameters. None of them explained its predictions in terms of particular facial features associated with ASD or other observational factors. However, the success of AI relies heavily on optimal training
[23], and the availability of high-quality, labeled datasets is a crucial factor in this regard. Industry experts expect that the most significant limitation of AI, the lack of high-quality data, will become increasingly apparent
[24]. For machine learning to be effective, it needs a large and varied dataset to learn from, which motivates a data-centric approach.
Table 1. Recent research on CNN-based transfer learning algorithms for diagnosing autism spectrum disorders.
4. Data Augmentation in Medical Image Analysis
When working with medical image datasets, researchers face several obstacles in training deep neural network models. Annotated medical images are scarce, and collecting them is expensive and time-consuming. In addition, images from different sources vary in acquisition protocol, imaging modality, and resolution, making it challenging to standardize the data
[25]. Class imbalance is another challenge, where one class dominates over the other, leading to a bias in the model
[26]. Moreover, deep learning (DL) models are prone to overfitting when trained on small datasets
[27]. In this context, data pre-processing can be highly useful for reducing noise and bringing the images in a dataset to a uniform size. Augmentation techniques, in turn, can help overcome nearly all of the aforementioned obstacles by adding new samples to the dataset
[28]. Medical data augmentation refers to the process of synthesizing additional training samples from existing data to increase the size of the dataset. This technique is widely used in medical imaging to overcome the limitations of small, imbalanced, and sparsely annotated datasets
[29]. One popular method of data augmentation in medical imaging is image transformations. This involves applying various geometric and intensity transformations to the original image to generate new samples. Some commonly used image transformations include rotations, translations, flips, and scaling
[30]. Another approach to medical data augmentation is data synthesis, which involves creating new data samples by combining or altering existing data. These methods can increase the dataset’s size and improve the model’s robustness by exposing it to variations in the data
[31].
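The geometric and intensity transformations mentioned above can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, with a grayscale image represented as a nested list; the function names and the zero fill value for translated pixels are assumptions made for this sketch, and a real pipeline would normally rely on an image-processing library.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*reversed(image))]


def translate(image, dx, dy, fill=0):
    """Shift the image dx columns right and dy rows down, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted


def adjust_brightness(image, factor, max_value=255):
    """Intensity transform: scale pixel values and clip to [0, max_value]."""
    return [[min(max_value, max(0, round(p * factor))) for p in row] for row in image]


def augment(image):
    """Return several transformed copies of one image; the original is untouched."""
    return [
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
        translate(image, dx=1, dy=0),
        adjust_brightness(image, factor=2),
    ]


sample = [[1, 2], [3, 4]]
for variant in augment(sample):
    print(variant)
```

Each call to `augment` yields five new samples per input image. Whether a given transformation preserves the label depends on the task, so the set of transformations should be chosen with the imaging modality in mind.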
A growing body of literature demonstrates the effectiveness of dataset pre-processing and augmentation. In medical image classification, researchers have found that augmenting the training dataset with random transformations can substantially improve accuracy and stability. For example, Deepak et al. (2020)
[32] applied data augmentation to MRI images to detect brain tumors; after augmentation, the CNN classifier’s detection accuracy increased by 6.7%. Ju et al. (2021)
[33] applied a generative adversarial network (CycleGAN) to the UWF fundus image dataset. After augmentation, they reported an improvement of 2.87% in precision and 4.85% in F1-score on diabetic retinopathy (DR) classification, lesion detection, and tessellated fundus segmentation. Using a synthetic image augmentation technique on X-ray images, D. Srivastav et al. (2021) improved the prediction accuracy of COVID-19 pneumonia detection by 3.2%
[34].
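The gains quoted above are differences in accuracy, precision, and F1-score measured before and after augmentation. The following sketch shows one way such a comparison could be computed for a binary classifier. The label vectors are invented for illustration and are not taken from any cited study, and the sketch reports absolute differences in percentage points, which is an assumption, since the cited figures do not state whether they are absolute or relative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def gain_in_points(before, after):
    """Absolute improvement expressed in percentage points."""
    return 100.0 * (after - before)


# Hypothetical predictions from a model trained without and with augmentation.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_baseline = [1, 1, 0, 0, 1, 0, 0, 0]
pred_augmented = [1, 1, 1, 0, 0, 0, 0, 0]

acc_gain = gain_in_points(accuracy(y_true, pred_baseline),
                          accuracy(y_true, pred_augmented))
f1_gain = gain_in_points(precision_recall_f1(y_true, pred_baseline)[2],
                         precision_recall_f1(y_true, pred_augmented)[2])
print(f"accuracy gain: {acc_gain:.1f} points, F1 gain: {f1_gain:.1f} points")
```

Reporting precision and F1-score alongside accuracy matters most for imbalanced datasets, where accuracy alone can hide poor performance on the minority class.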