Epilepsy is a nervous system disorder. Electroencephalography (EEG) is a widely used clinical approach for recording electrical activity in the brain. Although a number of datasets are available, most of them are imbalanced because they contain fewer epileptic EEG signals than non-epileptic EEG signals.
1. Introduction
Epilepsy is a neurological disorder that affects both children and adults. It is characterized by sudden, recurrent epileptic seizures
[1]. A seizure is a brief, temporary disturbance in the electrical activity of a set of brain cells
[2]. Excessive electrical activity within the networks of neurons in the brain causes epileptic seizures
[3]. These seizures result in involuntary movements that may involve part of the body (partial movement) or the whole body (generalized movement) and are sometimes accompanied by disturbances of sensation (involving hearing, vision, and taste), cognitive function, or mood, or by loss of consciousness
[2]. The frequency of seizures varies from patient to patient, ranging from less than once a year to several times a day. Patients with active epilepsy have a mortality rate 4–5 times higher than that of seizure-free people
[4]. However, effective medical therapy tailored to each patient helps to lower the risk of mortality. Reduced mortality can be achieved by objectively quantifying both seizures and the response to therapy
[5].
Seizure detection relies on the electroencephalogram (EEG)
[6], which records the brain’s electrical activity through electrodes. An electrode is a small metal disc attached to the scalp that captures brainwave activity through an EEG channel; depending on the EEG recording system, the number of channels can range from 1 to 256. EEG signals take the form of sinusoidal waves with different frequencies, which neurophysiologists use to identify brain abnormalities. One major challenge that neurologists face is the presence of EEG signal artifacts. Artifacts arise when EEG signals overlap with other internal or external signals; they can mimic EEG seizure signals and thus produce misleading data. Examples include eye movement, cardiac activity, muscle movement, and environmental noise
[7].
Table 1 lists the frequency bands of EEG signals, together with the normal and abnormal conditions associated with each band. To detect seizures, neurophysiologists must collect extensive long-term EEG recordings and analyze them visually, which is a time-consuming manual process.
Table 1. The frequency bands of EEG signals
[8].
There is an urgent need to develop a generalized automatic seizure detection system that provides precise seizure quantification, allowing neurophysiologists to objectively tailor treatment. Developing such a system is challenging because most of the available datasets are imbalanced: non-seizure EEG signals outnumber seizure EEG signals
[9]. This imbalance can have a major negative impact on classification performance
[10].
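The effect of imbalance can be illustrated with a minimal sketch. The label counts below are hypothetical and are not taken from any dataset discussed here. The function `class_weights` applies the common inverse-frequency heuristic, which is one possible mitigation and not necessarily the one adopted in this work. `majority_baseline_accuracy` shows how a classifier that always predicts the majority class can still score a high accuracy.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical segment labels: 0 = non-seizure, 1 = seizure (assumed 95:5 split).
labels = [0] * 950 + [1] * 50

weights = class_weights(labels)
baseline = majority_baseline_accuracy(labels)

print(weights)   # the seizure class receives the larger weight
print(baseline)  # high accuracy even though no seizure is ever detected
```

With these assumed counts, the trivial classifier reaches 95% accuracy while detecting no seizures, which is why accuracy alone is a weak indicator on imbalanced EEG data.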
This research proposes a compatibility framework to integrate local EEG data from the epilepsy center at King Abdulaziz University Hospital (KAU) with the CHB-MIT dataset
[11] to address the problems of limited resources and imbalanced data. It also proposes an algorithm for reading XLtek EEG data, which is incorporated into the proposed framework and allows researchers to analyze this type of EEG signal, for which the dedicated packages offer no analytical tools. Finally, a deep-learning seizure-detection model based on selected EEG channels has been developed. The results show that the proposed method outperforms other models that rely on a larger number of EEG channels to detect epileptic seizures.
The CHB-MIT dataset was chosen because it has the same type of scalp EEG recordings and annotations as the local KAU dataset. Additionally, the CHB-MIT dataset has recordings from all parts of the brain that contain seizure types similar to those in the KAU dataset, such as clonic, tonic, and atonic seizures.
2. Epileptic Disorder Detection of Seizures Using EEG Signals
Many studies concentrate on intracranial brain signals, in which electrodes are placed inside the skull directly on the brain. Antoniades et al.
[12] applied a convolutional neural network (CNN) with two convolutional layers to intracranial EEG data to extract the features of interictal epileptic discharge (IED) waveforms. The system divided the data into 80 ms segments with 40 ms of overlap and achieved a detection rate of 87.51%.
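The segmentation step described above can be sketched as follows. This is a generic sliding-window routine and not the implementation used in [12]. The sampling rate `fs = 500` Hz and the placeholder signal are assumptions made only for illustration.

```python
def segment(signal, fs, win_ms=80, overlap_ms=40):
    """Split a 1-D signal into fixed-length windows with a fixed overlap."""
    win = int(round(fs * win_ms / 1000))             # window length in samples
    step = win - int(round(fs * overlap_ms / 1000))  # hop between window starts
    if win <= 0 or step <= 0:
        raise ValueError("window must be positive and longer than the overlap")
    # Trailing samples that do not fill a whole window are discarded.
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]

fs = 500                    # assumed sampling rate in Hz (hypothetical)
signal = list(range(200))   # placeholder for 0.4 s of single-channel data

segments = segment(signal, fs)
print(len(segments), len(segments[0]))
```

At the assumed 500 Hz, each 80 ms window holds 40 samples and consecutive windows start 20 samples apart, so every sample away from the edges appears in two windows.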
Birjandtalab et al.
[9] employed the Fourier transform with deep neural networks (DNN) to classify the signals. The transform was first applied to the alpha, beta, gamma, delta, and theta bands as well as to the individual windows in order to calculate the power spectral density, which measures signal power as a function of frequency. A DNN based on a multilayer perceptron with only two hidden layers was then used to classify the signals; the number of hidden layers was kept small to avoid overfitting. The system achieved an accuracy of 95%.
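A minimal sketch of band-power features computed from the power spectral density of a single window is given below. It is not the pipeline of [9]. The band edges in `BANDS` are commonly quoted values and are an assumption here, as are the 128 Hz sampling rate and the synthetic 10 Hz test window. A naive discrete Fourier transform is used only to keep the example dependency-free; an FFT routine would be used in practice.

```python
import cmath
import math

# Assumed band edges in Hz; conventions vary between sources.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def periodogram(x, fs):
    """One-sided power spectral density estimate via a naive DFT."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the DC offset
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Double every bin except DC and Nyquist to fold in negative frequencies.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs / n)
        psd.append(scale * abs(s) ** 2 / (fs * n))
    return freqs, psd

def band_powers(x, fs):
    """Integrate the PSD over each band (rectangle rule)."""
    freqs, psd = periodogram(x, fs)
    df = fs / len(x)
    return {name: df * sum(p for f, p in zip(freqs, psd) if lo <= f < hi)
            for name, (lo, hi) in BANDS.items()}

fs = 128  # assumed sampling rate in Hz
window = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s, 10 Hz

powers = band_powers(window, fs)
dominant = max(powers, key=powers.get)
print(dominant, round(powers[dominant], 3))
```

The resulting five band powers per window form the kind of compact feature vector that a small multilayer perceptron can take as input.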
Seizure detection systems differ in the type of EEG data they use. Some of these systems detect epileptic seizures from only one channel, while others detect them from multiple channels. ChannelAtt
[13] is a novel channel-aware attention framework that adopts fully connected multi-view learning to soft-select critical views from multivariate biosignals. This model implements a new technique that relies on global attention in the view domain rather than the time domain. The system achieved an accuracy of 96.61%.
Some studies performed feature learning by training the deep-learning model directly on EEG signals. Ihsan Ullah et al.
[14] used a pyramidal 1D-CNN framework to reduce memory use and detection time. A voting approach was applied as a post-processing step to obtain the final result. To overcome the need for a large amount of training data, they performed data augmentation using overlapping windows. The system reached 99% accuracy.
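Voting as a post-processing step can be sketched as below. The source does not specify the voting rule, so simple majority voting over per-window predictions is assumed. Resolving ties in favor of the seizure label is a further assumption made for illustration.

```python
from collections import Counter

def majority_vote(window_predictions):
    """Combine per-window labels (0 = non-seizure, 1 = seizure) into one label."""
    if not window_predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(window_predictions)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    # Assumed tie rule: prefer the larger label, i.e. the seizure class.
    return max(winners)

# Hypothetical predictions for the overlapping windows of three EEG records.
clear_seizure = majority_vote([0, 1, 1, 0, 1])
clear_normal = majority_vote([0, 0, 0, 1])
tie_case = majority_vote([0, 0, 1, 1])
print(clear_seizure, clear_normal, tie_case)
```

Because overlapping windows produce several predictions for the same stretch of signal, voting smooths out isolated misclassifications at the cost of a coarser time resolution.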
Zabihi et al.
[15] developed a system that combines non-linear dynamics (NLD) and linear discriminant analysis (LDA) for feature extraction and introduced the concept of nullclines to extract discriminant features. The system employs an artificial neural network (ANN) for classification. The model achieved an accuracy of 95.11%. To mimic a real-world clinical situation, only 25% of the dataset was used for training. As a result of this limited training set, the false negative rate was relatively high, and the resulting sensitivity is considered too low for practical clinical use.
Likewise, Avcu et al.
[16] used a deep CNN algorithm on the EEG signals of 29 pediatric patients from KK Women’s and Children’s Hospital, Singapore. The researchers reduced the number of channels in the recorded EEG data to only two, Fp1 and Fp2. The data comprise 1037 min of recordings, of which only 25 min contain epileptic signals, distributed over 120 seizure onsets; the data are therefore imbalanced. To overcome this problem, the researchers varied the window overlap according to the presence or absence of seizures by applying two window shifts. The first uses a 5 s shift (without overlapping) to create the interictal class. The second uses a 0.075 s shift to create the ictal class. These shifts were applied to balance the input data to the CNN. The system achieved an accuracy of 93.3%. However, the effect of the data augmentation technique was not reported in this research.
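The balancing effect of the two shifts can be checked with simple arithmetic, sketched below. The 5 s and 0.075 s shifts come from the description above. The 5 s window length is an assumption, inferred only from the statement that the interictal shift produces no overlap. The 12.5 s seizure is the mean duration implied by 25 min spread over 120 onsets, and the 60 s interictal stretch is hypothetical.

```python
def count_windows(duration_ms, win_ms, shift_ms):
    """Number of full windows obtained by sliding win_ms in steps of shift_ms."""
    if duration_ms < win_ms:
        return 0
    return (duration_ms - win_ms) // shift_ms + 1

WIN_MS = 5000  # assumed window length (not stated in the source)

# One minute of interictal data with a 5 s shift (no overlap).
interictal_windows = count_windows(60_000, WIN_MS, shift_ms=5000)

# One average-length seizure: 25 min / 120 onsets = 12.5 s, with a 75 ms shift.
ictal_windows = count_windows(12_500, WIN_MS, shift_ms=75)

print(interictal_windows, ictal_windows)
```

Under these assumptions a single short seizure yields far more training windows than a full minute of interictal data, which is how the small shift compensates for the scarcity of ictal recordings. The ictal windows are highly correlated, however, so they add less information than their count suggests.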
Hu et al.
[17] used long short-term memory (LSTM), as it handles both long-term and short-term dependencies in time-series data. The authors built the model using a bidirectional LSTM (Bi-LSTM) and fed the network seven extracted linear features. The system was trained and tested on the Bonn University dataset and achieved an accuracy of 98.56%. However, this figure reflects the testing accuracy, whereas the evaluation results were not reported.
Chandel et al.
[18] proposed a patient-specific algorithm based on wavelet features to detect onset-offset latency. The model calculates statistical features such as mean, entropy, and energy over the wavelet sub-bands and then classifies the EEG signals using a linear classifier. The algorithm achieved an average accuracy of 98.60% and was tested on 14 of the 23 patients in the dataset. Although the algorithm is patient-specific, its performance degraded significantly for patient 7, whose seizures were very short compared with those of the remaining patients: this patient had 10 seizures with a total duration of 94 s. This means that the algorithm performs well when seizures are long, but its performance drops significantly when seizures are short.
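Sub-band feature extraction of this kind can be sketched as follows. The wavelet family and decomposition depth used in [18] are not stated here, so a Haar wavelet with three levels is assumed. The entropy is taken as the Shannon entropy of the normalized coefficient energies, which is one of several definitions in use. The input is a synthetic sine wave and not EEG data.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    r = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_subbands(x, levels):
    """Return [detail_1, ..., detail_L, approx_L]."""
    bands, approx = [], list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def subband_features(coeffs):
    """Mean, energy, and Shannon entropy of the normalized coefficient energies."""
    energy = sum(c * c for c in coeffs)
    mean = sum(coeffs) / len(coeffs)
    probs = [c * c / energy for c in coeffs] if energy > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return {"mean": mean, "energy": energy, "entropy": entropy}

# Synthetic 64-sample signal; the length is divisible by 2**3 for three levels.
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
feature_vector = [subband_features(b) for b in wavelet_subbands(signal, 3)]

for f in feature_vector:
    print({k: round(v, 3) for k, v in f.items()})
```

Each window is thus reduced to three numbers per sub-band, and the concatenated values can be passed to a linear classifier. A very short seizure contributes only a few such windows, which is consistent with the degraded performance reported for patient 7.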
Kaziha et al.
[19] adapted a model proposed in a previous study, applying it to the CHB-MIT dataset and tuning it to enhance performance. The model consists of five CNN layers, each followed by a batch normalization layer and an average pooling layer, and ends with three dense layers that predict the signal class. However, the chart of training and testing accuracy clearly indicates that the network overfits, which is also reflected in the sensitivity score. This is due to the imbalance of the dataset: the number of epileptic signals is significantly lower than the number of non-epileptic signals, so a data augmentation scheme is required.
Huang et al.
[20] suggested a three-part hybrid framework. The first part extracts hand-crafted features and converts them into sparse categorical features, while the second part uses a neural network architecture that takes the original signals as input to extract deep features. Both types of extracted features are combined in the third and final part of the model to classify the EEG signals as seizure or non-seizure. The model achieved a sensitivity score of 90.97%. It should be noted that the hybrid framework might achieve better results if the output of the first part, that is, the features manually extracted from the signals, were refined using a feature-importance method. For example, a tree-based model (or an ensemble of trees such as a random forest) can infer an importance score for each feature from its decision rules.
Jeong et al.
[21] implemented an attention-based deep neural network to detect seizures. The model is divided into three modules: the first extracts the spatial features, the second extracts the spatio-temporal features, and the third is an attention mechanism that captures representations accounting for the interactions among several variables at each point in time. The accuracy of the model is 89% and the sensitivity is 94%. The high sensitivity indicates that the proportion of false negatives (FN), that is, seizure signals detected as non-seizure, was low. In contrast, the overall accuracy was noticeably lower than the sensitivity, which means that the number of false positives (FP), that is, non-seizure signals detected as seizures, was high. The model therefore focused on extracting features that clearly distinguish the seizure class without also extracting discriminative features for the non-seizure class, which affected its overall performance.
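The relationship between these metrics can be made concrete with a small sketch. Accuracy is a prevalence-weighted average of sensitivity and specificity, so an accuracy below the sensitivity implies a specificity that is lower still, and hence a substantial share of false positives. The confusion-matrix counts below are hypothetical: they were chosen only to reproduce the reported 89% accuracy and 94% sensitivity and are not taken from [21].

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall), and specificity from a confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of seizure segments detected
        "specificity": tn / (tn + fp),   # share of non-seizure segments kept
    }

# Hypothetical counts: 100 seizure segments and 500 non-seizure segments.
m = metrics(tp=94, fn=6, tn=440, fp=60)
print(m)
```

With these assumed counts, the 60 false positives outnumber the 6 false negatives ten to one, so almost all of the lost accuracy comes from non-seizure segments flagged as seizures.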
Table 2 summarizes the studies discussed in this section.
Table 2. EEG-based epileptic seizure detection systems using deep-learning approaches.