The widespread use of computer-assisted language learning (CALL) systems attests to their success in helping people improve their language and speech skills. CALL is predominantly concerned with addressing pronunciation errors in non-native speakers’ speech; tasks such as mispronunciation detection, voice recognition, and pronunciation evaluation can all be accomplished with CALL.
No. | Arabic Letter | Phonetic Symbol
---|---|---
1 | س | /s/
2 | ر | /r/
3 | ق | /q/
4 | ج | /ʒ/
5 | ك | /k/
6 | خ | /x/
7 | غ | /ɣ/
8 | ض | /d/
9 | ح | /ḥ/
10 | ص | /ṣ/
11 | ط | /ŧ/
12 | ظ | /∂/
13 | ذ | /ð/

Work | Classification Algorithm | Data Utilized | Performance Metrics | Results
---|---|---|---|---
Ye et al. [2] | Acoustic, Phonetic, and Linguistic Data Embedding | L2-ARCTIC database | Detection Accuracy, Diagnosis Error Rate (DER), F-Measure | Accuracy: 9.93%, DER: 10.13%, F-measure: 6.17%
Li et al. [3] | Acoustic-Graphemic Phonemic Model (AGPM) using Multi-Distribution Deep Neural Networks (MD-DNNs) | Not specified | Phone Error Rate (PER), False Rejection Rate (FRR), False Acceptance Rate (FAR), Diagnostic Error Rate (DER) | PER: 11.1%, FRR: 4.6%, FAR: 30.5%, DER: 13.5%
Shahin and Ahmed [4] | One-Class SVM, DNN Speech Attribute Detectors | WSJ0 and TIMIT standard datasets | False-Acceptance Rate (FAR), False-Rejection Rate (FRR) | Lowered FAR and FRR by 26% and 39% compared to the GOP technique
Arafa et al. [5] | Random Forest (RF) | 89 students’ Arabic phoneme utterances | Accuracy | 85.02%
Shareef and Al-Irhayim [6] | LSTM and CNN-LSTM | Not specified | Classification Accuracy | LSTM: 93%, CNN-LSTM: 91%
In our proposed system, we emphasize the prominence of speech signal processing in diagnosing Arabic mispronunciation, using Mel-Frequency Cepstral Coefficients (MFCCs) as the extracted features. A Long Short-Term Memory (LSTM) network is then used for classification, and the analytical framework incorporates a gender recognition model to perform two-level classification. Our results show that the LSTM network significantly enhances mispronunciation detection along with gender recognition: the LSTM models attain an average accuracy of 81.52% in the proposed system, a high performance compared to previous mispronunciation detection systems.
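To illustrate the feature-extraction stage, the following is a minimal NumPy sketch of MFCC computation (pre-emphasis, framing, windowing, mel filterbank, log compression, DCT). The sampling rate, frame sizes, and filter counts here are illustrative assumptions, not the parameters of the proposed system; in the full pipeline, frame-level MFCC sequences like these would be fed to the LSTM classifier.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """MFCC pipeline: pre-emphasis -> framing -> Hamming window ->
    power spectrum -> mel filterbank -> log -> DCT-II."""
    # Pre-emphasis boosts high frequencies, which carry consonant cues.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II to decorrelate into cepstra.
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return logmel @ dct.T  # shape: (n_frames, n_ceps)
```

With these assumed defaults (16 kHz audio, 32 ms frames, 10 ms hop), one second of speech yields a 97 × 13 feature matrix, one 13-coefficient vector per frame.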