Computer-assisted language learning (CALL) systems are in widespread use, which attests to their success in helping people improve their language and speech skills. CALL is predominantly concerned with addressing pronunciation errors in non-native speakers’ speech. Tasks that CALL systems can support include mispronunciation detection, voice recognition, and pronunciation evaluation.
No. | Arabic Letter | Phonetic Symbol |
---|---|---|
1 | س | /s/ |
2 | ر | /r/ |
3 | ق | /q/ |
4 | ج | /ʒ/ |
5 | ك | /k/ |
6 | خ | /x/ |
7 | غ | /ɣ/ |
8 | ض | /d/ |
9 | ح | /ḥ/ |
10 | ص | /Ṣ/ |
11 | ط | /ŧ/ |
12 | ظ | /∂/ |
13 | ذ | /ð/ |
Work | Classification Algorithm | Data Utilized | Performance Metrics | Results |
---|---|---|---|---|
Ye et al. [2] | Acoustic, Phonetic, and Linguistic Data Embedding | L2-ARCTIC database | Detection Accuracy, Diagnosis Error Rate, F-Measure | Accuracy: 9.93%, DER: 10.13%, F-measure: 6.17% |
Li et al. [3] | Acoustic-Graphemic Phonemic Model (AGPM) Using Multi-Distribution Deep Neural Networks (MD-DNNs) | Not specified | Phone Error Rate (PER), False Rejection Rate (FRR), False Acceptance Rate (FAR), Diagnostic Error Rate (DER) | PER: 11.1%, FRR: 4.6%, FAR: 30.5%, DER: 13.5% |
Shahin and Ahmed [4] | One-Class SVM, DNN Speech Attribute Detectors | WSJ0 and TIMIT standard datasets | False-Acceptance Rate, False-Rejection Rate | Lowered FAR and FRR by 26% and 39% compared to the GOP technique |
Arafa et al. [5] | Random Forest (RF) | 89 students’ Arabic phoneme utterances | Accuracy | 85.02% |
Shareef and Al-Irhayim [6] | LSTM and CNN-LSTM | Not specified | Classification Accuracy | LSTM: 93%, CNN-LSTM: 91% |
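The error rates named in the table can be computed from phone-level counts. The following is a minimal sketch, assuming the definitions commonly used in mispronunciation detection work (FRR over correctly pronounced phones, FAR over mispronounced phones, DER over correctly rejected phones); the cited works may define them slightly differently. The class `MddCounts`, its field names, and the example counts are hypothetical and are not taken from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass
class MddCounts:
    """Phone-level outcome counts (hypothetical container for illustration)."""
    true_accept: int           # correct phones accepted as correct
    false_reject: int          # correct phones flagged as mispronounced
    false_accept: int          # mispronounced phones accepted as correct
    true_reject: int           # mispronounced phones flagged as mispronounced
    diagnosis_errors: int = 0  # true rejections with the wrong phone diagnosed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def mdd_metrics(c: MddCounts) -> dict:
    """Compute common mispronunciation detection and diagnosis metrics."""
    total = c.true_accept + c.false_reject + c.false_accept + c.true_reject
    precision = _ratio(c.true_reject, c.true_reject + c.false_reject)
    recall = _ratio(c.true_reject, c.true_reject + c.false_accept)
    return {
        # Share of correctly pronounced phones that were wrongly rejected.
        "FRR": _ratio(c.false_reject, c.true_accept + c.false_reject),
        # Share of mispronounced phones that were wrongly accepted.
        "FAR": _ratio(c.false_accept, c.false_accept + c.true_reject),
        # Share of detected mispronunciations that were diagnosed incorrectly.
        "DER": _ratio(c.diagnosis_errors, c.true_reject),
        "accuracy": _ratio(c.true_accept + c.true_reject, total),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts, used only to show the call.
example = MddCounts(true_accept=80, false_reject=20, false_accept=10,
                    true_reject=40, diagnosis_errors=10)
for name, value in mdd_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

Because FAR and FRR are normalized by different populations, a system can report a low FRR and a high FAR at the same time, as in the Li et al. [3] row.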
In our proposed system, we emphasize the importance of speech signal processing in diagnosing Arabic mispronunciation, using Mel-Frequency Cepstral Coefficients (MFCCs) as the optimal extracted features. A Long Short-Term Memory (LSTM) network is used for classification. The analytical framework also incorporates a gender recognition model, so that classification is performed at two levels. Our results show that the LSTM network significantly enhances mispronunciation detection along with gender recognition. The LSTM models attained an average accuracy of 81.52% in the proposed system, reflecting high performance compared to previous mispronunciation detection systems.
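One way to read the two-level classification is as a routing step: a gender model runs first, and its output selects which pronunciation model scores the utterance. The sketch below illustrates that reading only. The source does not specify how the two levels are connected, so the routing design is an assumption. The function `two_level_predict` and the toy threshold models are hypothetical stand-ins for trained LSTM classifiers, and the feature values are invented rather than real MFCCs.

```python
from typing import Callable, Dict, Sequence, Tuple

# One utterance: a sequence of frames, each a vector of MFCC-like coefficients.
Features = Sequence[Sequence[float]]
Classifier = Callable[[Features], str]


def two_level_predict(
    features: Features,
    gender_model: Classifier,
    pronunciation_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Level 1 predicts gender; level 2 applies that gender's pronunciation model."""
    gender = gender_model(features)
    if gender not in pronunciation_models:
        raise KeyError(f"no pronunciation model registered for {gender!r}")
    verdict = pronunciation_models[gender](features)
    return gender, verdict


def toy_gender_model(features: Features) -> str:
    """Placeholder for a trained gender classifier (assumption, not the paper's model)."""
    mean_c0 = sum(frame[0] for frame in features) / len(features)
    return "female" if mean_c0 > 0 else "male"


def make_toy_pronunciation_model(threshold: float) -> Classifier:
    """Placeholder for a trained LSTM: thresholds the mean of the second coefficient."""
    def model(features: Features) -> str:
        mean_c1 = sum(frame[1] for frame in features) / len(features)
        return "correct" if mean_c1 >= threshold else "mispronounced"
    return model


models = {
    "female": make_toy_pronunciation_model(0.25),
    "male": make_toy_pronunciation_model(0.5),
}

utterance = [[1.0, 0.2], [0.5, 0.4]]  # invented values, two frames
print(two_level_predict(utterance, toy_gender_model, models))
```

Keeping the two levels behind the same callable interface means either stage can be swapped for a trained network without changing the routing code.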
This entry is adapted from the peer-reviewed paper 10.3390/info14070413