Child Handwritten Arabic Character Recognition

This entry is adapted from the peer-reviewed paper 10.3390/s23156774

Handwritten Arabic character recognition has received increasing research interest. However, most existing handwriting recognition systems have focused only on adult handwriting. Few studies have addressed child handwriting, and it has not yet been regarded as a major research issue. Compared to adults’ handwriting, children’s handwriting is more challenging to recognize since it often has lower quality, higher variation, and larger distortions.

Keywords: child handwriting; handwritten character recognition; writer-group classification

1. Introduction

Despite significant advances in technology, many people still compose text by hand ^[1]. Automated recognition of handwritten data is therefore crucial in many applications. These techniques convert handwritten data (e.g., texts, words, characters, or digits) into corresponding digital representations. Recognition can be performed offline, for example on scanned handwritten documents, or online, on handwriting captured as it is written with an electronic pen ^[2]^[3]^[4]. Developing automatic handwriting recognition systems is a difficult computer vision task because of the wide variety of handwriting sizes and styles, as well as the characteristics of the language to be recognized ^[4]. Handwritten character recognition is one of the most challenging research fields in document image processing. Most investigations in this field have addressed languages such as English, French, and Chinese, while comparatively little work has addressed other languages such as Arabic ^[3].

In recent years, handwritten Arabic character recognition has gained considerable research interest. This is due to the importance of the Arabic language, which is considered one of the five most widely spoken languages worldwide and is used for reading and writing by hundreds of millions of people across many nations ^[5]. However, it remains a challenging task in pattern recognition and computer vision, as significant effort is still required to construct generalized systems capable of handling various recognition problems with high accuracy ^[6]. These challenges stem from the unique characteristics of the Arabic script, e.g., its cursive nature, the existence of diacritics and dots, diagonal strokes, alternative character shapes in the middle of words, and many other characteristics ^[2]^[3]^[5]. Moreover, handwriting styles vary widely across individuals; even a single person’s handwriting may change slightly or significantly from one occasion to the next, which can make it difficult for a system to recognize letters written by the same person ^[4].

The majority of research on handwritten Arabic character recognition has focused on adult handwriting, with reported systems achieving accuracy rates of up to 99% using deep learning and machine learning techniques ^[7]^[8]^[9]^[10]^[11]^[12]^[13]^[14]^[15]. More recently, a few researchers have focused on recognizing Arabic letters in children’s handwriting because of its significance for many applications ^[4]^[16]^[17]^[18]^[19]^[20]. Employing character recognition in child-related applications such as education ^[21], interactive learning, and physical or mental health assessment is important for many future research areas. However, children’s handwriting poses a further challenge because it differs from adults’ handwriting in several respects: it is generally of lower quality, more variable, and more heavily distorted ^[16].

Handwritten Arabic character recognition technologies have evolved rapidly and progressed dramatically through different algorithms, such as support vector machines (SVMs), k-nearest neighbors (KNN), artificial neural networks (ANNs), and, later, convolutional neural networks (CNNs). CNNs have recently outperformed machine learning (ML) techniques that rely on manually engineered features, because CNNs automatically detect and extract distinctive and representative features from the analyzed images ^[18]. Furthermore, hybrid handwriting recognition systems that use a CNN as the feature extractor and an ML algorithm as the classifier have yielded effective results on several handwritten Arabic character datasets ^[15]^[18].
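The hybrid idea of convolutional features feeding a classical classifier can be illustrated with a small, self-contained sketch. It is not the architecture of any cited study: the two hand-set kernels stand in for filters that a trained CNN would learn, the 4x4 "glyphs" are toy data, and the function names are hypothetical. KNN is used as the classical classifier only because it is short to write.

```python
# Minimal sketch: convolutional features feeding a classical classifier.
# The fixed kernels below are stand-ins for learned CNN filters (assumption).

def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def extract_features(image, kernels):
    """Convolve, apply ReLU, then global-max-pool each feature map."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        features.append(max(max(0.0, v) for row in fmap for v in row))
    return features

def knn_predict(train_feats, train_labels, query, k=1):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(f, query)), label)
        for f, label in zip(train_feats, train_labels)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical hand-set filters: vertical-edge and horizontal-edge detectors.
KERNELS = [
    [[-1, 1], [-1, 1]],   # responds to a left-to-right intensity increase
    [[-1, -1], [1, 1]],   # responds to a top-to-bottom intensity increase
]

# Toy 4x4 "glyphs": a vertical stroke and a horizontal stroke.
vertical = [[0, 0, 1, 0]] * 4
horizontal = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]

train_feats = [extract_features(img, KERNELS) for img in (vertical, horizontal)]
train_labels = ["vertical", "horizontal"]

# A vertical stroke in a different column should still match "vertical".
query = [[0, 1, 0, 0]] * 4
prediction = knn_predict(train_feats, train_labels, extract_features(query, KERNELS))
```

Because the pooled features ignore where the stroke sits, the shifted query lands on the same feature vector as the training stroke. In a real hybrid system the feature vector would come from a trained CNN layer, and the classifier could be an SVM or a similar model.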

For writer-group classification, Shin et al. ^[22] proposed a machine learning-based method to automatically classify individuals as adults or children based on their handwritten data, including Japanese scripts and drawn patterns. No similar research has focused on differentiating between children’s and adults’ handwriting for Arabic characters. Establishing this capability in this research could open new horizons for other research fields serving multiple purposes, such as fraud or forgery detection and prevention, more accurate recognition and discrimination of handwriting, handwriting skill improvement, understanding similarities and differences in ways of writing, and age-group estimation. Finally, after analyzing handwriting, the researchers propose appropriate supplementary features that can be used along with the deep features extracted by the proposed CNN model to improve the accuracy of child and adult writer-group classification.

The main contributions of this study can be summarized as follows:

Developing an effective CNN model for recognizing children’s handwritten Arabic characters.
Investigating and analyzing the effect on child handwritten Arabic character recognition performance when training the proposed CNN model on a variety of datasets that either belong to children, adults, or both.
Examining the capability of the suggested CNN model to classify the writers of Arabic characters into two writer groups, either children or adults.
Suggesting some supplementary features that contribute to distinguishing between children’s and adults’ handwriting and augment the performance of the suggested CNN model.
Conducting an extended performance analysis, evaluation, and comparison of the deep features learned by the proposed CNN model and the proposed supplementary features using SVM, KNN, random forest (RF), and Softmax classifiers.
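One simple way to combine deep features with supplementary features for writer-group classification is feature-level fusion followed by a Softmax classifier, sketched below. This is an illustration under stated assumptions rather than the study's actual pipeline: the entry does not name the supplementary features, so all feature values are placeholders, and the weights are hypothetical and untrained.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(deep_features, supplementary_features):
    """Feature-level fusion by simple concatenation."""
    return list(deep_features) + list(supplementary_features)

def classify_writer_group(features, weights, biases, labels=("child", "adult")):
    """Linear score per class followed by softmax; returns (label, probabilities)."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs

# Placeholder numbers only: a 3-value "deep" vector and a 2-value supplementary vector.
deep = [0.2, 0.7, 0.1]
supplementary = [0.9, 0.3]
fused = fuse(deep, supplementary)

# Hypothetical, untrained weights (one row per class), chosen for illustration.
WEIGHTS = [[0.5, -0.2, 0.1, 1.0, 0.0],
           [-0.5, 0.2, -0.1, -1.0, 0.0]]
BIASES = [0.0, 0.0]

label, probs = classify_writer_group(fused, WEIGHTS, BIASES)
```

The same fused vector could be passed to an SVM, KNN, or RF classifier instead, which is the kind of comparison the last contribution describes.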

2. Handwritten Arabic Character Recognition for Adult Writers

Most researchers have focused on adult handwriting in Arabic character recognition. In 2017, El-Sawy et al. ^[7] developed a CNN model that was trained and tested on their own dataset, AHCD, which contains 16,800 handwritten Arabic characters collected from 60 persons aged between 19 and 40 years and divided into 28 classes; the model achieved an accuracy of 94.9%. In another study, Younis ^[8] introduced a deep CNN model to recognize handwritten Arabic letters and improved it by applying multiple optimization strategies to avoid overfitting. The model classified letters from two datasets, AIA9k and AHCD, with 94.8% and 97.6% accuracy, respectively. In 2021, Balaha et al. ^[9] introduced another handwritten Arabic character dataset, HMBD, and suggested two CNN-based architectures known as HMB1 and HMB2. They investigated how changing the complexity of these architectures and applying overfitting reduction strategies affected recognition accuracy on various datasets, including HMBD, AIA9k, and CMATER. The uniform weight initializer and the AdaDelta optimizer scored the highest accuracies, and data augmentation further improved performance; the HMB1 model achieved the top overall performance of 90.7%, 98.4%, and 97.3% on the AIA9k, HMBD, and CMATER datasets, respectively.
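The specific augmentation transforms used in these studies are not detailed in this entry. As one common example, the sketch below generates shifted copies of a character image with plain Python lists; the function names and the toy 4x4 glyph are hypothetical.

```python
def shift_image(image, dx, dy, fill=0):
    """Translate a 2-D image by (dx, dy) pixels, padding exposed cells with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted

def augment_with_shifts(image, offsets):
    """Return the original image plus one shifted copy per (dx, dy) offset."""
    return [image] + [shift_image(image, dx, dy) for dx, dy in offsets]

# Toy glyph; real character images would be larger grayscale arrays.
glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
variants = augment_with_shifts(glyph, [(1, 0), (-1, 0), (0, 1), (0, -1)])
```

Each shifted copy keeps the same class label as the original, so one labeled sample becomes five training samples. Offsets should stay small enough that ink pixels are not pushed off the canvas.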

In ^[10], De Sousa suggested the VGG12 and REGU deep models for recognizing handwritten Arabic letters and digits. Both models were trained twice, once with data augmentation and once without. An ensemble of the four resulting models was then created by averaging their predictions. The ensemble’s highest accuracy was 98.42% on AHCD and 99.47% on MADbase. Boufenar et al. ^[11] built a DCNN model similar to the AlexNet architecture. They investigated the role of preprocessing data samples in enhancing model performance under three learning strategies: training the model from scratch, applying transfer learning, and fine-tuning the CNN. Overall, their experimental findings revealed that the first strategy outperformed the others both with and without preprocessing, achieving average accuracies of 100% and 99.98% on OIHACDB-40 and AHCD, respectively. Moreover, Ullah and Jamjoom ^[12] investigated the effect of dropout on their CNN model. They observed a considerable difference in performance between training with and without dropout, indicating that dropout regularization could effectively prevent overfitting. With dropout, the model reported a test accuracy of 96.78% on the AHCD dataset. Alyahya et al. ^[13] studied how effective the ResNet-18 architecture could be in recognizing handwritten Arabic characters. They suggested four ensemble models: the first two were the original ResNet-18 and an updated ResNet-18 using one fully connected layer with or without a dropout layer; the last two were the original ResNet-18 and the updated ResNet-18 with two fully connected layers, with or without a dropout layer. The original ResNet-18 model achieved the highest test score among the ensemble models, 98.30% on the AHCD dataset. In ^[14], a CNN model was developed to recognize handwritten Arabic letters and was trained and tested on the AHCD dataset. The suggested method achieved a recognition rate of 97.2%, which rose to 97.7% once data augmentation techniques were used. Ali and Mallaiah ^[15] designed a CNN-based SVM model with a dropout technique utilizing two deep neural networks and evaluated it on various datasets, including AHDB, AHCD, HACDB, and IFN/ENIT, for recognizing handwritten Arabic letters. The authors reported improved performance compared to previous models created for the same domain, obtaining accuracies of 99%, 99.71%, 99.85%, and 98.58% on AHDB, AHCD, HACDB, and IFN/ENIT, respectively. Table 1 summarizes these handwritten Arabic character recognition studies using adults’ data.
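The prediction averaging described for the ensemble in ^[10] can be sketched as follows. The probability values are made up for illustration, and the helper names are hypothetical; only the averaging step itself reflects the description above.

```python
def average_ensemble(model_probs):
    """Average per-class probabilities across models for one sample."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[k] for p in model_probs) / n_models for k in range(n_classes)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical softmax outputs of four models for one image and three classes.
predictions = [
    [0.50, 0.40, 0.10],
    [0.30, 0.60, 0.10],
    [0.45, 0.35, 0.20],
    [0.20, 0.70, 0.10],
]
averaged = average_ensemble(predictions)
winner = argmax(averaged)
```

In this toy case two models favor class 0 and two favor class 1, so a plain majority vote would tie. Averaging takes each model's confidence into account and selects class 1.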

Table 1. A summary of related work on handwritten Arabic character recognition for adult writers.

Ref.	Year	Feature Extractor	Classifier	Dataset	Type	Size	Accuracy
^[7]	2017	CNN	Softmax	AHCD	Characters	16,800	94.9%
^[8]	2017	CNN	Softmax	AIA9k	Characters	9000	94.8%
^[8]	2017	CNN	Softmax	AHCD	Characters	16,800	97.6%
^[10]	2018	CNN	Softmax	AHCD	Characters	16,800	98.42%
^[10]	2018	CNN	Softmax	MADbase	Digits	70,000	99.47%
^[11]	2018	CNN	Softmax	OIHACDB-40	Characters	30,000	100%
^[11]	2018	CNN	Softmax	AHCD	Characters	16,800	99.98%
^[9]	2020	CNN	Softmax	HMBD	Characters	54,115	90.7%
^[9]	2020	CNN	Softmax	AIA9k	Characters	9000	98.4%
^[9]	2020	CNN	Softmax	CMATER	Digits	3000	97.3%
^[13]	2020	CNN	Softmax	AHCD	Characters	16,800	98.30%
^[14]	2021	CNN	Softmax	AHCD	Characters	16,800	97.7%
^[15]	2021	CNN	SVM	AHDB	Words and Texts	15,084	99%
^[15]	2021	CNN	SVM	AHCD	Characters	16,800	99.71%
^[15]	2021	CNN	SVM	HACDB	Characters	6600	99.85%
^[15]	2021	CNN	SVM	IFN/ENIT	Words	26,459	98.58%
^[12]	2022	CNN	Softmax	AHCD	Characters	16,800	96.78%

3. Handwritten Arabic Character Recognition for Child Writers

A few efforts have been made to address children’s Arabic handwriting recognition. In 2020, unlike earlier research, Altwaijry and Al-Turaiki ^[4] concentrated on recognizing Arabic letters written by children. They collected a new dataset named Hijja, consisting of 47,434 disconnected and connected Arabic characters written by children aged 7 to 12 years. They also developed a CNN-based model and evaluated its performance on their dataset, comparing it with the model suggested by El-Sawy et al. ^[7] on both Hijja and AHCD. According to the experimental findings, their model outperformed the compared model, achieving accuracies of 88% and 97% on the Hijja and AHCD datasets, respectively. Alkhateeb ^[16] also proposed a deep learning-based system for recognizing handwritten Arabic letters using a CNN and validated it on three separate datasets, AHCR, AHCD, and Hijja. Based on the experimental results, the suggested approach achieved accuracies of 89.8%, 95.4%, and 92.5% on the AHCR, AHCD, and Hijja datasets, respectively. Another study by Nayef et al. ^[17] discussed using CNN models to recognize handwritten Arabic characters with an improved Leaky-ReLU activation function. To evaluate the compared models, they used four datasets: AHCD, Hijja, and MNIST, in addition to their own dataset containing 38,100 handwritten Arabic characters in 28 classes, collected from elementary school students in grades one to three. The proposed CNN model with Leaky-ReLU optimization outperformed the compared model of ^[8], with accuracies of 99%, 95%, and 90% on AHCD, their dataset, and Hijja, respectively.
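For readers unfamiliar with the activation function mentioned above, the following sketch contrasts ReLU with the standard Leaky ReLU. It does not reproduce the optimized variant of ^[17], whose details are not given in this entry; the slope of 0.01 is a commonly used default and is an assumption here.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0

def leaky_relu(x, negative_slope=0.01):
    """Standard Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else negative_slope * x

activations = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(v) for v in activations]
leaky_out = [leaky_relu(v) for v in activations]
```

Unlike ReLU, Leaky ReLU keeps a small non-zero response, and therefore a non-zero gradient, for negative inputs, which is the usual motivation for preferring it.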

Alrobah and Albahli ^[18] employed a different approach, building a hybrid model that merges CNN deep-learning models for feature extraction with SVM and XGBoost machine-learning models for classification. They used the two CNN architectures presented in ^[9], namely HMB1 and HMB2. The study attained an accuracy of 96.3% using the HMB1 model and the SVM classifier on the Hijja dataset, highlighting the hybrid model’s efficiency. In 2022, Wagaa et al. ^[19] presented a new CNN architecture that achieved 98.48% and 91.24% accuracy on the AHCD and Hijja datasets, respectively, by applying rotation and shifting data augmentation and using the Nadam optimizer. They also investigated how mixing the AHCD and Hijja datasets in varying proportions during training and testing affected the model’s performance under different data augmentation approaches. The Nadam optimizer combined with rotation and shifting augmentation gave their highest test accuracy, 98.32%, when the model was trained on a mix of 80% of AHCD and 20% of Hijja and tested on 20% of AHCD and 10% of Hijja. Bouchriha et al. ^[20] also presented a novel CNN model for recognizing handwritten Arabic characters. They focused on unique characteristics of Arabic text, particularly the way a letter’s shape varies with its position in the word, and attained an accuracy of 95% on the Hijja dataset. Table 2 summarizes these handwritten Arabic character recognition studies on children’s data.
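The dataset-mixing experiment can be made concrete with a short sketch that draws fixed percentages from two datasets to build a training split. The sampling procedure of ^[19] is not described in this entry, so random sampling without replacement is an assumption, the records are stand-in tuples sized like AHCD and Hijja, and `mix_datasets` is a hypothetical helper.

```python
import random

def mix_datasets(dataset_a, dataset_b, percent_a, percent_b, seed=0):
    """Draw fixed percentages of two datasets and shuffle them into one split."""
    rng = random.Random(seed)
    take_a = len(dataset_a) * percent_a // 100
    take_b = len(dataset_b) * percent_b // 100
    mixed = rng.sample(dataset_a, take_a) + rng.sample(dataset_b, take_b)
    rng.shuffle(mixed)
    return mixed

# Stand-in records: (source, index) tuples sized like the datasets in the text.
ahcd = [("AHCD", i) for i in range(16_800)]
hijja = [("Hijja", i) for i in range(47_434)]

# Training mix of 80% of AHCD and 20% of Hijja, as in the best-performing setup.
train = mix_datasets(ahcd, hijja, percent_a=80, percent_b=20)
n_ahcd = sum(1 for source, _ in train if source == "AHCD")
n_hijja = len(train) - n_ahcd
```

With these sizes the training mix holds 13,440 adult-written and 9,486 child-written samples, so adult handwriting still forms the majority of the split even though Hijja is the larger dataset.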

Table 2. A summary of related work on handwritten Arabic character recognition for child writers.

Ref.	Year	Feature Extractor	Classifier	Dataset	Type	Size	Accuracy
^[4]	2020	CNN	Softmax	Hijja	Characters	47,434	88%
^[4]	2020	CNN	Softmax	AHCD	Characters	16,800	97%
^[16]	2020	CNN	Softmax	AHCR	Characters	28,000	89.8%
^[16]	2020	CNN	Softmax	AHCD	Characters	16,800	95.4%
^[16]	2020	CNN	Softmax	Hijja	Characters	47,434	92.5%
^[17]	2021	CNN	Softmax	AHCD	Characters	16,800	99%
^[17]	2021	CNN	Softmax	Proposed dataset	Characters	38,100	95.4%
^[17]	2021	CNN	Softmax	Hijja	Characters	47,434	90%
^[17]	2021	CNN	Softmax	MNIST	Digits	70,000	99%
^[18]	2021	CNN	Softmax	Hijja	Characters	47,434	89%
^[18]	2021	CNN	SVM	Hijja	Characters	47,434	96.3%
^[18]	2021	CNN	XGBoost	Hijja	Characters	47,434	95.7%
^[19]	2022	CNN	Softmax	AHCD	Characters	16,800	98.48%
^[19]	2022	CNN	Softmax	Hijja	Characters	47,434	91.24%
^[20]	2022	CNN	Softmax	Hijja	Characters	47,434	95%

References

Albattah, W.; Albahli, S. Intelligent Arabic Handwriting Recognition Using Different Standalone and Hybrid CNN Architectures. Appl. Sci. 2022, 12, 10155.
Alrobah, N.; Albahli, S. Arabic Handwritten Recognition Using Deep Learning: A Survey. Arab. J. Sci. Eng. 2022, 47, 9943–9963.
Ali, A.A.A.; Suresha, M.; Ahmed, H.A.M. Survey on Arabic Handwritten Character Recognition. SN Comput. Sci. 2020, 1, 152.
Altwaijry, N.; Al-Turaiki, I. Arabic handwriting recognition system using convolutional neural network. Neural Comput. Appl. 2021, 33, 2249–2261.
Balaha, H.M.; Ali, H.A.; Badawy, M. Automatic recognition of handwritten Arabic characters: A comprehensive review. Neural Comput. Appl. 2020, 33, 3011–3034.
Ghanim, T.M.; Khalil, M.I.; Abbas, H.M. Comparative study on deep convolution neural networks DCNN-based offline Arabic handwriting recognition. IEEE Access 2020, 8, 95465–95482.
El-Sawy, A.; Loey, M.; EL-Bakry, H. Arabic handwritten characters recognition using convolutional neural network. WSEAS Trans. Comput. 2017, 5, 11–19.
Younis, K.S. Arabic handwritten character recognition based on deep convolutional neural networks. Jordanian J. Comput. Inf. Technol. 2017, 3, 186–200.
Balaha, H.M.; Ali, H.A.; Saraya, M.; Badawy, M. A new Arabic handwritten character recognition deep learning system (ahcr-dls). Neural Comput. Appl. 2021, 33, 6325–6367.
De Sousa, I.P. Convolutional ensembles for Arabic Handwritten Character and Digit Recognition. PeerJ Comput. Sci. 2018, 4, e167.
Boufenar, C.; Kerboua, A.; Batouche, M. Investigation on deep learning for off-line handwritten Arabic character recognition. Cogn. Syst. Res. 2018, 50, 180–195.
Ullah, Z.; Jamjoom, M. An intelligent approach for Arabic handwritten letter recognition using convolutional neural network. PeerJ Comput. Sci. 2022, 8, e995.
Alyahya, H.; Ismail, M.M.B.; Al-Salman, A. Deep ensemble neural networks for recognizing isolated Arabic handwritten characters. ACCENTS Trans. Image Process. Comput. Vis. 2020, 6, 68–79.
AlJarrah, M.N.; Zyout, M.M.; Duwairi, R. Arabic Handwritten Characters Recognition Using Convolutional Neural Network. In Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 182–188.
Ali, A.A.A.; Mallaiah, S. Intelligent handwritten recognition using hybrid CNN architectures based-SVM classifier with dropout. J. King Saud Univ. Comput. Inf. Sci. 2021, 34, 3294–3300.
Alkhateeb, J.H. An Effective Deep Learning Approach for Improving Off-Line Arabic Handwritten Character Recognition. Int. J. Softw. Eng. Knowl. Eng. 2020, 6, 53–61.
Nayef, B.H.; Abdullah, S.N.H.S.; Sulaiman, R.; Alyasseri, Z.A.A. Optimized leaky relu for handwritten Arabic character recognition using convolution neural networks. Multimed. Tools Appl. 2021, 81, 2065–2094.
Alrobah, N.; Albahli, S. A Hybrid Deep Model for Recognizing Arabic Handwritten Characters. IEEE Access 2021, 9, 87058–87069.
Wagaa, N.; Kallel, H.; Mellouli, N. Improved Arabic Alphabet Characters Classification Using Convolutional Neural Networks (CNN). Comput. Intell. Neurosci. 2022, 2022, e9965426.
Bouchriha, L.; Zrigui, A.; Mansouri, S.; Berchech, S.; Omrani, S. Arabic Handwritten Character Recognition Based on Convolution Neural Networks. In Proceedings of the International Conference on Computational Collective Intelligence (ICCCI 2022), Hammamet, Tunisia, 28–30 September 2022; pp. 286–293.
Bin Durayhim, A.; Al-Ajlan, A.; Al-Turaiki, I.; Altwaijry, N. Towards Accurate Children’s Arabic Handwriting Recognition via Deep Learning. Appl. Sci. 2023, 13, 1692.
Shin, J.; Maniruzzaman, M.; Uchida, Y.; Hasan, M.A.M.; Megumi, A.; Suzuki, A.; Yasumura, A. Important features selection and classification of adult and child from handwriting using machine learning methods. Appl. Sci. 2022, 12, 5256.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.


Information

Subjects: Computer Science, Artificial Intelligence

Contributors:

Maram Saleh Alwagdani

Emad Sami Jaha


Update Date: 22 Aug 2023
