Pap Smear Images Classification with Machine Learning: History
Please note this is an old version of this entry, which may differ significantly from the current revision.

Cervical cancer is regularly diagnosed in women all over the world. It is the seventh most frequent cancer globally and the fourth most prevalent cancer among women. Automated, more accurate cervical cancer classification methods are needed for early diagnosis. In addition, routine Pap smears have been shown to improve clinical outcomes by facilitating the early diagnosis of cervical cancer. Liquid-based cytology (LBC)/Pap smear screening is a highly effective technique for detecting precancerous cells; it is based on cell image analysis, in which cells are classed as normal or abnormal. Computer-aided systems in medical imaging have benefited greatly from recent developments in artificial intelligence (AI) technology.

  • cervical cancer
  • cell classification
  • review

1. Introduction

Cervical cancer is a major disease that seriously threatens women’s health [1,2]. It has also been described as the second most common and deadly type of cancer among women around the world [3]. It results from a chronic infection of the skin and mucosal cells in the vaginal region. Its most alarming feature is that it shows no signs when it first appears [4]. Elakkiya et al. [5] report that this type of cancer is curable when detected at an early stage. Unfortunately, the mortality rate among affected women is rising around the world [6,7,8]. The traditional method of manual inspection, also known as a Pap smear examination, is prone to human error, which may lead to a false diagnosis [9,10]. Automated cervical cancer screening technology is therefore important for lessening the risk of cervical cancer. However, existing machine learning approaches have drawbacks, including poor generalisation in complicated situations, low efficiency, and low accuracy [11]. Several studies have investigated the ability of machine learning to classify cervical cancer cells in order to enhance manual screening [10,12,13]. The most frequently used approach for predicting characteristics from a high-dimensional collection of cancer imaging data is the random forest [14,15,16]. However, if a large number of decision trees is used, a random forest may become too slow for real-time prediction [10]. In addition, current classification approaches, whether deep learning (DL) or hand-crafted techniques, mostly rely on single detection structures and have high processing complexity and low accuracy [17].
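The remark about random forest speed can be illustrated with a small sketch. A forest predicts by majority vote, so every tree has to be evaluated for each sample, and the per-sample cost grows with the number of trees. The stumps, thresholds, and feature values below are hypothetical and hand-set rather than trained; this is an illustration under those assumptions, not the method of any cited study.

```python
from collections import Counter


def make_stump(feature_index, threshold):
    """Return a one-split 'tree' that votes 1 (abnormal) above the threshold.

    Hypothetical stand-in for a trained decision tree.
    """
    def stump(sample):
        return 1 if sample[feature_index] > threshold else 0
    return stump


def forest_predict(trees, sample):
    """Majority vote over all trees; also report how many trees were evaluated."""
    votes = [tree(sample) for tree in trees]
    label = Counter(votes).most_common(1)[0][0]
    return label, len(votes)


# Illustrative sample with two made-up features.
sample = [0.8, 0.3]

small_forest = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(0, 0.2)]
label, evaluated = forest_predict(small_forest, sample)

# The same stumps repeated 100 times: same decision, 100 times the work.
big_forest = small_forest * 100
big_label, big_evaluated = forest_predict(big_forest, sample)
```

Both forests return the same label here, but the larger one evaluates 300 trees instead of 3 for a single sample, which is the cost the text refers to.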

2. Pap Smear Images Classification

Cancer has grown in significance as a public health concern that bears on a region’s development and success. Microscopic image-based analysis has been used extensively in pathological research and disease diagnosis. However, the misidentification of cell lines due to pathologists’ errors has been identified as a severe issue. Therefore, a comparative evaluation of the proposed model was conducted to illustrate the utility of feature selection and class imbalance handling, based on the classifier’s accuracy, sensitivity, precision, F-measure, and specificity.
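The five evaluation measures named above can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function names and the toy labels (1 = abnormal, 0 = normal) are illustrative assumptions, not data from any of the reviewed studies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, precision, and F-measure."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With imbalanced classes, accuracy alone can look high while sensitivity stays low, which is why the studies summarised below usually report several of these measures together.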

Classification of Cells Based on Machine Learning Approach

Previous researchers have taken numerous approaches to cell classification. Their methodologies and results are summarised here so that the findings of the studies can be compared more easily. Table 1 summarises the prevailing works on the classification of cervical cancer cells using machine learning.
Table 1. Summary of the prevailing works.
Each entry lists: Title; Database and Class; Methodology and Type; Results.

Title: CVM-Cervix: A hybrid cervical Pap smear image classification framework using CNN, visual transformer, and multilayer perceptron [18]
Database: CRIC dataset, SIPaKMeD dataset, and the combination of CRIC and SIPaKMeD
Class: CRIC-6 class, SIPaKMeD-5 class
Methodology: CVM-Cervix framework based on deep learning.
Type: Machine learning (neural network)
Results: The effectiveness and potential of the proposed CVM-Cervix were demonstrated.

Title: A Comparative Analysis of Deep Learning Models for Automated Cross-Preparation Diagnosis of Multi-Cell Liquid Pap Smear Images [37]
Database: Thin Prep Pap dataset
Class: 4 Bethesda class
Methodology: An ensemble novel convolutional neural network (CNN) and a CNN with autoencoder (AE).
Type: Machine learning (neural network)
Results:
  • All models’ accuracy >90%.
  • The individual transfer models had high variability in performance, while CNN and AE CNN did not.
  • ResNet101 accuracy is 92.65%.

Title: Multi-class nucleus detection and classification using a deep convolutional neural network with enhanced high dimensional dissimilarity translation model on cervical cells [38]
Database: Herlev dataset, SIPaKMeD dataset, CRIC dataset
Class: Herlev-7 class, SIPaKMeD-5 class, CRIC-6 class
Methodology:
  • Segmentation: hybrid system that incorporates two binary image patches obtained by a 19-layered convolutional neural network (ConvNet) model with an enhanced deep high dimensional dissimilarity translation (HDDT).
  • A pretrained ResNet-50 model.
  • t-distributed stochastic neighbour embedding (t-SNE) for down-sampling.
  • Classification using a multi-class weighted kernel extreme learning machine (WKELM) classifier via a sparse multicanonical correlation (SMCCA) method.
Type: Machine learning (neural network)
Results:
  • Accuracy 99.12%
  • Specificity 99.45%
  • Sensitivity 99.25%
  • Execution time 99.6248 s
  • The proposed model is more effective than existing approaches.

Title: Hybrid Loss-Constrained Lightweight Convolutional Neural Networks for Cervical Cell Classification [39]
Database: Herlev dataset
Class: Herlev-7 class
Methodology: Classification using a hybrid loss function with label smoothing to improve the distinguishing power of lightweight convolutional neural networks (CNN).
Type: Machine learning (neural network)
Results:
  • ShufflenetV2: accuracy 96.18%, precision 96.30%, recall 96.23%, specificity 99.08%.
  • GhostnetV2: accuracy 96.39%, precision 96.42%, recall 96.39%, specificity 99.09%.

Title: Detection of cervical cells based on improved SSD network [20]
Database: Herlev dataset
Class: Herlev-7 class
Methodology: Integration of the Single Shot MultiBox Detector (SSD) with positive and negative features to address the problem of insufficient sensitivity for small objects.
Type: Machine learning (neural network)
Results:
  • Accuracy 90.80%
  • Mean average precision (mAP) is 81.53%, which is 7.54% and 4.92% higher than YOLO and classical SSD.

Title: Detection of cervical cancer cells in a complex situation based on improved YOLOv3 network [1]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Detection using the YOLO algorithm.
  • Feature extraction generalization by adding the dense block and S3Pool algorithm to the feature extraction network DarkNet-53.
  • Clustering using an improved k-means++ algorithm.
Type: Machine learning (neural network)
Results: Mean average precision (mAP) is 78.87%, which is 7.54% and 4.92% higher than YOLO (You Only Look Once) and classical SSD.

Title: Pap smear-based cervical cancer detection using residual neural networks deep learning architecture [27]
Database: Mendeley LBC SIPaKMeD dataset
Class: Mendeley LBC SIPaKMeD-4 class
Methodology:
  • Data augmentation module of DTWCT module and convolutional neural networks (CNN).
  • Classification using ResNet 18, which defines four classes of sources for Pap smear cell images.
Type: Machine learning (neural network)
Results: Average Pap smear detection index (PDI) is 99%.

Title: Cervical cell multi-classification algorithm using global context information and attention mechanism [3]
Database: SIPaKMeD dataset
Class: SIPaKMeD-5 class
Methodology:
  • Convolutional neural network (L-PCNN) that integrates global context information and attention mechanism.
  • Improved ResNet-50 backbone network for feature extraction.
Type: Machine learning (neural network)
Results: Accuracy 98.89%, sensitivity 99.9%, specificity 99.8%, F-measure 99.89%.

Title: DeepCyto: a hybrid framework for cervical cancer classification using deep feature fusion of cytology images [40]
Database: Herlev dataset, SIPaKMeD dataset, LBC dataset
Class: Herlev-7 class, SIPaKMeD-5 class, LBC-4 class
Methodology:
  • Novel classification using DeepCyto.
  • Principal component analysis and machine learning ensemble for classification of Pap smear images.
  • Artificial neural network with feature fusion vectors as an input for classification.
Type: Machine learning (neural network)
Results: DeepCyto is a powerful tool for precise feature extraction and Pap smear image classification.

Title: Classification of Cervical Cytology Overlapping Cell Images with Transfer Learning Architectures [29]
Database: Cervix93 cervical cytology images
Class: 3 class
Methodology:
  • Transfer learning using deep learning convolutional neural network.
  • Cutting-edge pretrained networks: AlexNet, ImageNet, and Places 365.
Type: Machine learning (neural network)
Results:
  • Accuracy 99.03%.
  • Kappa coefficient showing perfect agreement.
  • AlexNet proved a successful assistive tool for cervical cancer detection.

Title: Optimal deep convolution neural network for cervical cancer diagnosis model [8]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Detection using the intelligent deep convolutional neural network classification (IDCNN-CDC) model with biomedical Pap smear images.
  • Noise removal using a Gaussian filter.
  • Segmentation using the Tsallis entropy technique with dragonfly optimization.
  • Deep learned features using SqueezeNet.
  • Classification using weighted extreme learning machine (ELM).
Type: Machine learning (neural network)
Results: Higher performance of the proposed technique in terms of sensitivity, specificity, accuracy, and F-Score.

Title: Modified metaheuristics with stacked sparse denoising autoencoder model for cervical cancer classification [9]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Novel Modified Firefly Optimization Algorithm with Deep Learning-enabled cervical cancer classification (MFFOA-DL3) model.
  • Noise removal using bilateral filtering (BF).
  • Segmentation based on Kapur’s entropy to define the affected area.
  • Feature vector generation using EfficientNet.
  • Classification of the cell using MFFOA with Stacked Sparse Denoising Autoencoder (SSDA) model.
Type: Machine learning (neural network)
Results: A comprehensive comparison revealed that the MFFOA-DL3 model outperformed other recent approaches.

Title: Imaging based cervical cancer diagnostics using small object detection—generative adversarial networks [5]
Database: Herlev dataset, Colposcopy images, Clinical references
Class: not applicable
Methodology:
  • An effective hybrid deep learning technique using Small-Object Detection-Generative Adversarial Networks (SOD-GAN) with Fine-tuned Stacked Autoencoder (F-SAE).
  • Generation and discrimination of the cervical cell using Region-based Convolutional Neural Network (RCNN).
Type: Machine learning (neural network)
Results: The proposed method identifies and classifies cervical premalignant and malignant diseases based on deep characteristics without the necessity for initial classification and segmentation.

Title: Cervical cancer diagnosis based on modified uniform local ternary patterns and feed forward multilayer network optimized by genetic algorithm [41]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Segmentation of the image using a thresholding approach.
  • Feature extraction by applying a texture descriptor titled modified uniform local ternary patterns (MULTP).
  • Classification of the cell using an optimized multilayer feed-forward neural network.
Type: Machine learning (neural network)
Results: MULTP, the proposed texture descriptor, is a generic operator that may be used to characterise texture features of images in numerous computer vision problems. In addition, the suggested optimization approach may be used to increase performance in deep networks.

Title: Early cervical cancer diagnosis using Sooty tern-optimized CNN-LSTM classifier [11]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Augmentation process of image enhancement, image flipping, and image rotating to reduce the number of parameters necessary.
  • Segmentation of the cancer-affected regions with the help of the kernel weighted fuzzy local information c-means clustering (KWFLICM) model.
  • Classification using the Sooty Tern Optimization (STO) algorithm with CNN-based long short-term memory classifier (CNN-LSTM).
Type: Machine learning (neural network)
Results:
  • Accuracy 99.80%.
  • Specificity 99%.
  • Sensitivity 98.83%.
  • F-Score 97.8.
  • Improvement of 28.5% over Random Forest and 19.46% over the ensemble classifier.

Title: Hybrid Model for Detection of Cervical Cancer Using Causal Analysis and Machine Learning Techniques [10]
Database: Private
Class: not applicable
Methodology: Boruta analysis and SVM method for efficient feature selection and prediction on the cervical cell dataset.
Type: Machine learning (linear model)
Results: Boruta analysis performs better than the existing techniques available.

Title: Cervical Cancer Classification Using Combined Machine Learning and Deep Learning Approach [42]
Database: Herlev dataset
Class: Herlev-2 class
Methodology:
  • Feature extraction using ResNet-101.
  • Classification using Support Vector Machine (SVM).
Type: Machine learning (linear model)
Results: Accuracy 97.30%.

Title: Auxiliary classification of cervical cells based on the multi-domain hybrid deep learning framework [17]
Database: Herlev dataset, SIPaKMeD dataset, BJTU dataset
Class: Herlev-2 and 7 class, SIPaKMeD-5 class, BJTU-7 class
Methodology:
  • Deep feature extraction using a deep convolutional neural network, the pretrained Visual Geometry Group-19 (VGG-19).
  • Hand-crafted images undergo the process of feature selection, clustering, and dimensionality reduction.
  • Classification using a Support Vector Machine (SVM) classifier.
Type: Machine learning (linear model)
Results:
  • Accuracy 98.70%.
  • Sensitivity 98.20%.
  • Specificity 98.90%.
  • The suggested novel screening methodology is promising for early cervical cancer detection, with multi-domain and hybrid characteristics proving realistic in clinical practice.

Title: An Evaluation of Computational Learning-based Methods for the Segmentation of Nuclei in Cervical Cancer Cells from Microscopic Images [43]
Database: Z-Stack cellular microscopy proliferation images provided by HCS Pharma
Class: not applicable
Methodology: Machine learning architectures: Random Forest, AdaBoost, and MLP.
Type: Machine learning (nonlinear model)
Results: All machine learning architectures gave outstanding nuclei segmentation in cervical cancer cells but did not solve the overlapping nuclei and Z-stack segmentation problems.

Title: Prognosis of Cervical Cancer Disease by Applying Machine Learning Techniques [4]
Database: Dataset of 858 cervical cancer patients with 36 risk factors and one outcome variable
Class: not applicable
Methodology:
  • Analysis of different supervised machine learning techniques.
  • Classification algorithms: Artificial Neural Network, Bayesian Network, SVM, Random Tree, Logistic Tree, and XG-Boost Tree.
  • Feature selection algorithms: relief rank, wrapper method, and LASSO regression.
Type: Machine learning (nonlinear model)
Results:
  • Maximum accuracy of 94.94%, achieved using XG-Boost with complete features.
  • This approach offers much potential for clinical use and cervical cancer cell detection.

Title: Is the aspect ratio of cells important in deep learning? A robust comparison of deep learning methods for multi-scale cytopathology cell image classification: From convolutional neural networks to visual transformers [44]
Database: SIPaKMeD dataset
Class: not applicable
Methodology: Twenty-two deep learning models were used to classify the cervical cancer cells into two categories of standard and scaled datasets.
Type: Machine learning (nonlinear model)
Results: Deep learning models are robust to changes in the aspect ratio of cervical cells in cervical cytopathological images.

Title: A Fast Hybrid Classification Algorithm with Feature Reduction for Medical Images [45]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • A novel fast hybrid fuzzy classification algorithm with feature reduction for medical images.
  • Integration of the quantum-based grasshopper computing algorithm (QGH) with a fuzzy clustering technique for feature extraction.
  • A second integration, the fusion technique, uses QGH with the fuzzy c-means algorithm to determine the best features.
Type: Machine learning (nonlinear model)
Results: Established the importance of feature selection for the accuracy of the proposed classifier.

Title: Cervical Cancer Classification from Pap Smear Images Using Modified Fuzzy C Means, PCA, and KNN [7]
Database: Herlev dataset
Class: Herlev-7 class
Methodology:
  • Geometrical and feature extraction using a novel approach of modified fuzzy c-means.
  • Augmentation of the images using Principal Component Analysis (PCA) to maintain the uncorrelated features and thus reduce the algorithm processing time.
  • Classification of the Pap smear image into normal and abnormal cells using K Nearest Neighbour (KNN).
Type: Machine learning (nonlinear model)
Results:
  • Minimum accuracy 94.15%.
  • Maximum accuracy 96.28%.
  • Average accuracy 94.86%.
  • Sensitivity 97.96%.
  • Specificity 83.65%.
  • F1-Score 96.87%.
  • Precision 96.31%.

Title: A Semi-supervised Deep Learning Method for Cervical Cell Classification [28]
Database: Herlev dataset, SIPaKMeD dataset
Class: Herlev-7 class, SIPaKMeD-5 class
Methodology:
  • Novel manual features and a voting mechanism to achieve data expansion in semi-supervised learning.
  • Clarity function to filter out higher-quality images, annotation of a small amount of the high-quality images, and a voting mechanism for balancing the training data.
Type: Machine learning (classifier)
Results: Accuracy 91.94%.

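One technique named in Table 1 is label smoothing, used in [39] as part of a hybrid loss for lightweight CNNs. The sketch below shows only the generic label-smoothing step combined with cross-entropy; the smoothing factor of 0.1, the function names, and the example logits are assumptions for illustration, and the actual hybrid loss of [39] is not reproduced here.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Spread epsilon of the probability mass uniformly over all classes."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probabilities):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probabilities) if t > 0)


# Seven classes, as in the Herlev dataset; the logits are made up.
num_classes = 7
true_class = 2
logits = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
probabilities = softmax(logits)

hard = smooth_labels(true_class, num_classes, epsilon=0.0)   # one-hot target
smoothed = smooth_labels(true_class, num_classes, epsilon=0.1)

hard_loss = cross_entropy(hard, probabilities)
smoothed_loss = cross_entropy(smoothed, probabilities)
```

For a confident, correct prediction the smoothed loss is larger than the one-hot loss, so the network is discouraged from pushing its outputs to extreme confidence.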
As Table 1 shows, many attempts have been made at cervical cell classification. The convolutional neural network (CNN) is currently one of the best approaches for the classification process. Previous researchers have studied several CNN-related techniques, such as the CNN-based long short-term memory classifier, region-based classifiers, lightweight networks, and ResNet-50. For example, Chitra et al. [11] introduced classification using the Sooty Tern Optimization (STO) algorithm with a CNN-based long short-term memory classifier (CNN-LSTM) and achieved better performance than other studies in the literature. They reported an accuracy of 99.80%, a specificity of 99%, a sensitivity of 98.83%, and an F-score of 97.8, an improvement of 28.5% over Random Forest and 19.46% over the ensemble classifier. These findings are consistent with those of Li et al. [3], who achieved similar values of accuracy, specificity, sensitivity, and F-measure. Their method was a pulse convolutional neural network (PCNN) that integrates global context information and an attention mechanism, with an improved ResNet-50 backbone network for feature extraction. In addition, Liu et al. [44] concluded that DL models are robust to changes in the aspect ratio of cervical cells in cervical cytopathological images. The above findings contradict the study by Elakkiya et al. [5], which proposed identifying and classifying cervical premalignant and malignant diseases based on deep characteristics without the need for initial classification and segmentation.
The literature reveals four categories of classification method used in previous studies: neural network-based classification, linear model classification, nonlinear model classification, and others.
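As an example of the nonlinear category, Table 1 lists K-nearest neighbour (KNN), which [7] uses to separate normal from abnormal cells. A minimal sketch of the voting rule follows; the two-number feature vectors, the labels, and the choice of k = 3 are hypothetical and stand in for whatever features a real pipeline would extract.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Made-up feature vectors for illustration only.
train_features = [
    (0.20, 0.10), (0.25, 0.15), (0.30, 0.10),   # normal cells
    (0.70, 0.60), (0.80, 0.65), (0.75, 0.70),   # abnormal cells
]
train_labels = ["normal"] * 3 + ["abnormal"] * 3

prediction_a = knn_predict(train_features, train_labels, (0.72, 0.62))
prediction_b = knn_predict(train_features, train_labels, (0.22, 0.12))
```

KNN has no training phase; the decision boundary follows the stored samples directly, which is why it is grouped with the nonlinear models.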

This entry is adapted from the peer-reviewed paper 10.3390/diagnostics12122900
