Breast cancer imposes an enormous burden in terms of lives lost, quality of life, and healthcare costs. It is the world's most common cancer among women and a leading cause of mortality and morbidity. The histopathological examination of breast tissue biopsies is the gold standard for diagnosis. In this work, a computer-aided diagnosis (CAD) system based on deep learning is developed to ease the pathologist's task. A new transfer learning approach is introduced for breast cancer classification using a set of pre-trained Convolutional Neural Network (CNN) models combined with data augmentation techniques. Multiple experiments are performed to analyze the performance of these pre-trained CNN models by carrying out magnification-dependent and magnification-independent binary and eight-class classifications. The Xception model shows promising performance, achieving the highest classification accuracy in all experiments.
1. Introduction
Breast cancer claims the lives of 40,000 women in the United States each year [1], and approximately twelve percent of women are diagnosed with it during their lifetime; it usually affects women over the age of 40 [2]. In recent years, the healthcare cost of a wide spectrum of diseases has strained the economies of nations worldwide. A disease such as cancer can be extremely costly in terms of lives, quality of life, and money, and breast cancer occupies a unique place in this category, as it is the world's leading cancer among female patients. In 2020, 2.3 million women were diagnosed with the disease, and it caused 685,000 deaths; moreover, 7.8 million women had been diagnosed with it in the previous five years as of the end of 2020, making it the most common cancer in the world [3]. Breast cancer also accounts for more lost disability-adjusted life years in women than any other disease. The early identification of pre-cancerous and cancerous cases has proved extremely effective in improving cure rates, a conclusion drawn from the fact that developed countries achieved better survival rates in the 1970s after introducing early detection and intervention programs [4]. Several tools can be used to screen for and diagnose breast cancer; they fall into three main categories: clinical examination, imaging techniques, and histopathological examination of tissue biopsies [5]. The gold standard for diagnosis is the examination of tissue biopsies [6]. Consequently, myriads of samples are produced annually, placing a huge burden on the pathologists who must examine them and accurately identify the diseases they contain. The examination is conducted by placing breast tissue, stained with special stains (most commonly hematoxylin and eosin), on a glass slide and examining it under a microscope using lenses of different magnifications. If the pathologist detects cellular atypia (a general criterion of malignancy) or a tissue abnormality (e.g., ductal invasion), the diagnosis is concluded and the subtype of the cancer is determined [7].

To aid pathologists in the diagnostic process, a computer-aided diagnosis (CAD) system can be developed to reduce the errors, effort, time, and cost involved. Such a system is only a helping tool; the pathologist's judgment remains irreplaceable in the diagnostic process. One way to develop a CAD system is to extract handcrafted features from the histopathological images, use them to train traditional machine learning models, and later predict unseen inputs. However, this approach is not popular, as it requires extensive prior domain knowledge to extract the handcrafted features, and the classification accuracy generated by such systems is insufficient to earn trust for diagnosis in this field [8]. An alternative approach is to use the current advances in deep learning to develop such systems. This approach requires no prior domain knowledge, as the model learns intrinsic features from the raw input images, and the resulting classification accuracy is very impressive when the system is well designed. However, deep learning models require a huge number of input images to be trained effectively and to generalize well to unseen data. This was a major obstacle for breast cancer histopathology until the release of the BreakHis dataset, which contains 7909 histopathological images [9]. Still, this number of images alone is not enough to develop a high-accuracy image recognition system.
2. Related Work on Computer-Aided Diagnosis
Boumaraf et al. [8] implement a transfer learning-based CNN using the ResNet-18 model. They use the BreakHis dataset, divided into an 80% training set and a 20% testing set. The experiments carried out are magnification-dependent and magnification-independent classification, in both binary and multi-class (eight-class) settings. ResNet-18 is pre-trained on the ImageNet database; the transfer learning strategy is to train only the last two residual blocks and freeze the rest, which makes ResNet more domain-specific (i.e., able to learn the intrinsic features of the histopathology images). The input images are first resized to 224 × 224, and global contrast normalization (GCN) is applied to prevent the images from having different contrast levels. Moreover, three-fold data augmentation is used, in which each image is transformed into three images by applying three transformations: random horizontal flip, random vertical flip, and random rotation by 40°. The metrics used to evaluate performance are accuracy, precision, recall, F-measure, and Matthews' correlation coefficient. The study achieves 98.42% accuracy in magnification-independent binary classification and 92.03% in magnification-independent multi-class classification, as well as 98.84% average accuracy in magnification-dependent binary classification and 92.15% average accuracy in magnification-dependent multi-class classification. The detailed results are illustrated in Table 1.
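For illustration, a minimal PyTorch sketch of this kind of partial fine-tuning is given below. It assumes torchvision's ResNet-18 pre-trained on ImageNet and interprets "the last two residual blocks" as the final two residual stages (layer3 and layer4); the augmentation pipeline, the optimizer settings, and the two-class head are illustrative assumptions rather than a reproduction of the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentation loosely mirroring the paper's three transforms (illustrative).
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(40),
    transforms.ToTensor(),
])

# ResNet-18 pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze everything first ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze the last two residual stages (one possible reading of
# "the last two residual blocks") so they adapt to histopathology images.
for stage in (model.layer3, model.layer4):
    for param in stage.parameters():
        param.requires_grad = True

# Replace the classification head for the binary task (benign vs. malignant).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)
criterion = nn.CrossEntropyLoss()
```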
Davoudi et al. [10] design and implement a CNN for the binary classification of the BreakHis dataset, independent of the magnification factors. The main contribution of their study is the attempt to optimize the weights of the CNN using a genetic algorithm (GA) instead of conventional optimizers. The model is trained using Adam, mini-batch gradient descent, and the GA optimizer. The evaluation metrics used in the study are accuracy, recall, precision, F1-score, and execution time. They divide the BreakHis dataset into a 70% training set and a 30% testing set. The model, which accepts images with a size of 210 × 210 × 3, achieves 69.88% accuracy with the gradient descent optimizer, 85.83% with the Adam optimizer, and 85.49% with the GA optimizer. Table 1 illustrates their results.
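As a rough illustration of GA-based weight optimization (not the authors' implementation), the NumPy sketch below evolves a population of flattened weight vectors for a toy linear classifier, using training accuracy as the fitness. The population size, mutation rate, selection scheme, and toy model are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(weights, X):
    """Toy linear classifier: weights hold n_features coefficients + a bias."""
    logits = X @ weights[:-1] + weights[-1]
    return (logits > 0).astype(int)

def fitness(weights, X, y):
    return np.mean(predict(weights, X) == y)  # training accuracy as fitness

def evolve(X, y, pop_size=50, n_gen=100, mutation_rate=0.1):
    n_w = X.shape[1] + 1
    pop = rng.normal(size=(pop_size, n_w))
    for _ in range(n_gen):
        scores = np.array([fitness(w, X, y) for w in pop])
        # Selection: keep the fittest half as parents.
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        # Crossover: average two randomly chosen parents per child.
        idx = rng.integers(len(parents), size=(pop_size, 2))
        children = (parents[idx[:, 0]] + parents[idx[:, 1]]) / 2
        # Mutation: add Gaussian noise to a fraction of the genes.
        mask = rng.random(children.shape) < mutation_rate
        children += mask * rng.normal(scale=0.5, size=children.shape)
        pop = children
    scores = np.array([fitness(w, X, y) for w in pop])
    return pop[np.argmax(scores)]

# Usage on synthetic data standing in for extracted image features:
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
best = evolve(X, y)
print("training accuracy:", fitness(best, X, y))
```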
Spanhol et al. [11] test the use of DeCAF features, i.e., the use of a pre-trained CNN as a feature extractor with a classifier on top that is trained for the new classification task. In more detail, the output of a given layer of the pre-trained network is used as the input to a classifier; logistic regression is adopted as the classifier in this study, and BVLC CaffeNet as the pre-trained model. The study considers patch-based recognition in different configurations. First, the outputs of three layers are tested individually: fc6, fc7, and fc8. After that, features are built by combining the outputs of more than one layer. The classification tasks performed in this study are the binary magnification-dependent tasks. Image-level and patient-level accuracy are used as the performance metrics, along with the F1-score at both levels. The highest patient-level accuracy, 86.3 ± 3.5, and the highest image-level accuracy, 84.2 ± 1.7, are both obtained on the 200× dataset. The detailed patient-level and image-level accuracies for each magnification factor are presented in Table 1.
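A minimal sketch of this DeCAF-style pipeline is shown below, using torchvision's AlexNet (an architecture close to BVLC CaffeNet) as a stand-in and scikit-learn's logistic regression. The layer indices and the random placeholder data are assumptions for illustration only.

```python
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

# Pre-trained network used purely as a fixed feature extractor.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
net.eval()

def fc6_features(images):
    """Return fc6 activations (the first linear layer of the classifier)."""
    with torch.no_grad():
        x = net.features(images)
        x = net.avgpool(x)
        x = torch.flatten(x, 1)
        # classifier[0] is dropout; classifier[1] is the fc6 linear layer.
        x = net.classifier[1](net.classifier[0](x))
    return x.numpy()

# Placeholder batches standing in for image patches and their labels.
train_imgs, test_imgs = torch.rand(32, 3, 224, 224), torch.rand(8, 3, 224, 224)
train_y = torch.randint(0, 2, (32,)).numpy()

# Logistic regression trained on the extracted DeCAF-style features.
clf = LogisticRegression(max_iter=1000)
clf.fit(fc6_features(train_imgs), train_y)
pred = clf.predict(fc6_features(test_imgs))
```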
Bardou et al. [12] implement several approaches for the classification of breast cancer using the BreakHis dataset. The performed experiments are magnification-dependent binary and multi-class classification. A CNN model with five convolutional layers and two fully connected layers is proposed in their study. This model is trained on the original dataset without data augmentation in one trial, and with data augmentation in another, applying horizontal flips and rotations at three angles: 90°, 180°, and 270°. Additionally, they test a CNN + support vector machine (SVM) configuration, in which a linear SVM replaces the fully connected layer, and a CNN + classifiers configuration, in which the CNN extracts features that are then classified by random forests, a radial basis function SVM, a linear SVM, and K-nearest neighbors. Another approach is an ensemble of 10 models: the probability vector of each test sample is extracted from the last fully connected layer (with softmax activation) of each model, the 10 probability vectors are summed, and the class with the maximum summed value is output as the prediction. They also extract handcrafted features from the images and classify them both with traditional classifiers, such as SVMs, and with a CNN that takes the handcrafted features as input. The dataset is divided into a 70% training set and a 30% test set. The evaluation metrics used in their study are accuracy, precision, recall, and F1-score. The highest results are achieved by the ensemble model, with accuracies in the interval [96.15%, 98.33%] for the binary classification experiments and [83.31%, 88.23%] for the multi-class experiments. Table 1 illustrates these results in more detail.
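The ensemble rule itself is easy to state in code. The sketch below is a hypothetical setup with the member models left abstract: it sums the 10 softmax probability vectors per sample and takes the argmax.

```python
import numpy as np

rng = np.random.default_rng(1)

# probs[m] holds model m's softmax outputs for the test set:
# shape (n_models, n_samples, n_classes), filled with placeholder values.
n_models, n_samples, n_classes = 10, 5, 8
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))

summed = probs.sum(axis=0)         # elementwise sum over the 10 models
predicted = summed.argmax(axis=1)  # class with the largest summed score
print(predicted)
```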
Xiang et al. [13] use a pre-trained Inception-V3 model for the detection of malignant and benign tumors, carrying out magnification-independent binary classification on the BreakHis dataset. The input size for their model is 299 × 299. Data augmentation techniques, mainly image flips and rotations, are used to overcome overfitting: each image is flipped and rotated around its center by 90°, 180°, and 270°, expanding the BreakHis dataset five-fold. The dataset is divided into training, validation, and test sets with an approximate ratio of 3:1:1, and a cross-validation training strategy is adopted. The evaluation metrics used in their study are the image classification rate and the patient classification rate. The best results are achieved using the cross-validation strategy on the expanded dataset, with an image-level accuracy of 95.7% and a patient-level accuracy of 97.2%. The detailed results are shown in Table 1.
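This five-fold expansion can be reproduced with a few PIL operations, as in the sketch below; the file names and on-disk layout are illustrative assumptions.

```python
from PIL import Image

def expand_five_fold(path):
    """Return the original image plus its horizontal flip and three rotations."""
    img = Image.open(path)
    return [
        img,
        img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),  # horizontal flip
        img.rotate(90, expand=True),                     # rotations about the center
        img.rotate(180),
        img.rotate(270, expand=True),
    ]

# Illustrative usage: write the five variants next to the original file.
for i, variant in enumerate(expand_five_fold("slide_001.png")):
    variant.save(f"slide_001_aug{i}.png")
```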
Table 1. Performance results achieved by the related models.

Shallu et al. [14] conduct a study to determine whether transfer learning or full training is preferable for classifying the histopathological images of the BreakHis dataset. Three pre-trained models are used: VGG16, VGG19, and ResNet50. These models serve as feature extractors only, with logistic regression as the classifier. Moreover, three different training-testing splitting ratios (90-10%, 80-20%, and 70-30%) are used to determine the effect of the ratio on the results. Only rotation is used for data augmentation, where the images are rotated around their centers at three angles: 90°, 180°, and 270°. A binary magnification-independent experiment on the BreakHis dataset is carried out in this study. To create a balanced dataset for fine-tuning and for fully training the models, the images of the malignant class (which outnumber those of the benign class) are down-sampled to the number of images in the benign class. To fully train the networks from scratch, the weights are initialized randomly, whereas for the transfer learning approach the weights of the pre-trained networks are kept unchanged. The performance measures used are accuracy, precision, recall, F1-score, and average precision score (APS); receiver operating characteristic (ROC) curves and the area under the curve (AUC) are used to further validate model performance. The results show that the fine-tuned VGG16 has the best performance, with 92.60% accuracy using the 90-10% training-testing split. Table 1 presents the best results obtained for VGG16.
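A minimal sketch of this balancing-and-splitting step is given below, assuming the images are available as (path, label) pairs; the file names are placeholders, and the use of scikit-learn's train_test_split is an illustrative choice (the class counts match BreakHis: 2480 benign and 5429 malignant images).

```python
import random
from sklearn.model_selection import train_test_split

random.seed(42)

def balance_by_downsampling(benign, malignant):
    """Down-sample the larger (malignant) class to the benign class size."""
    return benign, random.sample(malignant, len(benign))

# Placeholder (path, label) pairs using the BreakHis class counts.
benign = [(f"benign_{i}.png", 0) for i in range(2480)]
malignant = [(f"malignant_{i}.png", 1) for i in range(5429)]

benign, malignant = balance_by_downsampling(benign, malignant)
data = benign + malignant
labels = [lab for _, lab in data]

# The three splitting ratios compared in the study: 90-10, 80-20, 70-30.
for test_size in (0.1, 0.2, 0.3):
    train, test = train_test_split(
        data, test_size=test_size, stratify=labels, random_state=42
    )
    print(f"test={test_size:.0%}: {len(train)} train / {len(test)} test")
```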
Liu et al. [15] implement a CNN model called AlexNet-BC. The model is pre-trained on the ImageNet dataset and then fine-tuned using transfer learning, and several data augmentation techniques are used to expand the dataset. Additionally, a new loss function approach is proposed and implemented. The proposed model is trained and tested on the BreakHis dataset for the four magnification factors in binary classification mode, and then further verified on the UCSB and IDC datasets. They divide the BreakHis dataset into 60% for training, 20% for validation, and 20% for testing. A single evaluation metric, accuracy, is used in this study, with achieved accuracies in the interval [97.71 ± 1.9%, 98.48 ± 1.1%]. Since Table 1 consists of results achieved by previous studies using the BreakHis dataset, only the results this study achieved on BreakHis are illustrated in Table 1.
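As with the other transfer learning entries, the core recipe can be sketched briefly. The snippet below fine-tunes torchvision's ImageNet-pre-trained AlexNet with a two-class head under a 60/20/20 split; it is an illustrative stand-in, not the authors' AlexNet-BC architecture, their augmentation pipeline, or their custom loss, and the dataset folder layout is assumed.

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split
from torchvision import datasets, models, transforms

# Assumed BreakHis folder layout: breakhis/benign, breakhis/malignant.
dataset = datasets.ImageFolder(
    "breakhis/",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)

# 60/20/20 train/validation/test split, as in the study.
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0),
)

# ImageNet-pre-trained AlexNet with a new two-class classification head.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()  # stand-in for the paper's custom loss
```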