Objective Diagnosis for Histopathological Images

Subjects: Computer Science, Artificial Intelligence

Contributors: Naira Elazab, Hassan Soliman, Shaker El-Sappagh, S. M. Riazul Islam, Mohammed Elmogy

Histopathology refers to the examination by a pathologist of biopsy samples. Histopathology images are captured by a microscope to locate, examine, and classify many diseases, such as different cancer types. They provide a detailed view of different types of diseases and their tissue status. These images are an essential resource with which to define biological compositions or analyze cell and tissue structures. This imaging modality is very important for diagnostic applications.
The analysis of histopathology images is a prolific and relevant research area supporting disease diagnosis. In this paper, the challenges of histopathology image analysis are evaluated. An extensive review of conventional and deep learning techniques that have been applied in histological image analyses is presented. This review summarizes many current datasets and highlights important challenges and constraints with recent deep learning techniques, alongside possible future research avenues. Despite the progress made in this research area so far, it is still a significant area of open research because of the variety of imaging techniques and disease-specific characteristics.

medical image analysis
histopathology image analysis
conventional machine learning methods
deep learning methods
computer-assisted diagnosis

Definition

1. Introduction

Medical images are a fundamental part of each patient’s digital health record. Such images are analyzed by individual radiologists, who are limited by speed, fatigue, or a lack of experience. It takes many years and considerable financial resources to train a radiologist. Additionally, some health care systems outsource radiology reporting to lower-cost countries, such as India, via teleradiology. A late or incorrect analysis can harm the patient. Thus, it would be beneficial for medical imaging (MI) analyses to be performed by automatic, precise, and efficient machine learning (ML) algorithms. MI analysis is a significant research area for ML, in part because the data are relatively structured and labeled, and it is likely to be an area in which patients first encounter working ML systems [1]. That is significant for two reasons. First, with regard to real patient metrics, MI analysis is a litmus test of whether ML techniques actually improve patient outcomes and survival. Second, it provides a testbed for human-ML interaction, i.e., how receptive a patient is likely to be to health-changing choices made or assisted by a nonhuman actor [2]. In recent years, ML has shown significant advances. Its potential has also expanded across a wide variety of applications, including image recognition, medical diagnosis, defect identification, and structural health assessment. These developments are due to many factors, such as the creation of self-learning mathematical models that enable computers to perform particular (human-like) tasks based solely on learned patterns, together with the increase in computing power that supports these models’ analytical capabilities [3].

There are many imaging types, and their use is becoming more widespread. Types of MI include ultrasound, X-ray, magnetic resonance imaging (MRI), retinal scans, histopathology images (HI), computed tomography (CT), positron emission tomography (PET), and dermoscopy images. Some examples of MIs are shown in Figure 1. Some of these modalities, such as CT and MRI, are used to examine many organs, whereas others, such as retinal and dermoscopy images, are organ-specific [4]. The quantity of information produced at each analysis stage differs depending on the nature of the MI and the organs examined. HIs are useful for biological studies and for making medical decisions. In addition, they are generally used to provide “ground truths” (GTs) for other MI modalities, such as MRI. A histology slide is a digital record a few megabytes in size, while a magnetic resonance image can be several hundred megabytes. This affects how the data are preprocessed and how the algorithm’s architecture is designed, given processor and storage limitations [5].

Figure 1. Examples of some medical image types: (a) MRI scan of the left side of a brain; (b) an axial CT brain scan; (c) an axial CT lung scan; (d) chest X-ray; (e) a histology slide with high-grade glioma.

Pathology analyses are traditionally performed by an individual pathologist observing a stained specimen on a glass slide under a microscope. Lately, efforts have been made to capture the whole slide with a scanner and save it as a digital image, called a whole slide image (WSI) [6].

Digitizing pathology is a recent development that produces large amounts of visual information suitable for automated diagnosis. It enables us to view and interpret pathologic cell and tissue samples as high-quality images with the assistance of computer tools. It also makes it possible to apply image analysis techniques. Such techniques would assist pathologists and support their interpretations, such as staging and grading. Various classification and segmentation methods for HI are discussed in this review. We present and compare conventional techniques and deep learning (DL) methods to choose the most appropriate method for histopathology problems [7].

Microscopic architecture and its features at the nucleus, tissue, and organ levels can be key to analyzing disease progression and treatment. Additionally, in examining and diagnosing histological images, pathologists have identified morphological tissue features that indicate the presence of disease, such as cancer [8].

Some characteristics of disease, such as tumor-infiltrating lymphocytes, might be deduced from HI alone. Additionally, HI analysis, which is called the “gold standard” in many disease diagnoses, is included in nearly all kinds of cancer detection and treatment procedures. HI needs analysis that is specific to the organ and the task in order to visualize the various tissue components under a microscope. The sections are dyed with one or more stains, which reveal cellular components, and counterstains provide contrast [9].

Efficient ML algorithms are used in HI analysis to help pathologists obtain quick, stable, and quantified examination results for a more accurate diagnosis. Many traditional and deep learning methods support pathologists in examining more tissue and in determining the relationship between the visual images and the specific illness. Additionally, since ML techniques are generally semi- or fully automated, they are efficient, which makes histopathology examination technically feasible in the current big data age [10].

Most HI analysis stages rest on mathematical foundations. Mathematical operations and functions are applied at every stage, from preprocessing to diagnosis, to provide an intensive analysis of HIs. Figure 2 illustrates the main phases of a common histopathological image pipeline based on conventional ML techniques. First, HIs are supplied to the system as a 2D array for grayscale images or a 3D array for color images. Second, the preprocessing stage applies linear algebra operations to the image array to enhance image quality. This stage helps to distinguish significant structures from others in the processed images. Third, the segmentation stage differentiates the cells from background objects by applying established mathematical algorithms, such as thresholding, level sets, the watershed transform, and intensity and texture homogeneity transforms. Fourth, the feature extraction stage extracts the most significant features from the segmented images instead of processing each pixel, which reduces the system’s computational complexity. Most handcrafted features are based on mathematical techniques that detect changes in the intensity, color, or texture of the pixels. Common derivative techniques detect these changes by applying first or second derivatives to pixel values. Finally, the diagnosis stage classifies or clusters the processed images according to the extracted features. The classification and clustering techniques are based on mathematical operations that distinguish the processed images based on those features.

Figure 2. An overview of the HI analysis pipeline.
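
The four stages of the conventional pipeline can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the pipeline of any cited study: Otsu thresholding stands in for the segmentation stage, dark pixels are assumed to be nuclei, the three features are arbitrary examples, a nearest-centroid rule stands in for the classifier, and all function names are hypothetical.

```python
import numpy as np

def preprocess(image):
    """Stage 2: reduce an RGB array (H, W, 3) to grayscale and rescale to [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 3:
        image = image.mean(axis=2)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)

def otsu_threshold(gray, bins=256):
    """Return the bin center that maximizes the between-class variance."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(prob)                  # weight of the darker class
    w1 = 1.0 - w0                         # weight of the brighter class
    cum_mean = np.cumsum(prob * centers)  # cumulative first moment
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0)
    return centers[np.argmax(between)]

def segment_nuclei(gray):
    """Stage 3: assume stained nuclei are darker than the background."""
    return gray < otsu_threshold(gray)

def extract_features(gray, mask):
    """Stage 4: a few handcrafted features, including a first-derivative one."""
    gy, gx = np.gradient(gray)            # first derivatives along rows, columns
    grad_mag = np.hypot(gx, gy)
    area_fraction = mask.mean()
    mean_intensity = gray[mask].mean() if mask.any() else 0.0
    return np.array([area_fraction, mean_intensity, grad_mag.mean()])

def nearest_centroid_predict(train_x, train_y, x):
    """Stage 5: assign x to the class whose mean feature vector is closest."""
    labels = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Synthetic example: a dark 16 x 16 "nucleus" on a bright 64 x 64 background.
image = np.full((64, 64, 3), 0.9)
image[10:26, 10:26] = 0.2
gray = preprocess(image)
mask = segment_nuclei(gray)
features = extract_features(gray, mask)
```

In practice, each stage would be replaced by a method suited to the tissue and stain at hand; the point of the sketch is only how the array passes from one stage to the next.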

2. Histopathological Image Overview

HIs contain normal and abnormal biological structures, as well as morphological and architectural features that pathologists define based on their knowledge. Even within a given tissue area, some structures are small, and related patterns typically show high variability in visual appearance. Most of this visual variability is inherent in biological systems and anatomy [11].

After digital HIs are obtained from a biopsy, manual analysis of the images contributes to variability in diagnosis and treatment. To overcome this issue, computer-assisted diagnosis (CAD) techniques are applied to provide an objective examination of disease. The fundamental steps of a CAD system appear in Figure 2. They include digital image processing methods, such as segmentation, feature extraction, and classification [12].

HI analysis involves computations performed at various magnifications (×2, ×4.5, ×10, ×20, and ×40) for multivariate statistical examination, analysis, and classification. Tissue-level examination can be performed at lower magnification. Demir et al. [13] presented tissue-level and cell-level examination techniques for cancer diagnosis. They examined HIs by applying preprocessing, feature extraction, and classification strategies. Recent progress in digital pathology calls for the development of quantitative, automated digital image analysis methods to help pathologists interpret the large number of digitized HIs [14].
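
As a rough illustration of multi-scale analysis, a lower magnification can be emulated by block-averaging a higher-magnification image. This is only a sketch: the function name downsample is hypothetical, integer factors are assumed (so ×40 to ×20 is a factor of 2 and ×40 to ×10 is a factor of 4), and non-integer steps such as ×4.5 would need proper resampling instead.

```python
import numpy as np

def downsample(image, factor):
    """Block-average an (H, W) or (H, W, C) array by an integer factor.

    Rows and columns that do not fill a complete block are cropped.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape[:2]
    h2, w2 = h - h % factor, w - w % factor
    cropped = image[:h2, :w2]
    blocks = cropped.reshape(h2 // factor, factor, w2 // factor, factor,
                             *image.shape[2:])
    return blocks.mean(axis=(1, 3))

# Hypothetical pyramid keyed by nominal magnification, starting from a x40 tile.
tile_x40 = np.random.default_rng(0).random((512, 512, 3))
pyramid = {40: tile_x40, 20: downsample(tile_x40, 2), 10: downsample(tile_x40, 4)}
```

Cell-level features would then be computed on the ×40 entry and tissue-level features on the coarser ones.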

3. Conventional Machine Learning Methods

CAD systems play an essential role and have become an important research topic in HI analysis and diagnostics. Various image processing and computer vision (CV) techniques have been applied to these HIs for gland and nuclei segmentation, cell type recognition, and classification, in order to extract quantitative measurements of disease characteristics and automatically assess whether a disease is present in the examined samples. They can also help to determine the severity of the disease, if it is present in the sample. Conventional ML methods typically process HIs in a few steps, as shown in Figure 3.

Figure 3. The conventional machine learning methods for HI.

4. Deep Learning Methods

Recently, DL techniques have been widely studied as an effective form of ML. Within the last few years, DL techniques have outperformed traditional ML methods in varied fields, such as CV, natural language processing (NLP), biomedical fields, and automated analysis for HI [7]. DL methods in CV are built from stacked levels of nonlinear transformations applied to raw input pixels. These levels form increasingly abstract representations, which are learned in a hierarchical fashion [70]. A typical example of a commonly applied architecture is the convolutional neural network (CNN) [71].
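
The idea of stacked nonlinear transformations can be made concrete with a toy forward pass. The sketch below is an assumption-laden illustration in plain NumPy, not a trainable network or any cited architecture: each level applies a convolution, a ReLU nonlinearity, and max pooling, and the function names and fixed kernels are hypothetical (in a real CNN the kernels are learned from data).

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise nonlinearity: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; incomplete border blocks are cropped."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def forward(image, kernels):
    """Apply one conv -> ReLU -> pool level per kernel, feeding each into the next."""
    x = np.asarray(image, dtype=np.float64)
    for kernel in kernels:
        x = max_pool(relu(conv2d(x, kernel)))
    return x

# Two levels with fixed 3 x 3 kernels (illustrative values only).
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
sum_kernel = np.ones((3, 3))
response = forward(np.random.default_rng(0).random((32, 32)),
                   [edge_kernel, sum_kernel])
```

Each level sees a larger region of the input than the previous one, which is what allows later levels to represent more abstract structure.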

Multiple criteria must be considered when applying DL techniques to histopathology, since a method’s success depends partly on the task-specific setting. A principal feature of HI is that the relevant patterns depend on the magnification level. The key factors are the size of the patch given to the network, the localization of the parts of the image where relevant histopathological structures can be found, and the homogeneity of staining across WSIs [72]. The network architecture also plays an important role, although many studies keep predefined architectures, as illustrated in Figure 4.

Figure 4. The typical deep learning steps for HI analysis.
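
Because a WSI is far too large to feed to a network at once, it is usually cut into patches, and patch size is one of the key factors noted above. The following sketch shows one simple way to tile an image and keep only patches that contain tissue. The function name, the default patch size, and the brightness-based tissue rule (bright pixels are assumed to be empty glass) are all assumptions for illustration.

```python
import numpy as np

def extract_patches(image, patch_size=256, stride=256,
                    min_tissue=0.5, background=0.8):
    """Yield (row, col, patch) for tiles that contain enough tissue.

    image: float array scaled to [0, 1], shape (H, W) or (H, W, 3).
    A pixel counts as tissue if its mean intensity is below `background`;
    a tile is kept if at least `min_tissue` of its pixels are tissue.
    """
    gray = image if image.ndim == 2 else image.mean(axis=2)
    height, width = gray.shape
    for row in range(0, height - patch_size + 1, stride):
        for col in range(0, width - patch_size + 1, stride):
            tile = gray[row:row + patch_size, col:col + patch_size]
            if (tile < background).mean() >= min_tissue:
                yield row, col, image[row:row + patch_size,
                                      col:col + patch_size]

# Synthetic slide region: bright background with tissue in the top-left tile.
region = np.ones((512, 512))
region[:256, :256] = 0.3
kept = list(extract_patches(region))
```

A stride smaller than the patch size gives overlapping patches, which trades more computation for denser coverage.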

Most DL techniques for localizing, classifying, and segmenting HIs are fairly recent. Deep neural methods are reported in the recent HI analysis literature, such as [6,13,36]. For example, Irshad et al. [48] provided one of the first reviews. It described and discussed the critical trends from an exhaustive analysis of nuclei detection, segmentation, and classification approaches used in HI, specifically under hematoxylin and eosin (H&E) staining protocols. Ciresan et al. [56] presented one of the first significant efforts to use deep learning for mitosis detection in HI analysis. Arevalo et al. [73] presented a hybrid representation learning method for basal cell carcinoma regions that combined a topographic unsupervised learning technique with a bag-of-features representation. They improved classification performance by 6% over a traditional representation based on the discrete cosine transform (DCT). Nayak et al. [74] presented a variant of the restricted Boltzmann machine, an unsupervised technique, for learning image signatures. They classified images from The Cancer Genome Atlas (TCGA) for clear cell kidney carcinoma and glioblastoma multiforme. The final stage used a multi-class support vector machine (SVM) classifier.

Approaches that rely on generative adversarial networks (GANs) are likely to reduce the need for large volumes of manual annotations. Recent innovations have not only improved existing approaches but also introduced new techniques. Unsupervised techniques may now carry out various tasks for which supervised methods were previously indispensable. The latest state-of-the-art advances in GANs for histopathological images were summarized in [89]. An overview of the discussed studies is given in Table 1.

Table 1. Overview of supervised and unsupervised learning models based on DL techniques.

Study	Organ	Staining	Potential Usage	Method
Supervised Learning
Litjens et al. [93]	Different tissue	H&E	Prostate and breast carcinoma detection	Convolutional neural network (CNN)-based pixel classifier
Nagpal et al. [94]	Prostate	H&E	Gleason score prediction	CNN-based regional Gleason pattern classifier + k-nearest neighbors (KNN)-based Gleason grade prediction
Zhao et al. [95]	Breast	H&E	Metastasis detection + classification	Feature pyramid aggregation based on a fully convolutional network (FCN) with a synergistic learning technique
Xing et al. [96]	Different tissue	H&E, Immunohistochemistry (IHC)	Segmentation of nuclei	CNN + selection-based sparse shape model
Gu et al. [97]	Breast	H&E	Tumor detection	Multi-resolution U-Net with multiple encoders and a single decoder
Tellez et al. [98]	Breast	H&E	Detection of mitosis	CNN trained using H&E slides registered to PHH3 slides as a reference
Wei et al. [99]	Lung	H&E	Classification of histological subtypes of lung adenocarcinoma	ResNet-18-based patch classifier
Song et al. [100]	Cervix	Papanicolaou (Pap), H&E	Cell segmentation	Multi-scale CNN
Agarwalla et al. [101]	Breast	H&E	Segmentation of tumor	CNN and 2D long short-term memory (LSTM) for representation learning and context aggregation
Ding et al. [102]	Colon	H&E	Gland segmentation	Multi-scale FCN with a high-resolution branch to avoid the loss in max-pooling layers
Bejnordi et al. [103]	Breast	H&E	Invasive carcinoma detection	Multi-stage CNN that first identifies tumor-associated stromal alterations and then classifies into normal/benign versus invasive carcinoma
Seth et al. [104]	Breast	H&E	Ductal carcinoma in situ (DCIS) segmentation	Compared U-Nets trained at multiple resolutions
Unsupervised Learning
Xu et al. [105]	Breast	H&E	Segmentation of nuclei	Stacked sparse autoencoders
Bulten and Litjens [106]	Prostate	H&E	Tumor classification	Convolutional adversarial Autoencoders
Hou et al. [107]	Breast	H&E	Segmentation and detection of nuclei	Sparse autoencoder
Sari and Gunduz-Demir [108]	Colon	H&E	Feature extraction and classification	Restricted Boltzmann machine + clustering
Gadermayr et al. [109]	Kidney	Stain agnostic	Object of interest segmentation in WSIs	CycleGAN + UNet segmentation
Gadermayr et al. [110]	Kidney	Periodic acid–Schiff (PAS), H&E	Glomeruli segmentation	CycleGAN

5. Datasets

The size of the datasets available to researchers for training and testing their methods has increased dramatically in recent challenges. A set of public databases in the digital pathology field includes manual annotations for HIs, as listed in Table 2 and Table 3 [108]. They can support objective evaluation. Slide properties (stain) and image properties (image resolution, magnification level) are similar across them. However, all these databases target specific diseases and do not cover multiple tasks. Additionally, there are many large-scale HI datasets, which include high-resolution WSIs.

TCGA [33] includes around 10,000 images from different types of cancer. Genotype-Tissue Expression (GTE) [109] includes around 20,000 WSIs from different tissues. The Stanford Tissue Microarray Database (TMAD) gives researchers access to tissue microarray images. It archives images of 349 distinct probes on 1488 tissue microarray slides [110]. The CAMELYON dataset is a collection of WSIs of sentinel lymph node tissue. It comprises the CAMELYON16 and CAMELYON17 challenges, which include 399 and 1000 WSIs, respectively. The data are currently accessed via registration on the CAMELYON17 website [111]. The Breast Cancer Histopathological Image (BreakHis) dataset contains 9109 microscopic images of breast tumor tissue obtained from 82 patients at various magnification factors (40X, 100X, 200X, and 400X). To date, it includes 2480 benign and 5429 malignant samples [112].

Table 2. Some common downloadable WSI databases.

Datasets	No. of Slides	Staining	Diseases
TCGA [33,116]	18,462	H&E	Cancer
GTE [112]	25,380	H&E	Normal
TMAD [113,117]	3726	H&E/IHC	various tissue
TUPAC16 [118]	821 from TCGA	H&E	Breast cancer
Camelyon17 [114]	1000	H&E	Breast cancer (lymph node metastasis)
Köbel et al. [119,120]	80	H&E	Ovarian carcinoma
KIMIA Path24 [121]	24	H&E/IHC	various tissue

Table 3. Some publicly available hand-annotated histopathological images.

Datasets	No. of Images	Staining	Organs	Potential Usage
KIMIA960 [122,123]	960	H&E/IHC	Different tissue	Classification
Bio-segmentation [124,125]	58	H&E	Breast	Classification
Bioimaging challenge 2015 [126]	269	H&E	Breast	Classification
GlaS [127]	165	H&E	Colorectal	Gland segmentation
BreakHis [115]	7909	H&E	Breast	Classification
Jakob Nikolas et al. [123,128]	100	IHC	Colorectal	Detection of blood vessel
MITOS-ATYPIA-14 [129]	4240	H&E	Breast	Detection of mitosis, classification
Kumar et al. [122,130]	30	H&E	Different cancer	Segmentation of nuclei
MITOS [20]	100	H&E	Breast	Detection of mitosis
Janowczyk et al. [131,132]	374	H&E	Lymphoma	classification
Janowczyk et al. [131,132]	85	H&E	Colorectal	Segmentation of gland
Ma et al. [133]	81	IHC	Breast	TIL analysis
Linder et al. [134,135]	1377	IHC	Colorectal	Segmentation of epithelium and stroma

This entry is adapted from the peer-reviewed paper 10.3390/math8111863

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.