Deep Learning on Images for Colorectal Cancer Diagnosis

Deep Learning on Images for Colorectal Cancer Diagnosis: Comparison

Please note this is a comparison between Version 2 by Lindsay Dong and Version 1 by Athena Davri.

Colorectal cancer (CRC) is the second most common cancer in women and the third most common in men, with an increasing incidence. Pathology diagnosis complemented with prognostic and predictive biomarker information is the first step for personalized treatment. Artificial Intelligence (AI) has made significant progress in the medical field, showing potential for clinical applications.

colorectal cancer
CRC
microscopy images
deep learning
DL
convolutional neural networks
CNN
artificial intelligence

1. Introduction

Colorectal cancer (CRC) is one of the most common types of gastrointestinal cancer, the second most common cancer in women and the third in men ^[1]. Despite variations by geographical distribution, age, and sex, overall CRC incidence worldwide is estimated to increase by 80% by the year 2035 ^[2]. This rising incidence is mainly attributed to lifestyle changes, particularly in dietary patterns ^[3]. Most CRCs are sporadic (70–80%), while up to approximately one third have a hereditary component ^[4]. The term CRC encompasses a wide range of carcinoma subtypes, characterized by different morphological features and molecular alterations.

The cornerstone of CRC diagnosis is the pathological examination of tissue obtained by biopsy or surgical excision ^[5]. With the advent of screening methods, many precursor lesions are also detected and biopsied. Consequently, a wide range of pre-malignant lesions has been identified, and the differential diagnosis between pre-malignant and malignant lesions is occasionally quite challenging ^[6]. Histopathological examination of the tissue remains the “gold standard” for diagnosis, the first step being the optimal preparation of a histological section stained with Hematoxylin and Eosin (H&E) ^[7]. This is followed by further examination with special in situ methods, such as immunohistochemistry (IHC) and in situ hybridization (ISH), and other molecular techniques ^[8]. Published guidelines cover the pre-analytical, analytical, and post-analytical procedures of a pathology laboratory ^[9]. Because of the high incidence of CRC, the diagnostic load in a routine pathology laboratory is very heavy, and the ever-growing list of morpho-molecular features to be examined and reported has made diagnosis a time-consuming process ^[10]. These factors, combined with the worldwide shortage of pathologists, have led to diagnostic delays, with consequences for optimal patient care.

It has been shown that pathologists make a diagnosis based mainly on image-based pattern recognition ^[6]. With this strategy, architectural and cellular characteristics are matched to the already known features of a disease ^[11]. In several instances, an accurate diagnosis or the estimation of prognostic and predictive factors is subject to personal interpretation, leading to inter- and intra-observer variability ^[12][13]. In the continuous effort to improve the accuracy of pathology diagnosis, while delivering all vital information for optimal patient treatment in a timely manner, new breakthrough technologies can be of great value.

The recent World Health Organization (WHO) classification for malignant epithelial tumors of the colorectum includes four main categories: adenocarcinoma (ADC) not otherwise specified (NOS), neuroendocrine tumor NOS, neuroendocrine carcinoma NOS, and mixed neuroendocrine-non-neuroendocrine neoplasm (MiNEN) ^[14]. Of these, colorectal ADC is the most common (90%) and, by definition, shows glandular or mucinous differentiation. Colorectal ADC has several histopathological subtypes with specific morphologic, clinical, and molecular characteristics, namely serrated ADC, adenoma-like ADC, micropapillary ADC, mucinous ADC, poorly cohesive carcinoma, signet-ring cell carcinoma, medullary ADC, adenosquamous carcinoma, carcinoma undifferentiated NOS, and carcinoma with sarcomatoid component. The diagnosis of CRC is only the first step towards a complete pathology report. According to best-practice guidelines, the histologic subtype, the histologic grade, the TNM stage, lymphovascular and perineural invasion, and tumor budding should be reported ^[9][14]. Histopathology image generation starts with the standard tissue preparation procedure. Biopsy or surgical specimens (representative sections) are formalin-fixed and paraffin-embedded; 4 μm tissue sections are then cut and stained with H&E ^[15]. The images are obtained by scanning. Several scanning systems can be used to digitize the whole slide ^[16], such as the Hamamatsu NanoZoomer series, the Omnyx scanner, the Zeiss scanners, the Pannoramic 250 Flash II, and the Leica Biosystems Aperio systems ^[17]. Most of these scanners provide two optical magnifications, 20× and 40×; however, the user can also digitally downsample the image to lower magnifications. A scanner needs several minutes to scan a whole slide, and most systems can load tens or hundreds of slides that are then scanned automatically one by one.
After digitization, each pixel of a Whole Slide Image (WSI) corresponds to a square region a few hundred nanometers on a side. For example, in the 40× magnification mode, a Hamamatsu NanoZoomer scanner produces an image in which each pixel edge corresponds to 227 nm ^[18]. This resolution is adequate for most histological findings, whose physical size is of the order of micrometers ^[19]. In most cases, the resulting images are stored either in a compressed JPEG-based format or in an uncompressed TIFF format.
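The relationship between pixel size, physical area, and image dimensions can be made concrete with a little arithmetic. The following is a minimal Python sketch, assuming the 227 nm-per-pixel figure quoted above for 40× scanning, an idealized downsampling in which the pixel edge length scales inversely with magnification, and uncompressed 8-bit RGB storage (3 bytes per pixel). The function names and the 10 mm × 10 mm tissue region are hypothetical, chosen only for illustration.

```python
import math

# Pixel edge length quoted in the text for a Hamamatsu NanoZoomer at 40x.
PIXEL_EDGE_NM_40X = 227.0


def pixel_edge_nm(magnification: float, edge_40x_nm: float = PIXEL_EDGE_NM_40X) -> float:
    """Pixel edge length at a given magnification.

    Assumption: idealized digital downsampling, so the edge length scales
    inversely with magnification (20x gives twice the 40x edge length).
    """
    return edge_40x_nm * 40.0 / magnification


def pixel_area_um2(edge_nm: float) -> float:
    """Physical area covered by one square pixel, in square micrometres."""
    edge_um = edge_nm / 1000.0
    return edge_um * edge_um


def region_pixels(width_mm: float, height_mm: float, edge_nm: float) -> tuple[int, int]:
    """Columns and rows needed to cover a physical region, rounded up."""
    nm_per_mm = 1_000_000
    cols = math.ceil(width_mm * nm_per_mm / edge_nm)
    rows = math.ceil(height_mm * nm_per_mm / edge_nm)
    return cols, rows


def uncompressed_rgb_bytes(cols: int, rows: int, bytes_per_pixel: int = 3) -> int:
    """Raw size assuming 8-bit RGB and no compression (an assumption, not a file-format spec)."""
    return cols * rows * bytes_per_pixel


if __name__ == "__main__":
    for mag in (40, 20):
        edge = pixel_edge_nm(mag)
        cols, rows = region_pixels(10, 10, edge)  # hypothetical 10 mm x 10 mm tissue region
        size_gb = uncompressed_rgb_bytes(cols, rows) / 1e9
        print(f"{mag}x: {edge:.0f} nm/pixel, {pixel_area_um2(edge):.4f} um^2/pixel, "
              f"{cols} x {rows} pixels, ~{size_gb:.1f} GB uncompressed")
```

Under these assumptions, each 40× pixel covers about 0.05 μm², and a 10 mm × 10 mm region needs roughly 44,000 × 44,000 pixels, or about 5.8 GB of raw RGB data. This illustrates why compressed JPEG-based storage is common, and why halving the magnification (doubling the pixel edge) cuts the pixel count by a factor of four.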

Figure 1 presents the resolution of a WSI scanned by a Hamamatsu NanoZoomer 210.

Figure 1. Image generation using a Hamamatsu NanoZoomer whole slide scanner: (a) histological slide, 75 mm × 25 mm; (b) Whole Slide Image (WSI); (c) cell level at 40× magnification; (d) pixel level at 40× magnification, digitized at 227 nm per pixel.

Machine learning is a branch of AI based on the concept that machines can be given access to data and learn from it on their own. AI has a broader scope and covers machines capable of carrying out tasks that require intelligence. Machine learning techniques focus on creating intelligent software using statistical learning methods and require access to data for the learning procedure ^[20]. A branch of machine learning that has drawn a lot of attention over the last few years is DL. DL involves training artificial neural networks (ANNs) with multiple layers of artificial neurons (nodes). Neural networks are inspired by the physiology of the human brain, forming a simplified artificial model of biological neural networks. An ANN is a collection of connected artificial neurons. The simplest ANN architecture is the feedforward neural network, in which information moves in one direction only, from the input nodes through any hidden-layer nodes to the output nodes; in the single-layer case, the inputs connect directly to the output nodes. The success and wide acceptance of ANNs rely on their capability to solve complex mathematical problems, nonlinear or stochastic, using very simple computational operations. In contrast to a conventional algorithm, which requires complex mathematical and algorithmic operations and may apply to only one problem, an ANN is computationally and algorithmically very simple, and its structure allows it to be applied to a wide range of problems ^[21].
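To illustrate how simple the operations inside a feedforward network are, the following is a minimal Python sketch of a forward pass through a network with one hidden layer: each neuron computes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation. The layer sizes, the weights, and the choice of ReLU and sigmoid activations are arbitrary assumptions for illustration; in practice, the weights are learned from training data rather than set by hand.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes negative ones."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: for each neuron, weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Information flows one way only: inputs -> hidden layer -> output layer."""
    hidden = dense(inputs, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)


# Hypothetical weights: 3 input nodes, 2 hidden neurons, 1 output neuron.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]

print(forward([1.0, 0.5, 2.0], hidden_w, hidden_b, out_w, out_b))
```

Every step is a multiplication, an addition, or a simple elementwise function; a deep network repeats the same pattern across many more layers and neurons.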

2. Deep Learning on Images for Colorectal Cancer Diagnosis

A pathology diagnosis focuses on the macroscopic and microscopic examination of human tissues, with the light microscope having been the principal tool for almost two centuries ^[11]. A meticulous microscopic examination of tissue biopsies is the cornerstone of diagnosis and is a time-consuming procedure. An accurate diagnosis is only the first step for patient treatment; it needs to be complemented with information about grade, stage, and other prognostic and predictive factors ^[4]. Pathologists’ interpretations of tissue lesions become data that guide decisions on patient management, and a meaningful interpretation is the ultimate challenge. In certain fields, inter- and intra-observer variability is not uncommon ^[12][13]. In such cases, the interpretation of the visual image can be assisted by objective outputs. Many studies published over the last 5 years have explored the possibility of moving to computer-aided diagnosis and to the measurement of prognostic and predictive markers for optimal personalized medicine ^[22][23]. Furthermore, the implementation of AI is now on the horizon. In the last 5 years, extensive research has been conducted to implement AI-based models for the diagnosis of multiple cancer types and, in particular, CRC ^[24][25][26]. Important aspects of a CRC diagnosis, such as histological type, grade, stromal reaction, and immunohistochemical and molecular features, have been addressed using these breakthrough technologies. Traditional pathology methods have considerable advantages ^[27]. The analytical procedures in pathology laboratories are cost-effective and, in recent years, have become automated, reducing turnaround time and procedural errors while maintaining high sensitivity and specificity for techniques such as IHC ^[26].
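Sensitivity and specificity, mentioned above for techniques such as IHC, are also the usual way of reporting how well a DL model detects a lesion. The following is a minimal Python sketch, assuming binary labels (1 = lesion present, 0 = absent) and a hypothetical set of reference diagnoses and test results; the function name is illustrative only.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive, 0 = negative."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    sensitivity = tp / (tp + fn)  # fraction of truly positive cases that were detected
    specificity = tn / (tn + fp)  # fraction of truly negative cases correctly ruled out
    return sensitivity, specificity


# Hypothetical example: reference diagnoses vs. test results for 10 cases.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(reference, predicted))
```

Here one positive case is missed and one negative case is flagged, giving a sensitivity of 3/4 and a specificity of 5/6.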
Despite the widespread availability of traditional pathology methods, challenges and limitations remain, such as differences between laboratories’ protocols and techniques, as well as subjective interpretation by pathologists, resulting in inconsistent diagnoses ^[12][13]. Novel imaging systems and WSI scanners promise to upgrade traditional pathology while preserving the code and ethics of practice ^[26]. The potential of DL algorithms is expanding across all fields of histopathology. In clinical practice, such algorithms could provide valuable information about the tumor microenvironment through quantitative analysis of histological features ^[28]. Better patient stratification for targeted therapies could be achieved with DL-based models that predict molecular alterations, such as microsatellite instability (MSI) status ^[29][30][31]. More than ever, AI could be of great importance to pathologists in daily clinical practice, and it is supported by extensive research reporting good performance metrics. Several studies have shown that the predictions of many DL-based models did not differ significantly from those of pathologists ^[32][33]. Thus, DL algorithms could provide valuable support for diagnosis in clinical practice, especially when inconsistencies occur. Scanned histological images can also be reviewed simultaneously by collaborating pathologists in different locations ^[34][35]. For an efficient, fully digital workflow, however, the necessary technology infrastructure, including computers, scanners, workstations, and medical displays, must be developed. The application of DL methods to CRC diagnosis over the last 5 years appears to be evolving rapidly, faster than in other fields of histopathology.
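Agreement between two observers, whether two pathologists or a pathologist and a DL model, is commonly summarized with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The following minimal Python sketch is offered as one common measure of inter-observer agreement, not as the method used in the cited studies; the grade labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that two independent raters with these
    # label frequencies would pick the same label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades from a pathologist and a model for the same 10 cases.
pathologist = ["low", "low", "low", "high", "high", "low", "high", "low", "low", "high"]
model = ["low", "low", "high", "high", "high", "low", "low", "low", "low", "high"]
print(round(cohen_kappa(pathologist, model), 3))
```

In this example, the raters agree on 8 of 10 cases (p_o = 0.8), but chance alone would produce 52% agreement (p_e = 0.52), so κ = 0.28 / 0.48 ≈ 0.58, a more conservative figure than raw agreement.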
The application of DL in this field appears to follow a gradual evolution, starting from simple CNNs, then applying transfer learning, and finally developing new architectures tailored to the requirements of the medical question. Additionally, in the last two years, alternative deep learning techniques such as generative adversarial networks (GANs) have started to be used. Their contribution is expected to be significant, since DL requires a sufficiently large training set to perform well and generalize. Large pathologist-annotated datasets are not always available and may therefore need to be enriched with simulated training data. CNNs applied directly to histopathological images are expected to outperform traditional techniques. CNNs have an advantage over traditional image processing techniques because they are trained on data, and they are more robust than traditional AI techniques because they extract features from the image automatically.
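The way a convolutional layer extracts features can be shown with a small example. The following minimal Python sketch slides a 3 × 3 kernel over a grayscale patch (a "valid" cross-correlation with no padding and stride 1, which is what most DL frameworks implement under the name convolution) and then applies a ReLU. Here the kernel is a fixed, hand-chosen vertical-edge detector; the point of CNN training is that such kernel weights are learned from annotated images instead of being designed by hand. The patch values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a 2D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu_map(feature_map):
    """Elementwise ReLU, the usual nonlinearity applied after a convolution."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 4 x 5 grayscale patch: dark region on the left, bright on the right.
patch = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]
# Hand-chosen vertical-edge kernel (in a CNN, these weights would be learned).
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(relu_map(conv2d_valid(patch, edge_kernel)))
```

The resulting feature map responds strongly where the window straddles the dark-to-bright boundary and is zero inside the uniform bright region. A CNN stacks many such learned filters, so later layers can respond to increasingly complex tissue patterns.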

3. Conclusions

When dealing with human disease, particularly cancer, AI promises to deliver valuable guidance. Specifically for CRC, the rapidly growing body of recent research appears likely to transform the field of tissue-based diagnosis soon. Preliminary results show that AI-based models are increasingly being applied in clinical cancer research, including CRC, breast cancer, and lung cancer. However, to overcome several limitations, larger datasets, high-quality image annotations, and external validation cohorts are required to establish the diagnostic accuracy of DL models in clinical practice.