Colon Cancer Diagnosis: History

Researchers present a comprehensive survey on the diagnosis of colon cancer. It covers many aspects related to colon cancer, such as its symptoms and grades, the available imaging modalities (particularly the histopathology images used for analysis), and common diagnosis systems. Furthermore, the most widely used datasets and performance evaluation metrics are discussed. Researchers provide a comprehensive review of the current studies on colon cancer, classified into deep-learning (DL) and machine-learning (ML) techniques, and identify their main strengths and limitations. These techniques provide extensive support for identifying the early stages of cancer, which leads to early treatment of the disease and a lower mortality rate than when treatment begins after symptoms develop. In addition, these methods can help to prevent colorectal cancer from progressing through the removal of pre-malignant polyps, which can be found using screening tests that make the disease easier to diagnose. Finally, the existing challenges and the future research directions that open the way for further work in this field are presented.

  • colon cancer diagnosis
  • imaging modalities
  • deep-learning techniques

1. Introduction

Colon cancer is a specific kind of tumor that originates in the colon or the rectum, at the lower portion of the digestive system [1]. The colon forms the main part of the large intestine, and the rectum lies at the end of the colon [2]. Colon cancer is considered to be one of the leading causes of death in the industrialized and Western world, and its incidence has grown [3]. In 2012, about 1.4 million people were diagnosed with this disease. In 2017, about 50,260 deaths were reported [4]. The main risk factors are unhealthy habits, including chain-smoking and a diet high in red meat and low in fruit, in addition to a family history of the disease and increasing age [5].
There are four main grades of colon cancer, as shown in Figure 1 [6]. In the first stage, the cancer is confined to the mucosa or lining of the colon or rectum, and the organ wall has not yet developed tumors. In the second stage, the walls of the rectum or colon begin to develop tumors; however, nearby tissues or lymph nodes are not yet affected [7].
Figure 1. The different stages of colon cancer.
The third stage is reached when the tumor has spread to the lymph tissues but not yet to any other part of the body. In the fourth stage, the tumor spreads to other organs, such as the lungs [8]. Stage four presents different symptoms depending on the organ to which the tumor has spread, as shown in Table 1 [9].
Table 1. Comparison between symptoms of tumor spread across various organs at the fourth stage.
No. Organ of Spread Symptoms
1 Liver
  • Pain on the right side of the abdomen.
  • Constant feeling of illness and fatigue.
  • Loss of weight and appetite.
  • Abdominal swelling due to fluid accumulation.
  • Itchy skin.
2 Lung
  • Constant cough.
  • Shortness of breath.
  • Recurrent infections in the lungs.
  • Coughing up blood.
  • Fluid accumulation around the lung.
3 Bone
  • Pain in the affected bones.
  • Bone weakness and increased risk of fracture.
4 Lymph nodes
  • Swollen lymph nodes.
Although colorectal cancer often has no apparent symptoms, particularly in its early stages [10], some symptoms can occur, such as abdominal pain, constipation, excess gas, diarrhea, and changes in the color and shape of stool (e.g., narrow stool, abdominal cramps, and blood in the stool) [11]. According to the American Cancer Society (ACS), the most common form of colon cancer is adenocarcinoma, which accounts for almost 96% of all cases of this type of cancer [12].
Colorectal cancers can also arise from other types of tumors, such as carcinoid tumors that first arise in the hormone-producing cells of the intestines [13] and lymphomas that may first form in the colon; however, these are less common. Sarcomas can also start in the tissues of the colon, as can gastrointestinal stromal tumors, which may begin as benign tumors and later become cancerous (these only occasionally begin in the colon but can start almost anywhere in the digestive tract) [14].
Not all types of tumors are malignant. There is a non-spreading or benign type that is not fatal or destructive as the spreading type is. The variation in biological tumor structures presents great challenges for the automatic and manual analysis of histopathological images (HIs) [15]. A manual examination of the cancer level/grade relies on the pathologist's visual assessment, which is subjective, time-consuming, and potentially error-prone [16]. An incorrect or late diagnosis can cause anxiety for many patients. Therefore, Medical Image Analysis (MIA) is required to process and analyze HIs automatically. Such an MIA system can be used to classify colon cancer and present an objective and accurate assessment of the various grades of this cancer [17].
A diagnosis of colon cancer can be performed automatically with the power of AI, enabling more types of diagnosis at lower cost and in less time. AI-based diagnosis methods can be categorized into ML techniques and DL techniques. Recent advances in digital image processing (DIP) techniques and DL play an essential role in the diagnostic process [18]. Researchers present a comprehensive survey of the different ML and DL techniques proposed for identifying the different stages of colon cancer. This can be accomplished using different imaging modalities. However, researchers focus on histopathological imaging, which is considered the best modality for examining, classifying, and locating the different cancer stages and for providing a comprehensive view of them.

2. Colon Cancer Diagnosis

Before going in depth and reviewing the current work on colon cancer diagnosis, many aspects related to the diagnosis process should be taken into consideration, such as the image modality used, type of diagnosis system, the dataset used, and the metrics used for evaluation. Therefore, in the following subsections, researchers discuss these aspects.

2.1. Imaging Modalities

As mentioned before, the main goal is the automatic diagnosis of colon cancer with high detection accuracy and without manual intervention. Researchers take an in-depth look at the different imaging modalities recently applied for MIA, including Computed Tomography (CT), Endorectal Ultrasound (ERUS), virtual Computed Tomography Colonoscopy (CTC), and Magnetic Resonance Imaging (MRI) [19], in addition to other modalities, such as Histopathological Imaging (HI) and Positron Emission Tomography (PET) [20].

2.2. Common Diagnosis Systems Based on HI Analysis

The main focus is on colon cancer diagnosis based on HI analysis. In general, most of the stages of HI analysis depend mainly on the basic concepts of mathematics. Figure 2 presents the main stages of a typical HI Analysis pipeline [21].
Figure 2. HI analysis pipeline.
In the first stage, 2D/3D arrays of HIs are obtained and passed to a gray-scale or color imaging system. They are then fed to the preprocessing phase, where linear algebra operations are applied to the image array to improve image resolution so that structures can be distinguished from one another. Then, the segmentation phase separates the cells from the background by applying mathematical algorithms, such as texture homogeneity, intensity, watershed transformation, and level set transformations.
The next stage is feature extraction. Instead of processing each pixel, this stage extracts the most significant features from the sliced images for further processing. Therefore, it minimizes the computational complexity of the system. Finally, the diagnostic stage applies clustering or classification algorithms to the features extracted from the input images [22]. To achieve an intensive analysis of HIs, mathematical functions and operations must be applied in all analysis phases, beginning with the preprocessing phase and ending with the diagnostic phase [23].
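The four stages described above can be illustrated with a minimal, pure-Python sketch. It is not the pipeline of any surveyed system: the function names, the min-max normalization, the fixed intensity threshold (a simple stand-in for watershed or level-set segmentation), the two features, the nearest-centroid classifier, and the toy image and centroids are all assumptions chosen only to make each stage concrete.

```python
from statistics import mean

def preprocess(image):
    """Preprocessing: min-max normalize pixel intensities to [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(p - lo) / scale for p in row] for row in image]

def segment(image, threshold=0.5):
    """Segmentation: binary mask, 1 = foreground (e.g. cells), 0 = background.
    A fixed threshold is an assumed stand-in for more advanced methods."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def extract_features(image, mask):
    """Feature extraction: summarize the image by two numbers
    (foreground area fraction, mean foreground intensity)."""
    fg = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    total = sum(len(row) for row in image)
    return (len(fg) / total, mean(fg) if fg else 0.0)

def classify(features, centroids):
    """Diagnosis: nearest-centroid rule over hypothetical class centroids."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))

def diagnose(image, centroids):
    """Run the four stages in order and return the predicted label."""
    norm = preprocess(image)
    mask = segment(norm)
    return classify(extract_features(norm, mask), centroids)

# Toy 4x4 gray-scale "image" and made-up centroids (illustrative values only).
toy_image = [
    [10, 10, 200, 220],
    [10, 12, 210, 230],
    [11, 10, 10, 12],
    [10, 11, 12, 10],
]
toy_centroids = {"normal": (0.1, 0.9), "tumor": (0.6, 0.9)}
label = diagnose(toy_image, toy_centroids)
```

The point of the sketch is the data flow: the classifier sees only a short feature vector rather than every pixel, which is why feature extraction reduces the computational cost of the diagnostic stage.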

2.3. Datasets

The dramatically increasing dataset size needed for training and testing is a critical challenge [24][25]. There are public datasets in the digital pathology field that include manual annotations for HIs. These are helpful in the review process. Image artifacts (e.g., the zoom level and image resolution) and slide problems (e.g., smudges) occur at similar rates across these datasets. However, all of these datasets cover only specific tumor states, and there are several tasks that the existing databases do not handle.
  • CRC Grading Dataset
    The CRC [26] Grading Dataset contains 38 H&E stained histological WSIs with a resolution of 4548 × 7548.
  • PanNuke Dataset
    PanNuke [27] includes 200,000 nuclei divided into five main classes and is intended to challenge approaches for classifying and segmenting nuclei in WSIs; the images have a resolution of 224 × 224.
  • The Warwick-QU Dataset
    This dataset [28] contains 16 slides of H&E stained histological WSIs of colon histology; it was created as part of the GlaS challenge, with resolutions of 430 × 575 (14 images) and 520 × 775 (151 images).
  • CoNSeP Dataset
    CoNSeP [29] contains 41 H&E stained image slides with a resolution of 1000 × 1000 pixels at 40× objective magnification, with a total of 24,319 annotated nuclei with class labels.
  • ETIS-LARIB Dataset
    The ETIS-LARIB [30] database contains frames taken from colonoscopy videos, including several examples of polyps. It provides the ground truth for each frame as a mask of the polyp region in the image. A sample of this dataset is shown in Figure 3.
  • CRCHistoPhenotypes–Labeled Cell Nuclei Dataset
    This dataset [31] contains 100 H&E stained CRC images with a resolution of 500 × 500. For detection, there are 29,756 nuclei; of these, 22,444 nuclei are labeled for classification (miscellaneous, fibroblast, and epithelial), and 7312 are unlabeled.
  • Kent Integrated Dataset (KID)
    The KID [32] covers the health and welfare system for the entire population of Medway and Kent. This dataset is a rich and unique resource for researchers studying health and care on a large scale. It also provides an overview of the patient journey, care, and needs.
  • CVC-ColonDB and CVC-ClinicDB
    Since 2012 [30], this collection has been a leading research resource because it includes many publicly available databases. Among them is CVC-ColonDB, which specializes in colon cancer imaging and contains the original images together with the ground truth, as shown in Figure 4.
  • Colonoscopy Dataset
    The dataset [33] contains 76 videos recorded with both white light (WL) and narrow-band imaging (NBI) at SD resolution (768 × 576). The database contains 40 adenomas, 21 hyperplastic lesions, and 15 serrated adenomas.
  • Extended CRC Grading Dataset (KID)
    This dataset [34] contains 300 non-overlapping images. These were labeled by expert pathologists as high grade (Grade 3) tumors, low grade (Grade 2) tumors, or normal tissue (Grade 1), with a resolution of 4548 × 7548.
  • ASU-Mayo Clinic
    Currently, there are numerous research programs based on co-funded acceleration, seed research, and team science grants [35]. More than 20–30 cohorts of senior nursing students are expected to complete their clinical training with Mayo Clinic nursing faculty on the Mayo campus. Through this effort and cooperation, the seed grant program has added joint, cutting-edge research collaborations, a host of dual-degree opportunities, and more. In 2016 and in the summer of 2010, the relationship with the Mayo Clinic became enterprise-wide, and the ASU Alliance for Health Care was formed.
Figure 3. Original data and associated manual annotation from ETIS-Larib polyp DB. (a) the original image and (b) the annotation.
Figure 4. (a–c) The original images. (d–f) The corresponding ground truth.

2.4. Performance Evaluation Metrics

Evaluation metrics are used to measure the quality of machine learning models. With these metrics, one can evaluate whether a trained DL model is effective on new data. Many different evaluation metrics can be used to test a model. Using multiple metrics gives a more reliable picture of the quality of a trained model because the same model can perform differently under different evaluation metrics.
Choosing evaluation metrics correctly is critical, as they describe whether the trained model is performing well or not. Researchers present some formulas and an explanation of the evaluation metrics used in academic papers.
True Positive (TP) is when a method correctly classifies a sample of the positive category, while False Positive (FP) is when a method incorrectly classifies a negative sample as positive. On the other hand, True Negative (TN) is when a method correctly classifies a sample of the negative category, while False Negative (FN) is when a method incorrectly classifies a positive sample as negative. Researchers can customize these values in the medical field of cancer detection. For example, if the image includes cancerous cells and the trained model predicts the malignant cells successfully, this case is called TP, while if the trained model predicts that the image contains no malignant cells, this case is called FN.
On the other hand, if the image includes no malignant cells and the model predicts that the image does not contain cancerous cells, this case is called TN. If the image includes no malignant cells and the trained model predicts that it contains malignant cells, this case is called FP. Next, researchers present an explanation and description of the formulas for the common evaluation metrics.
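As a small illustration of the four outcomes, the following sketch counts them from paired ground-truth labels and predictions. It uses the standard convention in which "positive" (here `1`) means that the image contains malignant cells, so a missed cancer is an FN and a false alarm is an FP. The function name `confusion_counts` and the toy labels are assumptions made for this example only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem.

    y_true: ground-truth labels (positive = image contains malignant cells)
    y_pred: labels predicted by the model
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # cancer present, cancer predicted
            else:
                fp += 1  # no cancer, but cancer predicted (false alarm)
        else:
            if truth == positive:
                fn += 1  # cancer present, but missed by the model
            else:
                tn += 1  # no cancer, none predicted
    return tp, fp, tn, fn

# Toy labels for eight images (illustrative values, not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

In a screening setting, FN cases are usually the costliest errors because a malignant sample goes undetected, which is one reason to report more than a single metric.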
  • Accuracy measures the proportion of correctly classified observations among all measured samples, which can be calculated as:
    $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$
  • The error rate shows the proportion of incorrectly classified observations among all measured samples, which can be calculated as:
    $$\text{Error Rate} = \frac{FP + FN}{TP + TN + FN + FP}$$
  • Precision measures the proportion of samples classified as positive that are actually positive, which can be calculated as:
    $$\text{Precision} = \frac{TP}{TP + FP}$$
  • Recall measures the proportion of actual positive samples that are correctly predicted. This can be calculated as:
    $$\text{Recall} = \frac{TP}{TP + FN}$$
  • Specificity measures the proportion of actual negative samples that are correctly classified as negative and can be calculated as:
    $$\text{Specificity} = \frac{TN}{TN + FP}$$
  • Sensitivity measures the proportion of actual positive samples that are correctly classified as positive (it is identical to recall) and can be calculated as:
    $$\text{Sensitivity} = \frac{TP}{TP + FN}$$
  • The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), showing the performance at each possible threshold value. The two rates can be calculated as:
    $$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}$$
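The metrics listed above can be computed directly from the four counts. The sketch below uses the standard definitions, in which sensitivity equals recall (TP / (TP + FN)) and specificity is TN / (TN + FP). The function names `metrics` and `roc_points`, the example counts, and the toy scores are assumptions for illustration; the code also assumes that every denominator is non-zero, that is, both classes are present and at least one sample is predicted positive.

```python
def metrics(tp, fp, tn, fn):
    """Common evaluation metrics from confusion counts.
    Assumes every denominator is non-zero."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity or TPR
        "specificity": tn / (tn + fp),   # equals 1 - FPR
    }

def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per threshold.
    A sample is predicted positive when its score is >= the threshold.
    Assumes y_true contains both classes (1 = positive, 0 = negative)."""
    points = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
        tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Example counts and toy model scores (illustrative values, not real data).
report = metrics(tp=3, fp=1, tn=3, fn=1)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
curve = roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.0])
```

Sweeping the threshold from low to high moves the operating point from (1, 1), where everything is predicted positive, toward (0, 0), where nothing is; the ROC curve connects these points.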

This entry is adapted from the peer-reviewed paper 10.3390/s22239250


  1. Allison, J.E. Colorectal cancer screening guidelines: The importance of evidence and transparency. Gastroenterology 2010, 138, 1648–1652.
  2. An, F.P.; Liu, J.E. Medical Image Segmentation Algorithm Based on Optimized Convolutional Neural Network-Adaptive Dropout Depth Calculation. Complexity 2020, 2020, 1645479.
  3. Araghi, M.; Soerjomataram, I.; Jenkins, M.; Brierley, J.; Morris, E.; Bray, F.; Arnold, M. Global trends in colorectal cancer mortality: Projections to the year 2035. Int. J. Cancer 2019, 144, 2992–3000.
  4. Barish, M.A.; Soto, J.A.; Ferrucci, J.T. Consensus on current clinical practice of virtual colonoscopy. Am. J. Roentgenol. 2005, 184, 786–792.
  5. Thun, M.J.; Calle, E.E.; Namboodiri, M.M.; Flanders, W.D.; Coates, R.J.; Byers, T.; Boffetta, P.; Garfinkel, L.; Heath, C.W., Jr. Risk factors for fatal colon cancer in a large prospective study. JNCI J. Natl. Cancer Inst. 1992, 84, 1491–1500.
  6. Rathore, S.; Hussain, M.; Ali, A.; Khan, A. A recent survey on colon cancer detection techniques. IEEE/ACM Trans. Comput. Biol. Bioinform. 2013, 10, 545–563.
  7. Baxter, N.N.; Goldwasser, M.A.; Paszat, L.F.; Saskin, R.; Urbach, D.R.; Rabeneck, L. Association of colonoscopy and death from colorectal cancer. Ann. Intern. Med. 2009, 150, 1–8.
  8. Bera, K.; Schalper, K.A.; Rimm, D.L.; Velcheti, V.; Madabhushi, A. Artificial intelligence in digital pathology—New tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 2019, 16, 703–715.
  9. Kitayama, J.; Nagawa, H.; Tsuno, N.; Osada, T.; Hatano, K.; Sunami, E.; Saito, H.; Muto, T. Laminin mediates tethering and spreading of colon cancer cells in physiological shear flow. Br. J. Cancer 1999, 80, 1927–1934.
  10. Burdan, F.; Sudol-Szopinska, I.; Staroslawska, E.; Kolodziejczak, M.; Klepacz, R.; Mocarska, A.; Caban, M.; Zelazowska-Cieslinska, I.; Szumilo, J. Magnetic resonance imaging and endorectal ultrasound for diagnosis of rectal lesions. Eur. J. Med. Res. 2015, 20, 1–14.
  11. Bychkov, D.; Linder, N.; Turkki, R.; Nordling, S.; Kovanen, P.E.; Verrill, C.; Walliander, M.; Lundin, M.; Haglund, C.; Lundin, J. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 2018, 8, 3395.
  12. Levin, B.; Lieberman, D.A.; McFarland, B.; Andrews, K.S.; Brooks, D.; Bond, J.; Dash, C.; Giardiello, F.M.; Glick, S.; Johnson, D.; et al. Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: A joint guideline from the American Cancer Society, the US Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. Gastroenterology 2008, 134, 1570–1595.
  13. Chaddad, A.; Tanougast, C.; Dandache, A.; Al Houseini, A.; Bouridane, A. Improving of colon cancer cells detection based on Haralick’s features on segmented histopathological images. In Proceedings of the 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), Penang, Malaysia, 21–22 May 2011; pp. 87–90.
  14. Hur, C.; Chung, D.C.; Schoen, R.E.; Gazelle, G.S. The management of small polyps found by virtual colonoscopy: Results of a decision analysis. Clin. Gastroenterol. Hepatol. 2007, 5, 237–244.
  15. Gunduz-Demir, C.; Kandemir, M.; Tosun, A.B.; Sokmensuer, C. Automatic segmentation of colon glands using object-graphs. Med. Image Anal. 2010, 14, 1–12.
  16. Wang, D.; Foran, D.J.; Ren, J.; Zhong, H.; Kim, I.Y.; Qi, X. Exploring automatic prostate histopathology image gleason grading via local structure modeling. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milano, Italy, 25–29 August 2015; pp. 2649–2652.
  17. Dal Molin, M.; Matthaei, H.; Wu, J.; Blackford, A.; Debeljak, M.; Rezaee, N.; Wolfgang, C.L.; Butturini, G.; Salvia, R.; Bassi, C.; et al. Clinicopathological correlates of activating GNAS mutations in intraductal papillary mucinous neoplasm (IPMN) of the pancreas. Ann. Surg. Oncol. 2013, 20, 3802–3808.
  18. Masud, M.; Sikder, N.; Nahid, A.A.; Bairagi, A.K.; AlZain, M.A. A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors 2021, 21, 748.
  19. Elazab, N.; Soliman, H.; El-Sappagh, S.; Islam, S.; Elmogy, M. Objective Diagnosis for Histopathological Images Based on Machine Learning Techniques: Classical Approaches and New Trends. Mathematics 2020, 8, 1863.
  20. Bar-Shalom, R.; Valdivia, A.Y.; Blaufox, M.D. PET imaging in oncology. Semin. Nucl. Med. 2000, 30, 150–185.
  21. Moroz, M.A.; Kochetkov, T.; Cai, S.; Wu, J.; Shamis, M.; Nair, J.; De Stanchina, E.; Serganova, I.; Schwartz, G.K.; Banerjee, D.; et al. Imaging colon cancer response following treatment with AZD1152: A preclinical analysis of fluoro-2-deoxyglucose and fluorothymidine imaging. Clin. Cancer Res. 2011, 17, 1099–1110.
  22. Kalkan, H.; Nap, M.; Duin, R.P.; Loog, M. Automated classification of local patches in colon histopathology. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; pp. 61–64.
  23. Greenspan, H.; Van Ginneken, B.; Summers, R.M. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 2016, 35, 1153–1159.
  24. Burt, R.W. Strategies for colon cancer screening with considerations of cost and access to care. J. Natl. Compr. Cancer Netw. 2010, 8, 2–5.
  25. Fakoor, R.; Ladhak, F.; Nazi, A.; Huber, M. Using deep learning to enhance cancer diagnosis and classification. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 15–20 July 2013; Volume 28.
  26. Awan, R.; Sirinukunwattana, K.; Epstein, D.; Jefferyes, S.; Qidwai, U.; Aftab, Z.; Mujeeb, I.; Snead, D.; Rajpoot, N. Glandular morphometrics for objective grading of colorectal adenocarcinoma histology images. Sci. Rep. 2017, 7, 1–12.
  27. Gamper, J.; Koohbanani, N.A.; Benes, K.; Graham, S.; Jahanifar, M.; Khurram, S.A.; Azam, A.; Hewitt, K.; Rajpoot, N. Pannuke dataset extension, insights and baselines. arXiv 2020, arXiv:2003.10778.
  28. Sirinukunwattana, K.; Snead, D.R.; Rajpoot, N.M. A stochastic polygons model for glandular structures in colon histology images. IEEE Trans. Med. Imaging 2015, 34, 2366–2378.
  29. Graham, S.; Vu, Q.D.; Raza, S.E.A.; Azam, A.; Tsang, Y.W.; Kwak, J.T.; Rajpoot, N. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 2019, 58, 101563.
  30. Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Halvorsen, P.; de Lange, T.; Johansen, D.; Johansen, H.D. Kvasir-seg: A segmented polyp dataset. In Proceedings of the International Conference on Multimedia Modeling, Daejeon, Republic of Korea, 5–8 January 2020; pp. 451–462.
  31. Sirinukunwattana, K.; Raza, S.E.A.; Tsang, Y.W.; Snead, D.R.; Cree, I.A.; Rajpoot, N.M. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 2016, 35, 1196–1206.
  32. Lewer, D.; Bourne, T.; George, A.; Abi-Aad, G.; Taylor, C.; George, J. Data Resource: The Kent Integrated Dataset (KID). Int. J. Popul. Data Sci. 2018, 3, 427.
  33. Mesejo, P.; Pizarro, D.; Abergel, A.; Rouquette, O.; Beorchia, S.; Poincloux, L.; Bartoli, A. Computer-aided classification of gastrointestinal lesions in regular colonoscopy. IEEE Trans. Med. Imaging 2016, 35, 2051–2063.
  34. Shaban, M.; Awan, R.; Fraz, M.M.; Azam, A.; Tsang, Y.W.; Snead, D.; Rajpoot, N.M. Context-aware convolutional neural network for grading of colorectal cancer histology images. IEEE Trans. Med. Imaging 2020, 39, 2395–2405.
  35. Pogorelov, K.; Randel, K.R.; Griwodz, C.; Eskeland, S.L.; de Lange, T.; Johansen, D.; Spampinato, C.; Dang-Nguyen, D.T.; Lux, M.; Schmidt, P.T.; et al. Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In Proceedings of the eighth ACM on Multimedia Systems Conference, Taipei, Taiwan, 20–23 June 2017; pp. 164–169.