Two Analytical Approaches to Quantitative Imaging: A Comparison

As the most lethal major cancer, pancreatic cancer is a global healthcare challenge. Personalized medicine utilizing cutting-edge multi-omics data holds potential for major breakthroughs in tackling this critical problem. Radiomics and deep learning, two popular quantitative imaging methods that take advantage of data science and modern medical imaging, have shown increasing promise in advancing the precision management of pancreatic cancer via diagnosis of precursor diseases, early detection, accurate diagnosis, and treatment personalization and optimization. Radiomics employs manually crafted features, while deep learning learns features automatically from the data. These two methods aim to mine hidden information in medical images that is missed by conventional radiology and to gain insights by systematically comparing quantitative image information across patients in order to characterize unique imaging phenotypes. Both methods have been studied and applied in various pancreatic cancer clinical applications.

  • Quantitative Imaging
  • Radiomics
  • Deep Learning

1. Technical Basis: Radiomics

The term radiomics was coined by analogy with other “omics” fields such as genomics and proteomics, combining the words “radiology” and “omics”. Simply put, radiomics is the branch of high-throughput data mining research in radiology that involves extracting an array of hand-crafted quantitative features from medical images.
For pancreatic cancer, typical imaging modalities include those used in the clinical management of pancreatic diseases, such as CT, MRI, PET, and endoscopic ultrasound (EUS). Among these, CT is the most widely used imaging modality for pancreatic cancer owing to its high spatial and temporal resolution and its benchmark sensitivity and specificity in pancreatic diseases, as well as its lower cost and wider availability compared with MRI and PET. A bi-phase or tri-phase pancreatic protocol with an iodinated contrast agent is usually used for CT image acquisition. With its good soft tissue contrast, MRI is increasingly used to complement CT for pancreatic cancer diagnosis and management. MR cholangiopancreatography is used as a non-invasive alternative to endoscopic retrograde cholangiopancreatography. PET commonly uses fluorine-18 fluorodeoxyglucose (FDG), a glucose analogue, to image metabolically active cancer. Other tracers can be employed in PET to capture other biological information. PET has been used for pancreatic cancer diagnosis as well as post-therapy monitoring. Ultrasound is most often used in the EUS setting to visualize the pancreas from the duodenum or stomach, detect small focal lesions, and guide biopsies.
While radiomics studies are most often conducted using one of the above-mentioned imaging modalities, they can combine two or more modalities to provide complementary and more comprehensive information. From the obtained image, one or more ROIs are delineated or segmented so that subsequent analysis can focus on them. For pancreatic cancer radiomics, the ROI is usually the pancreatic tumor, or occasionally metastatic pancreatic cancer lesions. For pancreatic cancer detection, the ROI can be the entire pancreas or sub-regions thought to potentially contain the cancer. Segmentation can be carried out manually, semi-automatically, or automatically. Automatic segmentation is desirable because it automates a labor-intensive step, and is therefore an essential factor in securing the large amount of data required for high-quality quantitative imaging studies. Numerous computer algorithms have been developed for automatic segmentation, from simple thresholding to atlas-based methods and artificial intelligence-based algorithms. Human interaction, such as setting the algorithm’s starting point, may be required, making the process semi-automatic. Automatic and semi-automatic segmentation methods are known to save labor and improve workflow efficiency and interpatient segmentation consistency [1]. However, compared with most other cancer types, segmentation of the pancreatic tumor and the pancreas itself can be quite difficult due to the lack of contrast at boundaries and the heterogeneity both within the ROIs and in the background. Therefore, automatic/semi-automatic segmentation methods for pancreatic cancer are an active area of research and development, and manual segmentation remains the mainstay in pancreatic cancer radiomics [1]. The only exception is in PET-based studies, where automatic segmentation can be employed based on thresholding of standardized uptake values (SUVs), as sketched below. It is worth noting that despite the longer time required to delineate pancreatic tumors compared with other cancers such as lung tumors, manual segmentation of pancreatic cancer suffers from higher interobserver variability as well [2][3]; this can lead to higher segmentation uncertainties, which are then propagated through the radiomics workflow. To mitigate this, studies have used multiple observers to enhance the robustness of manual segmentation; however, there is no standard practice in this regard. Existing studies show widely varying degrees of attention to segmentation, with some taking inter- and intra-observer reproducibility into consideration, others using a single observer, and still others reporting no details about segmentation.
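To make the SUV-thresholding idea concrete, the following is a minimal Python/NumPy sketch of threshold-based PET ROI segmentation on a synthetic volume. The 40%-of-SUVmax cutoff, the function name, and the toy data are illustrative assumptions only; actual thresholds and implementations vary between studies.

```python
import numpy as np

def segment_by_suv_threshold(suv_volume: np.ndarray, rel_threshold: float = 0.4) -> np.ndarray:
    """Binary ROI mask from a PET SUV volume using a relative threshold.

    A common heuristic keeps voxels above a fraction (e.g., 40%) of SUVmax;
    the exact threshold is study-dependent (illustrative assumption here).
    """
    suv_max = suv_volume.max()
    return suv_volume >= rel_threshold * suv_max

# Toy example: a synthetic 3D SUV volume with one "hot" region.
rng = np.random.default_rng(0)
volume = rng.uniform(0.5, 2.0, size=(16, 16, 16))   # background uptake
volume[6:10, 6:10, 6:10] = 8.0                      # simulated lesion
mask = segment_by_suv_threshold(volume, rel_threshold=0.4)
print("ROI voxels:", int(mask.sum()))               # the 64 lesion voxels
```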
Radiomic features are mathematically defined quantities computed from an image ROI. They can be divided into categories such as intensity, shape, texture, and higher-order features [4]. Intensity features are sometimes called first-order statistical features. They are histogram-based quantities such as the minimum, mean, median, and maximum. Other examples include skewness, which reports the asymmetry of the intensity distribution, and kurtosis, which reports the “tailedness” versus flatness of the distribution. Shape features describe ROI shape and size. In addition to the volume and maximum diameter often used in conventional radiology, they include various quantities that describe the ROI shape quantitatively, such as sphericity, convexity, and irregularity. Texture features are usually second-order statistical features calculated from matrices that depict the statistical relationships between neighboring voxels. The most common texture matrices include the gray-level co-occurrence matrix (GLCM), which counts pairs of voxels with given intensities at a specific distance along a specific direction, and the gray-level run-length matrix (GLRLM), which counts runs of consecutive voxels with the same intensity along a specific direction, among others. Texture features are especially useful for quantifying tumor heterogeneity, which is often missed or ambiguous in conventional radiology. While the above categories of radiomic features can be calculated from the original image, they can also be calculated from derived images obtained by applying image filters or mathematical transformations to the original image; the results are called higher-order features. The filters used to extract higher-order features are usually those used in typical image processing, serving particular purposes such as highlighting details or suppressing noise. Common filters used in radiomics include the wavelet and Laplacian of Gaussian filters, among others. With different image filters and filter parameter settings, the number of radiomic features can quickly grow from a few dozen or a few hundred to several thousand. It is worth noting that, unlike intensity and texture features, shape features are invariant under image filtering.
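As a concrete illustration of these feature categories, the sketch below computes a few first-order statistics and a single-offset GLCM on a toy quantized image. This is a didactic example with hypothetical function names, not a standardized implementation; published studies typically rely on established toolkits (e.g., PyRadiomics) that implement harmonized feature definitions.

```python
import numpy as np
from scipy import stats

def first_order_features(roi_values: np.ndarray) -> dict:
    """Histogram-based (first-order) statistics of ROI intensities."""
    return {
        "mean": float(np.mean(roi_values)),
        "median": float(np.median(roi_values)),
        "skewness": float(stats.skew(roi_values)),      # asymmetry
        "kurtosis": float(stats.kurtosis(roi_values)),  # "tailedness"
    }

def glcm_2d(image: np.ndarray, levels: int, offset=(0, 1)) -> np.ndarray:
    """GLCM for one distance/direction given as a (row, col) offset.

    Entry (i, j) counts voxel pairs where a voxel of gray level i has a
    neighbor of gray level j at the given offset.
    """
    glcm = np.zeros((levels, levels), dtype=np.int64)
    dr, dc = offset
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r, c], image[r2, c2]] += 1
    return glcm

# Toy 2D "ROI" quantized to 4 gray levels.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
print(first_order_features(img.ravel()))
print(glcm_2d(img, levels=4))  # horizontal neighbors at distance 1
```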
Radiomics modeling usually involves feature selection as well as model development and validation. Because radiomics models are developed from a pool of hundreds to thousands of radiomic features (many of which are redundant) while the number of subjects is usually on the order of a few dozen to a few hundred, model overfitting can become a major issue without an effective and robust feature selection or dimension reduction process. Overfitting tends to happen when a large number of features are used to model a dataset of limited size, in which case the model learns more from the noise in the dataset than from the signal, leading to poor performance on new datasets. Various statistical and machine learning methods are used for radiomics dimension reduction, including minimum redundancy maximum relevance (mRMR), mutual information maximization, the least absolute shrinkage and selection operator (LASSO), and random forests [4][5]. Through dimension reduction and feature selection, a much smaller set of the most relevant radiomic features, usually fewer than a dozen, can be selected. When a few radiomic features are determined to be the most significant and useful for the prediction in question, they are often called a radiomic signature. Radiomic signatures are used for model development with both simple statistical methods and more complex machine learning methods. The Cox model is one example of a frequently used statistical model that is simple and robust [6]. Other common methods include naïve Bayes, support vector machines, neural networks, random forests, and more [6]. If the predicted outcome is discrete, and especially if it is dichotomous (i.e., positive vs. negative), a classification model is built, whereas if it is continuous, a regression model is used. To develop a robust and generalizable radiomic model, proper model validation and testing are necessary. A good way to do this is to divide the dataset into a training dataset for training the model and tuning model parameters and a validation dataset for confirming model validity; one or more external independent datasets can then be used for model testing to further confirm the validity and generalizability of the developed model. Datasets for pancreatic cancer are often more limited than those for other cancer types due to the relative rarity and rapid progression of the disease. Where external datasets or large dataset sizes are lacking, methods such as cross-validation and bootstrapping have been used to maximize data usage and mitigate overfitting [4].
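The feature selection and validation steps described above can be illustrated with a short scikit-learn sketch on purely synthetic data standing in for a radiomics feature table. Here an L1-penalized logistic regression plays the role of LASSO-style selection; the cohort size, penalty strength, and split ratio are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: 120 patients x 500 features,
# where only a handful of features carry signal about a binary outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 500))
y = (X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=120) > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# The L1 (LASSO-type) penalty drives most coefficients to exactly zero,
# performing feature selection and model fitting in one step.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X_train, y_train)

n_selected = int(np.sum(model.named_steps["logisticregression"].coef_ != 0))
print(f"features retained by L1 penalty: {n_selected} of 500")
print(f"validation accuracy: {model.score(X_val, y_val):.2f}")

# Cross-validation on the training set gauges stability on small cohorts.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"5-fold CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
```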

2. Technical Basis: Deep Learning

Machine learning is an important branch of artificial intelligence and a powerful alternative to statistical methods for data analysis and model building. Unlike statistical modeling, which relies on theory, or conventional computer science, which relies on explicit programming, machine learning involves computers learning from data and performing tasks such as model building without being explicitly programmed to do so. Conventional computer algorithms often rely on explicitly programmed “if–then” logic, while machine learning learns from data without rules-based programming. Traditional machine learning algorithms use rather simple structures such as linear regression and decision trees [10]. In contrast, deep learning, a branch of machine learning that takes this a step further and removes the manual feature engineering component, uses more complex algorithms inspired by the structure and function of the human brain [11]. These algorithms are often called “neural networks” because they mimic the structure of, and information relay between, neurons in a human brain; different types of neural networks differ in how information flows through the individual “neuron” layers [11]. In the case of quantitative imaging for personalized pancreatic cancer care, neural networks all have an input, the pancreas image, and an output, the image-based prediction. The prediction can be a classification problem, such as whether a lesion is benign or malignant or which histology a lesion belongs to, or a regression problem, such as predicting the expected survival time of a patient. As described in the section above, machine learning or even deep learning approaches can be applied to analyze radiomics data; the key difference is that deep learning, as referred to in this review, takes the image as its direct input, whereas in radiomics feature extraction takes place first and the inputs for subsequent data analysis are these handcrafted imaging features.
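The distinction drawn in the last sentence, handcrafted features versus the image as direct input, can be sketched in a few lines. The example below (using PyTorch with a deliberately trivial “network”) is purely schematic; the features and the model are illustrative placeholders, not methods used in any particular study.

```python
import numpy as np
import torch
import torch.nn as nn

image = np.random.rand(64, 64).astype(np.float32)  # stand-in pancreas slice

# Radiomics-style pipeline: handcrafted features first, then a (simple) model.
handcrafted = np.array([image.mean(), image.std(), image.max() - image.min()])
# ... these features would feed a statistical or classical ML model.

# Deep-learning-style pipeline: the image itself is the model input.
net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))  # toy "network"
logits = net(torch.from_numpy(image).unsqueeze(0))
print(handcrafted.shape, logits.shape)  # (3,) vs torch.Size([1, 2])
```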
The convolutional neural network (CNN) is currently the most commonly used deep learning method for medical imaging-based studies [10]. Deep learning algorithms automate the process of feature extraction by inherently learning the important features and applying them to map the input to the output. In a classical artificial neural network (ANN), this is realized by processing the input through hidden layers, which learn the weights of the different nodes (neurons) and apply appropriate activation functions to provide the nonlinearity needed to learn the complex relationship between input and output. There are several challenges in using ANNs for image-based deep learning problems. Because the 2D or 3D image is first converted to a 1D vector in an ANN, the number of nodes scales up drastically with image size, making the approach very computationally expensive. In addition, the spatial information in the input image is lost in this conversion. In contrast, CNNs propagate information by applying filters (called “kernels”; hence the term “convolutional”) rather than relying on the 1D conversion used in ANNs. Each layer of kernel convolution is followed by an activation step that introduces nonlinearity into the network and a pooling step that reduces the dimensionality of the feature map while preserving critical feature information. CNNs are therefore well-suited for image-based deep learning because of their ability to capture spatial features from an image and extract relevant features at a low computational cost.
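A minimal PyTorch sketch of the convolution -> activation -> pooling pattern described above follows. The layer sizes, input dimensions, and two-class output are arbitrary assumptions chosen for illustration, not a clinically validated architecture.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN showing the convolution -> activation -> pooling pattern."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # kernel convolution
            nn.ReLU(),                                   # nonlinearity
            nn.MaxPool2d(2),                             # dimensionality reduction
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)     # spatial structure preserved by kernels
        x = torch.flatten(x, 1)  # flatten only after feature extraction
        return self.classifier(x)

model = TinyCNN()
batch = torch.randn(4, 1, 64, 64)  # four single-channel 64x64 slices
print(model(batch).shape)          # torch.Size([4, 2])
```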
Similar to the radiomics workflow, a good deep learning study is designed with a training dataset, a validation dataset, and a test dataset. An advantage of deep learning over radiomics is that it no longer requires the ROI segmentation step. This saves substantial time and effort, as segmentation is a labor-intensive step in radiomics, and avoids propagating segmentation uncertainty into the downstream steps; this is especially important given the complex image intensity both within the pancreas and in its background. On the other hand, without a domain-guided ROI or handcrafted features, the dataset size required to train a robust deep learning model is larger than for a radiomics study. In medical studies the dataset size is usually limited, and for pancreatic cancer this is even more of an issue because rapid disease progression and typically poor outcomes further limit the available data. A few different approaches are used in deep learning to mitigate the data size issue. CT, MRI, and PET images of the pancreas are 3D images; however, 2D image slices or even “patches” of 2D slices (subsections of the 2D image) are usually used as the network inputs instead of the 3D images. As each 3D image set can yield dozens to hundreds of 2D images and hundreds to thousands of 2D image patches, the data size increases dramatically, and the much smaller image or patch drastically reduces the number of trainable parameters in the network. Currently, 2D CNNs are the method of choice for CNN-based deep learning approaches to pancreatic cancer. On the other hand, this approach may miss potentially relevant 3D spatial context, which on occasion motivates investigators to employ a 3D or 2.5D CNN architecture. Another important method for mitigating data size limitations in deep learning is data augmentation, in which operations such as flipping, rotation, translation, and scaling are used to synthesize modified data from the original data in order to increase the training dataset size. Yet another method is transfer learning, in which standard architectures with weights pretrained on natural image datasets such as ImageNet are fine-tuned on the specific medical images [8][9]. Although intuitively appealing, methods such as data augmentation and transfer learning may not always improve deep learning model performance.
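Both mitigation strategies can be sketched briefly with PyTorch/torchvision, as below. The specific augmentations, the ResNet-18 backbone, the frozen layers, and the two-class head are illustrative choices only; note also that single-channel medical images are often replicated to three channels to match networks pretrained on natural (RGB) images.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: synthesize varied training samples from each original.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
])

# Transfer learning: start from ImageNet-pretrained weights, replace the
# final layer, and fine-tune on the (much smaller) medical image dataset.
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
net.fc = nn.Linear(net.fc.in_features, 2)  # new 2-class head

# Optionally freeze early layers so only later layers adapt to the new task.
for param in net.layer1.parameters():
    param.requires_grad = False

x = augment(torch.randn(4, 3, 224, 224))  # toy batch of RGB-replicated slices
print(net(x).shape)                        # torch.Size([4, 2])
```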