Microvascular Invasion in Hepatocellular Carcinoma

Please note this is a comparison between Version 1 by Qiang Wang and Version 2 by Dean Liu.

Microvascular invasion (MVI) is regarded as a sign of early metastasis in liver cancer and can only be diagnosed by histopathological examination of the resected specimen. Preoperative prediction of MVI status may influence patient treatment management, for instance, by prompting a wider resection margin.

radiomics
microvascular invasion
hepatocellular carcinoma
prediction model
systematic review

1. Introduction

Microvascular invasion (MVI) has been recognized as an independent predictor of early recurrence and poor prognosis after liver resection or transplantation in hepatocellular carcinoma (HCC) [1,2]. Its reported incidence ranges from 15% to 57%, depending on the diagnostic criteria and study population [3]. The diagnosis of MVI, however, can only be made by postoperative histopathological examination of the resected specimen, and it therefore has little or no influence on treatment management. With preoperative knowledge of MVI, clinicians could optimize the treatment strategy, for example, by expanding the resection margin during surgery or by adopting an alternative treatment option. To implement personalized medicine, it is of utmost importance to identify and stratify patients with MVI preoperatively. Therefore, a reliable, noninvasive biomarker for preoperative prediction of MVI is urgently needed.

Medical imaging has evolved from a primarily diagnostic tool into an essential part of clinical decision making. Clinically, radiologists rely on pattern recognition, having established links between MVI and radiological features on CT or MRI images [4,5], such as arterial peritumoral enhancement, non-smooth tumor margins, and rim arterial enhancement [2]. The Liver Imaging Reporting and Data System (LI-RADS) has been developed as a comprehensive and standardized diagnostic algorithm for HCC imaging reporting [6]. LI-RADS has proven to be an effective tool not only for HCC diagnosis but also for outcome prediction after liver resection, radiofrequency ablation, or liver transplantation [6,7,8], and it exerts an increasing influence on the treatment management of HCC. Previous studies have demonstrated the diagnostic value of LI-RADS in the prediction of MVI [9,10]. However, these qualitative features are subjective and show high inter-observer variability [11].

Radiomics is an emerging field that extracts high-throughput imaging features from biomedical images and converts them into mineable data for quantitative analysis [12,13]. Its basic assumption is that alterations and heterogeneity of the tumor on the micro scale (e.g., at the cellular or molecular level) are reflected in the images [14]. Through radiomics analysis, cancerous cell emboli in the hepatic vasculature (i.e., MVI) may therefore be detectable on preoperative images, which holds promise for the preoperative prediction of MVI and for personalized treatment. In recent years, a number of radiomics models for MVI prediction have emerged. However, no study has systematically summarized current radiomics research on MVI prediction, and the overall efficacy of these prediction models is still unknown. In addition, because radiomics research is a sophisticated process consisting of several steps, it is important to evaluate the methodological variability in order to obtain a reliable and reproducible model before translating it into clinical applications.

2. General Characteristics and the Incidence of MVI

All studies were retrospectively designed and included 5552 patients in total, with sample sizes varying from 69 to 637 patients (median: 174). Most studies (20/22) split the cohort into a training and a test cohort, while only two of them further validated their model using an independent external cohort [25,29]. Eight studies (8/22) focused on solitary HCC, among which five focused on HCC with a diameter of less than 5 cm. The incidence of MVI ranged from 25.3% to 67.5% across entire individual cohorts, and from 25.3% to 56.4% for HCC smaller than 5 cm. Around two thirds (16/22) of the studies explicitly stated their definition of MVI. Table 1 gives more details about the general characteristics of the reviewed studies.

Table 1. Study and patient characteristics.

First Author	Year	Study Design	No. of Patients (Train vs. Test Cohort)	Independent Validation Cohort	Age (Mean/Median)	Gender (M/F, %)	Indication	MVI Incidence
Jian Zheng [20]	2017	R#	120 (NA)	No	70	73/27	HCC	44%
Jie Peng [21]	2018	R	304 (184:120)	No	53 vs. 55 ^†	85/15	HCC (solitary)	66%
Xiaohong Ma [22]	2018	R	157 (110:47)	No	53 vs. 55 ^†	85/15	HCC (≤6 cm, solitary)	35%
ShiTing Feng [23]	2019	R	160 (110:50)	No	54.8	91/9	HCC	38.8%
Ming Ni [24]	2019	R	206 (148:58)	No	57 vs. 59 ^†	NA	HCC (>1 cm)	42.7%
Rui Zhang [25]	2019	R	267 (194:73)	No	57.9	86/14	HCC (solitary)	33.7%
Yong-Jian Zhu [26]	2019	R	142 (99:43)	No	57	87/13	HCC (<5 cm, solitary)	37.3%
Giacomo Nebbia [27]	2020	R	99 (NA)	No	51 vs. 54 (MVI vs. non-MVI)	84/16	HCC	61.6%
Qiu-ping Liu [28]	2020	R	494 (346:148)	No	NA	84/16	HCC	30.2%
Xiuming Zhang [29]	2020	R	637 (451:111)	Yes (75, external)	57.5 vs. 56.2 vs. 60.7 ^§	86/14	HCC	40%
Yi-quan Jiang [30]	2020	R	405 (324:81)	No	48.5	85/15	HCC	54.3%
Mu He [31]	2020	R	163 (101:44)	Yes (18, internal)	50.0 vs. 47.5 vs. 52.0 ^§	82/18	HCC	67.5%
Huan-Huan Chong [32]	2021	R	356 (250:106)	No	54.2	85/15	HCC (≤5 cm)	25.3%
Yidi Chen [33]	2021	R	269 (188:81)	No	51.5	81/19	HCC	41.3%
Youcai Li [34]	2021	R	80 (50:30)	No	NA	91/9	HCC (BCLC 0/A)	45%
Danjun Song [35]	2021	R	601 (461:140)	No	56.5	82/18	HCC (solitary)	37.4%
Houjiao Dai [36]	2021	R	69 (LOOCV)	No	52.7	96/4	HCC (solitary)	42.0%
Peng Liu [37]	2021	R	185 (124:61)	No	54 vs. 52 ^†	84/26	HCC (≤5 cm, solitary)	34.1%
Shuai Zhang [38]	2021	R	130 (91:39)	No	57.8 vs. 58.6 ^†	68/32	HCC (>1 cm)	61.5%
Wanli Zhang [39]	2021	R	111 (88:23)	No	NA	88/12	HCC	51.4%
Xiang-pan Meng [40]	2021	R	402 (300:102)	No	57 vs. 57 ^†	85/15	HCC (solitary)	40%
Yang Zhang [41]	2021	R	195 (136:59)	No	57.7	88/12	HCC (≤5 cm)	56.4%

Note: #, retrospective study; ^†, train vs. test cohort; ^§, train vs. test vs. validation cohort; BCLC, the Barcelona Clinic Liver Cancer staging system; HCC, hepatocellular carcinoma; LOOCV, leave-one-out cross validation; MVI, microvascular invasion; NA, not applicable.

3. MVI Prediction

All included studies performed poorly on five items of the radiomics quality score (RQS): “prospective study”, “phantom study”, “biological correlates”, “cost-effectiveness analysis”, and “openness of data and code”. Prospective studies are given the highest weighting in the RQS tool (7 points, accounting for around 20% of the full scale). A phantom study ensures that only robust features enter the subsequent radiomics analysis. Biological correlates aim to link imaging findings with gene or molecular signatures. Previous studies have identified a 91-gene signature that correlates highly with vascular invasion in HCC [43]. Based on this finding, a contrast-enhanced CT imaging biomarker, i.e., radiogenomic venous invasion (RVI), which comprises three imaging features (internal arteries, a hypo-dense halo, and a tumor-liver difference), has been shown to be an accurate predictor of MVI [44]. Future studies are required to explore and verify the correlations between radiomics features and gene expression. A cost-effectiveness analysis evaluates a radiomics prediction model in terms of health economics when it is applied in clinical routine. It assumes that a novel predictor should not be more expensive than currently available predictors when accuracy is comparable. It also compares health outcomes with and without the radiomics predictor, for example, through a quality-adjusted life year analysis. Researchers consider this item less urgent for now, given that the methodological standardization and the clinical/biological validation of current radiomics models are still lacking. Openness of data and code allows others to repeat and reproduce results and findings and to further validate and promote the prediction model in other centers. Though some initiatives have been proposed in an attempt to remove the obstacles to data sharing, other barriers, such as legal/privacy issues, culture/language differences, and insufficient staff/time, still exist [10]. None of the studies shared their code or imaging data publicly.

The items “imaging at multiple time points” and “multiple segmentations” both aim to select stable imaging features for modelling by accounting for temporal and subjective variation. However, fewer than half of the studies performed an intraclass correlation coefficient (ICC) analysis, and the studies seldom stated explicitly that imaging features from different phases/sequences were evaluated during that analysis (i.e., test–retest analysis). Furthermore, there is no generally accepted ICC threshold at which radiomics features can be considered robust. Generally, when reporting ICC, values of 0.75–0.90 are regarded as indicating good reliability, and values higher than 0.90 are regarded as excellent [45]. However, among the studies that calculated ICC, the applied threshold varied among 0.75, 0.80, and 0.90. Future studies should determine the proper threshold at which radiomics features can be defined as robust enough for modelling. Some of the studies did not exclude features with a low ICC and instead constructed their model from the full set of features extracted from their images.
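The ICC-based feature selection described above can be sketched as follows. This is a minimal illustration, not the method of any reviewed study: it assumes the two-way random-effects, absolute-agreement, single-measure form of the ICC (often written ICC(2,1)), and the feature names, values, and the helper names `icc_2_1` and `select_robust` are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per lesion, one column per segmentation (or reader).
    """
    n = len(ratings)     # number of lesions
    k = len(ratings[0])  # number of segmentations per lesion
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-lesion mean square
    msc = ss_cols / (k - 1)              # between-segmentation mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def select_robust(features, threshold=0.75):
    """Keep only features whose ICC reaches the chosen threshold."""
    kept = {}
    for name, ratings in features.items():
        icc = icc_2_1(ratings)
        if icc >= threshold:
            kept[name] = icc
    return kept


# Hypothetical feature values: 5 lesions, each segmented twice
features = {
    "stable_feature": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 5.0]],
    "unstable_feature": [[1, 5], [2, 1], [3, 4], [4, 2], [5, 3]],
}
robust = select_robust(features, threshold=0.75)
print(sorted(robust))
```

Changing `threshold` to 0.80 or 0.90 reproduces the other cut-offs used by the reviewed studies; with a stricter threshold, fewer features survive, which is why the lack of an agreed value affects reproducibility.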

With respect to the performance and clinical utility of the radiomics models, covered by the items “cut-off analysis”, “calibration statistics”, “comparison with gold-standard”, “potential clinical utility”, and “validation”, the included studies were again insufficient. The performance metrics of a model, such as sensitivity and specificity, are often determined by a specified cut-off value, and this value can further classify a patient cohort into high- and low-risk groups for a certain condition. A cut-off value is also one of the prerequisites for reproducing the results of previous research. However, only five studies reported their cut-off values. Calibration analysis, which evaluates the agreement between predictions and actual events, was performed by fewer than half of the studies. Regarding the comparison with a “gold-standard”, there is currently no surrogate that can serve as a “gold-standard” for MVI prediction. As the value of semantic imaging features has been extensively explored for MVI prediction, conventional imaging features were defined as the “gold-standard”. Among the 10 studies that compared prediction performance between radiomics and radiologist models, all declared that the radiomics models outperformed the radiologists’ semantic models. However, publication bias should be borne in mind when interpreting these results. Only two studies validated their models using independent external cohorts, and one of them validated its model in only 18 patients, which is not a sufficiently large validation cohort according to the “10-EPV” principle (at least 10 events per variable) [16,46]. When developing a prediction model, the ratio of events to variables should be maintained at a certain level to avoid potential overfitting or underfitting. Among the 16 studies with an EPV ratio available, the median EPV ratio (MVI-positive cases/features) was 4.2, indicating a potential risk of overfitting. Therefore, before these models are translated into routine clinical use, some practical issues should be addressed, such as the reproducibility of the radiomics model, the standardization of imaging protocols, model overfitting, and the external validation of the prediction models.
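Two of the quantities discussed above, the effect of a cut-off value on sensitivity and specificity and the events-per-variable (EPV) ratio, can be illustrated with a short sketch. All scores, labels, and counts below are hypothetical and are not taken from any reviewed study; the function names are likewise illustrative.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Call a case MVI-positive when its score is >= cutoff.

    labels: 1 = MVI on histopathology, 0 = no MVI.
    Returns (sensitivity, specificity).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def events_per_variable(n_events, n_features):
    """EPV = MVI-positive cases divided by the number of model features."""
    return n_events / n_features


def max_features_10_epv(n_events):
    """Largest number of features that still satisfies the 10-EPV principle."""
    return n_events // 10


# Hypothetical radiomics scores for 8 patients (4 with MVI, 4 without)
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(sensitivity_specificity(scores, labels, cutoff=0.5))  # (0.75, 0.75)
print(sensitivity_specificity(scores, labels, cutoff=0.35)) # (1.0, 0.75)

# Hypothetical model: 42 MVI-positive cases and 10 selected features
print(events_per_variable(42, 10))  # 4.2
print(max_features_10_epv(42))      # 4
```

Because the same scores yield different sensitivity and specificity at different cut-offs, a study that omits its cut-off value cannot be reproduced from its reported metrics alone. The EPV helper shows the same point for model size: with 42 events, a model would need to be limited to four features to meet the 10-EPV principle.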

Though the RQS tool aims to promote high-quality radiomics research, it has shortcomings that should be addressed in future revisions. The RQS focuses mainly on radiomics itself and ignores non-radiomics components of radiomics model/predictor development, such as blinding to outcomes and measurements, the interval between the index test and the reference standard (in the case of MVI, the time between imaging and liver resection), and the influence of sample size and the enrollment of study subjects. All these factors may also introduce bias. In this context, the QUADAS-2 tool can serve as a vital supplement to the RQS when evaluating the quality of radiomics research. Most of the studies identified in this systematic search showed a low or unclear risk in the four domains of the risk of bias evaluation. The number of missing or unclearly reported items identified with the RQS and QUADAS-2 tools was considerable, which suggests that these tools are not yet widely known or adopted. Future researchers should ideally apply the RQS or QUADAS-2 as a checklist to improve the quality of their reports. In fact, a dedicated checklist for artificial intelligence research, i.e., CLAIM (Checklist for Artificial Intelligence in Medical Imaging) [47], and a general guideline for diagnostic/prognostic prediction, i.e., TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) [15], have already been proposed.