An accurate prediction of cancer survival is very important for counseling, treatment planning, follow-up, and postoperative risk assessment in patients with Oral Squamous Cell Carcinoma (OSCC). Interest has grown in the development of clinical prognostic models and of nomograms, which are their graphical representation.
The methodological characteristics of the prognostic models developed are summarized in Table 31.
Authors and Year | Internal Validation | Modelling Method | Handling of Missing Data | Model Discrimination | Model Calibration | Model Presentation | Handling of Continuous Predictors | Non-Linearity | Internal Validation C-Index | External Validation C-Index |
---|---|---|---|---|---|---|---|---|---|---|
Bobdey 2016 [20][17] | 1000-time bootstrapping | Multivariable Cox proportional hazards regression models and stepdown reduction method | n/a | C-statistic | n/a | Nomogram | Mixed: Continuous; Categorical/dichotomous | none | 0.7263 | none |
Li 2017 [21][18] | 1000-time bootstrapping | Multivariable Cox proportional hazards regression models | n/a | C-statistic | Calibration plot | Nomogram | Categorical/dichotomous | n/a | 0.709 | 0.691 |
Montero 2014 [22][19] | 1000-time bootstrapping | Multivariable Cox proportional hazards regression models and stepdown reduction method | Imputation | C-statistic | Calibration plot | Nomogram | Categorical/dichotomous | Cubic splines | 0.67 | none |
Sun 2019 [23][20] | Combination of methods: 500-time bootstrapping; 5-fold cross-validation | Multivariable Cox proportional hazards regression models | n/a | C-statistic | Calibration plot | Nomogram | Mixed: Continuous; Categorical/dichotomous | none | 0.705 | 0.664 |
Bobdey 2017 [24][21] | 1000-time bootstrapping | Multivariable Cox proportional hazards regression models and stepdown reduction method | n/a | C-statistic | n/a | Nomogram | Categorical/dichotomous | n/a | 0.7266 | 0.740 |
Chang 2018 [25][22] | 1000-time bootstrapping | Multivariable Cox proportional hazards regression models | n/a | AUC | Calibration plot | Nomogram | Categorical/dichotomous | Cubic splines | 0.78 | 0.71 |
An accurate prediction of cancer survival is very important for counseling, treatment planning, follow-up and postoperative risk assessment in patients with OSCC [27][23]. Although prognostic models are still relatively new for OSCC, they are already widely used for other human diseases [28,29,30,31][24][25][26][27]. It is now well known that cancer-related outcomes are influenced by several factors that are not included in the TNM system. The vast majority of these factors have not been incorporated into the staging system because they may not predict outcome “independently” in multivariate prognostic models; however, many of them may work in tandem and influence each other to varying degrees [32,33][28][29].
Six studies included models that were correctly developed according to the TRIPOD statement; all of them carried out internal validation of the model, and four models were also externally validated [21,22,23,24,25,26][18][19][20][21][22][30]. The majority of models assessed overall survival (OS) in patients with squamous cell carcinoma of the tongue [22,24,26][19][21][30], two assessed all possible sites of tumor onset [21,23][18][20], and one model assessed only buccal mucosa cancer [25][22]. All models estimated OS at five years, except for that of Bobdey et al. [25][22], who estimated it only at three years; in addition, Li et al. and Sun et al. also evaluated OS at eight and three years, respectively [21,23][18][20]. Among the clinical factors, those most often included in the models were age, race, marital status, comorbidities and smoking, while among the histopathological factors the most investigated were T stage, N stage and M stage.
It is well known that the performance of a prognostic model is overestimated when it is assessed only in the patient sample that was used to build the model [34][31]. Internal validation provides a better estimate of model performance in new patients when it adjusts for overfitting, that is, the difference between the apparent prediction accuracy and the prediction accuracy measured on an independent test set. Resampling techniques are a set of methods for assessing the accuracy of the developed prognostic prediction models [35][32]. All studies used 1000-time bootstrapping as the resampling technique, with the exception of Sun et al. [23][20], who combined 500-time bootstrapping with 5-fold cross-validation. Nevertheless, evaluating a model’s performance by bootstrapping or cross-validation is not enough to overcome overfitting; such studies should also apply shrinkage, a method used to adjust the regression coefficients [36,37][33][34].
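One common way to use bootstrapping for internal validation is an optimism correction: the model is refitted on each bootstrap sample, and the average drop in performance when that refitted model is applied back to the original data is subtracted from the apparent performance. The sketch below illustrates the idea under loud assumptions: it uses a toy least-squares line and negative mean squared error instead of a Cox model and a C-index, the data are simulated, and all function names are hypothetical. It is not a reconstruction of the procedure used in any of the reviewed studies.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Toy model (assumption): least-squares line, returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def neg_mse(model, xs, ys):
    """Toy performance measure (assumption): negative MSE, higher is better."""
    intercept, slope = model
    return -mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def optimism_corrected(xs, ys, fit, score, n_boot=1000, seed=0):
    """Return (apparent, optimism-corrected) performance via bootstrapping."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent performance: model built and assessed on the same sample.
    apparent = score(fit(xs, ys), xs, ys)
    optimism = []
    for _ in range(n_boot):
        # Draw a bootstrap sample of the same size, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit(bx, by)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same refitted model in the original sample.
        optimism.append(score(model, bx, by) - score(model, xs, ys))
    return apparent, apparent - mean(optimism)


# Simulated data (assumption): one predictor with a weak linear effect.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(30)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]

apparent, corrected = optimism_corrected(xs, ys, fit_line, neg_mse)
print("apparent:", round(apparent, 3), "corrected:", round(corrected, 3))
```

The `fit` and `score` arguments are kept generic so that, in principle, a survival model and a concordance measure could be substituted for the toy versions used here.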
Calibration reflects the agreement between the model’s predictions and the observed outcomes. It is preferably reported graphically, usually with a calibration plot [39][35]. Another key aspect of the characterization of a prognostic model is discrimination, that is, the ability of the model to differentiate between those who experience the outcome event and those who do not [13]. The most widely used measure of discrimination is the concordance index (C-index), which reflects the probability that, for any randomly selected pair of individuals, one with and one without the outcome, the model assigns a higher probability to the individual with the outcome [40][36]. Many C-indices have been proposed for survival models, so it is important to underline that, in our results, the most commonly used is the one proposed by Harrell [41][37]. The C-index ranges from 0 to 1, where 0.5 corresponds to chance-level discrimination and higher values indicate better discrimination; since all the studies included in this systematic review reported a C-index above 0.6, all of them showed good prognostic accuracy [42][38]. In addition, improvements in study design and analysis are crucial to provide evidence on more reliable prognostic factors that can be incorporated into new prognostic models, or used to update existing ones, in order to improve discrimination [43][39]. Another important finding was the almost total lack of handling of missing data, except for Montero et al. [22][19], who carried out multivariate imputation by chained equations (MICE) [44][40] before conducting the multivariable regression analysis [23][20]. The absence of any mention of missing data implies a so-called “complete case analysis”. Including only participants with complete data is inefficient, because it reduces the sample size, and can also lead to biased results, because the analysis is restricted to a selected subsample [12].
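The concordance index described above can be illustrated with a minimal sketch of a Harrell-type C-index for right-censored survival data. The function name and the example data are hypothetical, and two simplifications are assumed: pairs with tied survival times are skipped, and tied risk scores count as half-concordant. A pair is treated as comparable only when the subject with the shorter observed time actually experienced the event.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject with the shorter
    observed survival time has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip tied times
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 10, 15, 20, 25]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.3, 0.4, 0.7, 0.1]

c = harrell_c_index(times, events, risk_scores)
print(round(c, 3))  # 0.857 (6 concordant out of 7 comparable pairs)
```

A model whose predictions carry no information (for example, the same risk score for every patient) yields 0.5 under this definition, which is why values only slightly above 0.5 indicate weak discrimination.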
External validation is preferable to internal validation for testing the transportability of a model, since the population, or the distribution of predictors, in an independent population can never be exactly the same as in the model development population [45][41]. Moreover, to improve the generalizability of a model, it should ideally be validated in different contexts with different populations [46][42]. However, the literature currently contains no external validation by independent researchers of prognostic models for OS in patients with OSCC. A reliable model should be tested by independent researchers in different contexts to ensure its generalizability [15].