The prediction of chronological age from methylation-based biomarkers represents one of the most promising potential applications in the field of forensic sciences
[1,2,3][1][2][3]. In the last decade, extensive efforts have been made to identify such biomarkers, and thanks to several epigenome-wide association studies, many CpG sites whose methylation levels are strongly correlated with age have been identified. Several authors have proposed combining these markers into models for age prediction
[4,5,6][4][5][6]. Currently, the most robust methylation-based age prediction methods are the so-called “epigenetic clocks”, which are based on microarray technologies. These methods are technically not achievable in a typical forensic laboratory and require more DNA than is usually available from most casework samples. In addition, they are based on technologies that are very expensive and complex, and they rely on very sophisticated classification algorithms. More recently, forensic DNA technology has triggered efforts toward simplification of the array-based epigenetic clocks, and several models have been developed to date. Due to the existence of complex nonlinear relationships between the methylation levels of the assessed CpG markers and chronological age, several authors have also taken advantage of machine learning approaches to obtain more accurate age predictions
[7,8][7][8]. These algorithms include support vector machine (SVM)
[9], artificial neural networks
[10], gradient boosting regressor
[11], and missMDA
[8]. However, their translation into forensic genetic practice is still a long way off, for several important reasons. Firstly, with some exceptions, they are still based on technological formats that are not available in a typical forensic genetic laboratory. Secondly, even when models are formulated to exploit the same technology, the detection of the methylation values might be based on different genes, and therefore the CpG sites (and their combinations) included in such models are usually different. Thirdly, the complexity of the statistical framework within which these models have been obtained limits their application.
2. ELOVL2
Age estimation using DNA-based methodologies is a crucial step in forensic science analysis, as well as in other fields, such as the monitoring of the ageing rate. Although several methods for forensic age estimation have been proposed to date, none of these approaches is currently used in forensic laboratories for identification purposes. In order to translate new discoveries into casework analysis, the definition of precise guidelines for the practical implementation of the developed methods is a fundamental requirement. To pursue this objective, it is crucial to define a set of methylation markers to be analysed, the relevant methodology for their detection, and an easy-to-use mathematical model for the analysis of the laboratory data that allows for a reliable forensic age estimation. Regarding the definition of the markers, many candidate loci have been proposed, such as ELOVL2, C1orf132, TRIM59, FHL2, KLF14, PDE4C, EDARADD, ASPA, and PENK
[8,28,29,30][8][19][20][21]. As for the detection methodology, different sequencing/typing techniques have been proposed for forensic age prediction. They include pyrosequencing
[15,25,31,32,33][15][22][23][24][25], massive parallel sequencing (MPS)
[34,35,36,37][26][27][28][29], SNaPshot assays
[38,39][30][31], and EpiTYPER
[13,28,40,41][13][19][32][33]. It is important to point out that methylation profiles obtained with these different sequencing/typing methods provide largely comparable results
[42][34]. On the other hand, MPS seems to be the most advantageous approach due to its capability of dealing with low quantity/degraded samples, which can be very common in forensic investigations
[43][35]. Furthermore, MPS is already used in most forensic laboratories for DNA profiles with STR markers, but also for biogeographical ancestry information, mitochondrial DNA sequence analysis, and for forensic DNA phenotyping applications
[44,45,46][36][37][38]. Regarding the different algorithms that have been formulated, machine learning approaches have been reported to significantly outperform other approaches
[8,24,34,47][8][26][39][40].
Systematic reviews carried out in recent years identified hundreds of age prediction models based on DNA methylation data
[1,48,49][1][41][42]. These models relied on different tissues (blood or other body fluids) and ranged from fewer than a dozen markers, mainly analysed by pyrosequencing, to several tens or hundreds of loci analysed with methylation arrays
[4,5,6][4][5][6]. A variety of different epigenetic models exploiting different DNA methylation technologies and different statistical methods for forensic age prediction have been developed to date
[8,14,15,20,24,50,51,52][8][14][15][39][43][44][45][46]. Among them, the most accurate provide prediction errors of 3–4 years, which are in line with those from eyewitness reports. Most of them are based on multiple CpG sites from blood samples for which donors were restricted to adult age ranges, while only a few models covered a full spectrum of human ages from childhood to old age
[24,47][39][40]. Only a few attempts to simplify such epigenetic models, so as to make them easily applicable in forensic casework, have been proposed to date
[16,53][16][47]. These were mainly based on (i) a reduction in the number of markers and (ii) a technological format suitable for forensic laboratories
[38,39][30][31], resulting in a simple statistical approach (e.g., linear regression) applicable to the data collected in routine practice. Among these attempts, the one recently proposed by Garali and co-workers seems to fulfil all previously mentioned conditions
[16]. The proposed single-locus model was based on the seven CpG sites of the ELOVL2 promoter and showed a prediction error of about 5 years. In addition, although multi-locus age prediction models generally seem to perform better than the proposed single-locus model, this difference became negligible in independent validation studies
[54][48].
In the present meta-analysis, nine studies involving more than 2200 participants from different populations were included to build a single-locus ELOVL2-based epigenetic model of forensic age prediction from blood samples. This yielded the largest dataset ever analysed and improved the understanding of the impact of the epigenetic variability of ELOVL2 on forensic age prediction. Five different statistical approaches were then used, and the performances of the five corresponding models were compared. The models giving the best age prediction accuracies were the GBR and the SVM models, with a prediction error of about 5.6 years. Sensitivity analysis showed that this error remained stable, indicating that the results obtained were robust.
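As an illustration of how a prediction error of this kind can be computed, the following minimal sketch fits a one-predictor linear regression of age on a single methylation value and reports the mean absolute error on held-out samples. It is not the procedure used in the meta-analysis: the data are synthetic, and the helper names (`fit_line`, `simulate_cohort`), the linear trend, and the noise level are all assumptions made for the example.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages (years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def simulate_cohort(n, rng):
    """Synthetic (methylation, age) pairs; trend and noise are made-up values."""
    ages = [rng.uniform(1, 90) for _ in range(n)]
    meth = [
        min(1.0, max(0.0, 0.10 + 0.008 * age + rng.gauss(0, 0.05)))
        for age in ages
    ]
    return meth, ages


rng = random.Random(0)  # fixed seed so the run is reproducible
train_x, train_y = simulate_cohort(400, rng)
test_x, test_y = simulate_cohort(100, rng)

# Fit on the training samples only, then score on samples the model never saw.
slope, intercept = fit_line(train_x, train_y)
predicted = [slope * m + intercept for m in test_x]
mae = mean_absolute_error(test_y, predicted)
print(f"hold-out MAE: {mae:.2f} years")
```

Scoring on samples that were not used for fitting is what makes the reported error comparable to an independent validation; the same `mean_absolute_error` helper could be applied to the predictions of any other model.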
The ELOVL2 single-locus model was also proposed by two previous studies. The first study was carried out by Garali et al.
[16] and was based on a smaller sample size (1413 individuals) with a different methodology. The second study was reported by Zbieć-Piekarska et al.
[53][47], who developed an epigenetic model based on the pyrosequencing of the promoter region of ELOVL2 from 303 blood samples. However, the present meta-analysis provides more robust and clearer results since it included additional studies involving more than 2200 participants. With respect to the study carried out by Garali et al.
[16], the classification performances reported in the present study are slightly lower, and this discrepancy might be partially due to overfitting. Garali et al.
[16] used every combination of the seven CpG sites during model building, developing a total of 17,018 age prediction models. This procedure might have overfitted the data, ultimately resulting in poorer performance when the models are applied in independent validation studies.
Another important point to consider for the formulation of a single-locus age prediction model is tissue specificity. In fact, even if ELOVL2 methylation levels did not show tissue specificity
[18], a significant performance reduction was evident when the obtained models were applied to tissues other than blood. Using methylation data from buccal swab samples of Becker et al. for German and Japanese populations
[50][44] and applying the five previously reported statistical models yielded prediction errors, expressed as the mean absolute error (MAE), ranging from 17.84 years for the GBR model to 22.7 years for the MLR model for the German samples; similar results were also obtained for the Japanese samples (see Supplementary Tables S1 and S2). These results support the idea that age-prediction models may be cohort- and tissue-specific, and thus, caution should be exercised during their application. Population-specific differences in DNA methylation patterns and their impact on forensic age estimations have already emerged from several published studies on this topic
[22,55][49][50]. However, the prediction error obtained using the ELOVL2-based epigenetic models is not sufficiently low to make them suitable for forensic practice. This suggests that, alongside ELOVL2, the inclusion of additional non-redundant markers is a fundamental requirement for applying molecular models in forensic practice with robust results. For instance, starting from the observation that the prediction accuracy of an epigenetic clock is influenced by the proportions of naive and activated immune blood cells
[22,56][49][51], in a recent study, it was demonstrated that a molecular clock based on ELOVL2 together with a biomarker of immunosenescence (sjTREC) showed a significantly improved prediction accuracy, especially at old ages
[22,57][49][52] where most epigenetic clocks may become less accurate
[15,34,35,58,59][15][26][27][53][54].
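The performance drop across tissues described above can be illustrated with a toy evaluation in which one fixed linear model is scored on two cohorts. This is only a sketch under stated assumptions: the model coefficients, the cohort labels `tissue_a` and `tissue_b`, and every data point are invented for the example and do not come from the cited studies.

```python
def predict_age(methylation, slope=120.0, intercept=-10.0):
    """Hypothetical linear model; the coefficients are illustrative, not fitted."""
    return slope * methylation + intercept


def evaluate(samples):
    """Return MAE and mean signed error (bias) for (methylation, true_age) pairs."""
    errors = [predict_age(m) - age for m, age in samples]
    mae = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return {"mae": mae, "bias": bias}


# Invented values: tissue_b shows a lower methylation level than tissue_a at every age.
cohorts = {
    "tissue_a": [(0.25, 20), (0.42, 40), (0.58, 60), (0.75, 80)],
    "tissue_b": [(0.12, 20), (0.27, 40), (0.41, 60), (0.55, 80)],
}

report = {name: evaluate(samples) for name, samples in cohorts.items()}
for name, metrics in report.items():
    print(f"{name}: MAE={metrics['mae']:.1f} years, bias={metrics['bias']:+.1f} years")
```

When the bias is close to the MAE in magnitude, the error is mostly a systematic shift rather than random scatter, which is the pattern one would expect if a model built on one tissue were applied to another with a different methylation baseline.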
On the other hand, the formulation of a single-locus epigenetic model represents an easy and cost-effective approach since methylation levels of such candidate regions can be assessed using PCR methods
[17,60,61][17][55][56]. In fact, methylation analysis is usually carried out using different DNA methylation technologies (e.g., EpiTYPER®, SNaPshot, or pyrosequencing) for which high costs represent a constraining factor for most forensic laboratories. As a cost-effective approach, this last strategy might allow a re-analysis of the same blood sample, a procedure that has been demonstrated to clearly improve the detection of the methylation status of the analysed CpG sites and consequently the corresponding age prediction models
[16]. Further studies are required, as also highlighted by Garali et al.
[16], to validate the proposed single-locus model on DNA samples from different tissue types and to define the applicability of these models to such samples. It might also be interesting to study the capability of this marker to gauge the individual rate of ageing and to evaluate the effects of specific interventions
[62,63,64,65][57][58][59][60].