Asthma is a widespread respiratory disease that arises from a complex interplay of genetic, environmental and behavioral factors. For several decades, its sensitivity to environmental factors has been investigated in studies of a single exposure (or a single family of exposures), an approach that may be too narrow to address the etiology of such a complex multifactorial disease. The exposome concept, introduced by C. Wild in 2005, offers an alternative framework for studying exposure–health associations.
Asthma is a heterogeneous chronic respiratory disease characterized by airway inflammation, which manifests as variable respiratory symptoms (wheeze, shortness of breath, chest tightness and/or cough) and variable expiratory airflow limitation [1]. Asthma affects approximately 300 million children and adults worldwide [2], and its prevalence has increased dramatically over recent decades [3]. Extensive research into the causes of asthma has identified genetic (such as the 17q21 ORMDL3/GSDML region for early childhood-onset asthma), environmental (such as urban vs. rural area) and lifestyle (such as tobacco smoking) risk factors. It has also highlighted the complex etiology of this multifactorial disease, e.g., through the identification of specific windows of susceptibility and complex gene-by-environment interactions [4]. The exposome concept, introduced in recent years to complement the genome for a better understanding of the development of complex diseases [5], offers new avenues in environmental epidemiology. The main objective of this review is to present how the methodological framework represented by the exposome has been applied to asthma research to date. After an overview of the exposome concept, we review different statistical approaches for studying exposome–health associations. Finally, we discuss recent studies linking multiple families of exposures to asthma-related outcomes.
Exposome studies require the collection of a large number of exposures. This can be done using different assessment methods (e.g., self-reported questionnaires, exposure biomarkers, geographic information system (GIS)-based models, personal sensors), for different time windows (prenatal, early postnatal, childhood, adolescence, adulthood), and, for external factors, at different locations (home, school, work) and spatial resolutions (e.g., urban indicators measured within buffers of 100, 300 or 500 m). From a methodological point of view, this large number of variables (possibly larger than the size of the study population) raises issues of statistical power and false discovery rate [6]. Indeed, the multiplicity of tests inflates the type I error (alpha) risk, and the methods developed to correct p-values for multiple hypothesis testing [7][8][9] reduce statistical power. Exposome–health association studies therefore require a sample size large enough to detect associations of low to moderate effect size, as expected for most exposures [10][11]. Several other statistical challenges specific to exposome studies have to be taken into account, such as the increased false discovery rate related to the high level of correlation between exposures and the difficulty of accounting for “mixture” effects [12][13]. To date, no consensus has been reached on which statistical methods should be used in exposome–health association studies [14]. However, simulation studies have compared the performance of various methods in the exposome context under specific settings.
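The trade-off between alpha inflation and loss of power can be made concrete with a minimal sketch. It assumes independent tests for the family-wise error rate formula, uses two common corrections (Bonferroni and Benjamini-Hochberg) as examples rather than as the specific methods cited above, and relies on hypothetical p-values chosen purely for illustration.

```python
from typing import List


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five exposures (illustrative numbers only).
p_raw = [0.001, 0.008, 0.020, 0.030, 0.300]
p_bonf = bonferroni(p_raw)
p_bh = benjamini_hochberg(p_raw)

print(f"FWER for 100 independent tests at alpha=0.05: "
      f"{family_wise_error_rate(0.05, 100):.3f}")
print("Bonferroni:", [round(p, 4) for p in p_bonf])
print("Benjamini-Hochberg:", [round(p, 4) for p in p_bh])
```

With these illustrative values, two exposures remain below 0.05 after the Bonferroni correction and four after the Benjamini-Hochberg procedure, which shows how a stricter correction costs power.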
For example, simulation studies have compared (i) the efficiency of various regression-based approaches in terms of false positive rate and sensitivity, with and without interactions between exposures [6][13]; (ii) the performance of variable selection models in case-control studies [15]; (iii) the performance of variable and function selection methods in the case of nonlinear effects of correlated exposures [14]; and (iv) methods to correct for classical-type exposure measurement error [16]. Based on the findings of these studies and a review of the literature, Table 1 summarizes the strengths and weaknesses of the main statistical approaches used in exposome studies in the field of respiratory health.
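In a simulation, the exposures that truly affect the outcome are known, so the performance of a selection method can be scored directly. The sketch below shows one simple way to compute sensitivity and the false discovery proportion for a single simulated dataset; the function name and the exposure labels are hypothetical and are not taken from the cited studies.

```python
from typing import Dict, Set


def selection_performance(selected: Set[str], causal: Set[str]) -> Dict[str, float]:
    """Score a set of selected exposures against the known causal set."""
    true_pos = len(selected & causal)
    false_pos = len(selected - causal)
    # Sensitivity: share of truly causal exposures that were selected.
    sensitivity = true_pos / len(causal) if causal else float("nan")
    # False discovery proportion: share of selected exposures that are not causal.
    fdp = false_pos / len(selected) if selected else 0.0
    return {"sensitivity": sensitivity, "false_discovery_proportion": fdp}


# Hypothetical simulated scenario: three causal exposures, four selected.
causal = {"pm25", "no2", "tobacco_smoke"}
selected = {"pm25", "no2", "green_space", "noise"}
perf = selection_performance(selected, causal)
print(perf)
```

Averaging these two quantities over many simulated datasets gives the kind of sensitivity and false discovery estimates used to compare methods.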
Table 1. Main statistical methods used in exposome-health association studies.
| Type of Analysis | Examples of Methods | Strengths | Weaknesses | Reference of the Method | Use of the Method in Exposome or Asthma Field |
|---|---|---|---|---|---|
| Single-exposure regression-based method | Exposome-Wide Association Study (ExWAS) | | | Patel et al., 2010 [17] | Sbihi et al., 2017 [18]; North et al., 2017 [19]; Lepeule et al., 2018 [20]; Agier et al., 2019 [21]; Vrijheid et al., 2020 [22]; Agier et al., 2020 [23]; Warembourg et al., 2019 [24]; Nieuwenhuijsen et al., 2019 [25]; Granum et al. [26] |
| Multiple-exposures regression-based methods | Deletion–Substitution–Addition (DSA) algorithm | | | Sinisi and van der Laan, 2004 [27] | Agier et al., 2019 [21]; Vrijheid et al., 2020 [22]; Agier et al., 2020 [23]; Warembourg et al., 2019 [24]; Nieuwenhuijsen et al., 2019 [25]; Granum et al. [26] |
| | Elastic Net (ENET) and Least Absolute Shrinkage and Selection Operator (LASSO) | | | Zou and Hastie, 2005 [28]; Tibshirani, 1996 [29] | Pries et al., 2019 [30]; Cowell et al., 2019 [31] |
| | Weighted Quantile Sum (WQS) regression | | | Carrico et al., 2015 [32] | - |
| Supervised clustering approaches | Latent Class Analysis (LCA) | | | Goodman et al., 1974 [34] | Buck Louis et al., 2019 [35]; Harmouche-Karaki et al., 2019 [36] |
| | Bayesian Profile Regression (BPR) | | | Molitor et al., 2020 [37] | Berger et al., 2020 [38]; Belloni et al., 2020 [39] |
| Analysis accounting for the hierarchical structure of the data | Meet-in-the-Middle (MITM) | | | Chadeau-Hyam et al., 2011 [40] | Vineis et al., 2020 [41]; Jeong et al., 2018 [42]; Cadiou et al., 2020 [43] |
| | Bayesian Kernel Machine Regression (BKMR) | | | Bobb et al., 2015 [44] | Berger et al., 2020 [38] |
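To illustrate the single-exposure strategy listed first in Table 1, the following minimal sketch runs one simple linear regression per exposure on simulated data and ranks exposures by p-value. It is a simplified assumption-laden illustration, not the published ExWAS procedure: it omits confounder adjustment and multiple-testing correction, uses a normal approximation for the p-value, and all names (`exwas`, `exposure_0`, ...) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def simple_regression_p_value(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Slope of y ~ x and a two-sided p-value (normal approximation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    z = slope / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return slope, p_value


def exwas(exposures: Dict[str, List[float]],
          outcome: List[float]) -> List[Tuple[str, float, float]]:
    """Test each exposure separately; return (name, slope, p) sorted by p."""
    results = [(name, *simple_regression_p_value(values, outcome))
               for name, values in exposures.items()]
    return sorted(results, key=lambda r: r[2])


# Simulated data (assumption): 20 independent exposures, only exposure_0
# has a true effect on a continuous outcome.
rng = random.Random(0)
n_subjects, n_exposures = 500, 20
exposures = {f"exposure_{j}": [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
             for j in range(n_exposures)}
outcome = [0.5 * x0 + rng.gauss(0.0, 1.0) for x0 in exposures["exposure_0"]]

ranked = exwas(exposures, outcome)
for name, slope, p in ranked[:3]:
    print(f"{name}: slope={slope:.3f}, p={p:.2e}")
```

In a real analysis, the resulting p-values would then be corrected for multiple testing, and correlated exposures would make the ranking less clear-cut than in this independent-exposure simulation.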
Asthma is a widespread multifactorial disease whose etiology and development call for a comprehensive approach. Most previous studies in environmental epidemiology focused on a single exposure (or a single exposure family), but with the recent emergence of the exposome concept, several studies and European projects have started to assess the effect of multiple exposures on respiratory health. These studies are expected to contribute to a better understanding of the associations between the environment and health by using various holistic approaches. The first association studies between the exposome and asthma-related outcomes have relied mainly on the ExWAS method for successive single-exposure analyses and on the DSA algorithm for multi-exposure analysis [21][23][24][25][26][45]. Further studies with larger sample sizes should attempt to apply more comprehensive statistical approaches, able either to account for the hierarchical structure of the multiple layers of the exposome or to capture possible mixture effects, in order to be more consistent with the complex structure of exposure data.