The Bayes factor is the ratio of the marginal likelihoods of two competing models. The marginal likelihood of a model class is a weighted average of the likelihood over all the parameter values represented by the prior distribution. Carefully choosing priors and conducting a prior sensitivity analysis therefore play an essential role when Bayes factors are used as a model selection tool. This section briefly discusses prior distributions, prior elicitation, and prior sensitivity analysis.
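To make this averaging concrete, the marginal likelihood can be approximated by simple Monte Carlo: draw parameter values from the prior and average the likelihood over the draws. The sketch below does this for binomial data under two Beta priors; the counts and the prior choices are illustrative assumptions, not values from this entry.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y, n = 7, 20  # hypothetical data: 7 successes in 20 trials

def marginal_likelihood(prior, draws=100_000):
    """Approximate p(y) = E_prior[p(y | theta)] by Monte Carlo averaging."""
    theta = prior.rvs(size=draws, random_state=rng)  # draws from the prior
    return stats.binom.pmf(y, n, theta).mean()       # likelihood averaged over prior

m1 = marginal_likelihood(stats.beta(1, 1))  # uniform prior on theta
m2 = marginal_likelihood(stats.beta(5, 5))  # prior concentrated near 0.5
print("Bayes factor:", m1 / m2)
```

Because the two models share the same likelihood and differ only in the prior, the ratio of these marginal likelihoods is itself a Bayes factor, which is why the choice of prior enters the computation directly.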
1. Prior Distributions
In Bayesian statistical inference, a prior probability distribution (or simply the prior) expresses one’s beliefs or prior knowledge about an uncertain quantity before the data are collected. The unknown quantity may be a parameter of the model or a latent variable. In Bayesian hierarchical models, there is more than one level of prior distribution, corresponding to the hierarchical model structure. The parameters of a prior distribution are called hyperparameters. One can either assume fixed values for the hyperparameters or place a probability distribution on them, which is referred to as a hyperprior.
It is common to categorize priors into four types: informative priors, weakly informative priors, uninformative priors, and improper priors [1]. The Bayes factor computation requires proper priors, i.e., prior distributions that integrate to 1. Various software packages provide default priors, but it is the researcher’s responsibility to perform a sensitivity analysis to check the impact of applying different priors.
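As a minimal illustration of the hierarchy described above, the following sketch draws a hyperparameter from a hyperprior, then a parameter from the prior it indexes, and finally data given that parameter; the particular distributions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hyperprior: the hyperparameter lambda is itself uncertain, lambda ~ Gamma(2, 1).
lam = rng.gamma(shape=2.0, scale=1.0)

# Prior: given lambda, the model parameter theta has a Normal(0, 1/lambda) prior.
theta = rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam))

# Likelihood level: data are generated conditional on theta.
y = rng.normal(loc=theta, scale=1.0, size=10)
```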
2. Prior Elicitation
The prior distribution is an important ingredient of the Bayesian paradigm and must be designed coherently to make Bayesian inference operational [2]. Priors can be elicited using multiple methods, e.g., from past information, such as previous experiments, or purely from experts’ subjective assessments. When no prior information is available, an uninformative prior can be assumed, and most of the model information given by the posterior will come from the likelihood function itself. Priors can also be chosen according to some principle, such as symmetry or maximum entropy given constraints; examples are the Jeffreys prior [3] and Bernardo’s reference prior [4]. When a family of conjugate priors exists, choosing a prior from that family simplifies the calculation of the posterior distribution.
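As a concrete instance of conjugacy, the Beta family is conjugate to the binomial likelihood, so the posterior is available in closed form; a minimal sketch with made-up counts:

```python
from scipy import stats

a, b = 2.0, 2.0   # Beta(2, 2) prior on the success probability
y, n = 13, 20     # hypothetical data: 13 successes in 20 trials

# Conjugacy: a Beta prior combined with a binomial likelihood
# yields a Beta posterior with updated parameters.
posterior = stats.beta(a + y, b + n - y)
print(posterior.mean(), posterior.interval(0.95))
```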
With the advancement of computational power, the otherwise ad hoc search for priors can be performed more systematically. Hartmann et al. [5] utilized the prior predictive distribution implied by the model to automatically transform experts’ judgments about plausible outcome values into suitable priors on the parameters. They also provided computational strategies to perform inference and guidelines to facilitate practical use. Their methodology can be summarized as follows (a schematic sketch is given below): (i) define the parametric model for observable data conditional on the parameters θ and a prior distribution with hyperparameters λ for the parameters θ; (ii) obtain experts’ beliefs or probabilities for each of the mutually exclusive data categories partitioned from the overall data space; (iii) model the elicited probabilities from step (ii) as a function of the hyperparameters λ; (iv) iteratively optimize the model from step (iii) to obtain the estimate of λ that best describes the expert opinion within the chosen parametric family of prior distributions; and (v) evaluate how well the predictions obtained from the optimal prior distribution describe the elicited expert opinion. Prior predictive tools relying on machine learning methods can be useful when dealing with hierarchical modeling, where a grid search is not feasible [6].
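A minimal sketch of steps (i)–(v) for a binomial model with a Beta(a, b) prior: the expert states probabilities for coarse outcome categories, and the hyperparameters λ = (a, b) are optimized so that the prior predictive distribution matches those statements. The category cut-points, elicited probabilities, and squared-error loss are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np
from scipy import stats, optimize

n = 20
# (ii) expert's probabilities for mutually exclusive categories of y out of 20
#      (hypothetical elicitation): y in 0-5, 6-10, 11-15, 16-20.
elicited = np.array([0.10, 0.50, 0.30, 0.10])
edges = [(0, 5), (6, 10), (11, 15), (16, 20)]

def prior_predictive_probs(log_ab):
    """(iii) category probabilities implied by a Beta(a, b) prior on theta."""
    a, b = np.exp(log_ab)                  # optimize on the log scale
    y = np.arange(n + 1)
    pmf = stats.betabinom.pmf(y, n, a, b)  # prior predictive: beta-binomial
    return np.array([pmf[lo:hi + 1].sum() for lo, hi in edges])

def loss(log_ab):
    """(iv) discrepancy between elicited and implied category probabilities."""
    return np.sum((prior_predictive_probs(log_ab) - elicited) ** 2)

res = optimize.minimize(loss, x0=np.log([2.0, 2.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
# (v) check how well the optimized prior reproduces the expert's statements:
print(a_hat, b_hat, prior_predictive_probs(res.x))
```

Optimizing on the log scale keeps a and b positive without imposing explicit constraints.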
3. Sensitivity Analysis
In the Bayesian approach, it is important to evaluate the impact of prior assumptions. This is performed through a sensitivity analysis, in which the prior is perturbed and the change in the results is examined. Various authors have demonstrated how priors affect Bayes factors and provided ways to address the issue. For comparing two nested models in a low-dimensional parameter space, the authors in [7] propose a point mass prior Bayes factor approach: the Bayes factor under a point mass prior is computed on a grid of values of the extra parameter(s) introduced by the generalized alternative model, and the resulting Bayes factor is obtained by averaging these point mass prior Bayes factors over the prior distribution of the extra parameter(s).
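This averaging can be sketched for a binomial point null H0: θ = 0.5, treating the success probability θ as the extra parameter of the generalized alternative: the point mass Bayes factor BF(θ) is computed on a grid and then averaged under the prior for θ. The data, grid, and uniform prior below are hypothetical choices.

```python
import numpy as np
from scipy import stats

y, n = 13, 20                    # hypothetical binomial data
m0 = stats.binom.pmf(y, n, 0.5)  # marginal likelihood of the null (no free parameter)

# Point mass prior Bayes factor on a grid of the extra parameter theta:
theta = np.linspace(0.001, 0.999, 999)
bf_point = stats.binom.pmf(y, n, theta) / m0

# Average the point mass Bayes factors over the prior for theta
# (a uniform Beta(1, 1) prior here); the overall BF is E_prior[BF(theta)].
prior = stats.beta.pdf(theta, 1, 1)
d = theta[1] - theta[0]
bf = np.sum(bf_point * prior) * d
print(bf)
```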
For binomial data, Ref. [8] shows the impact of different priors on the probability of success. The authors used four different priors: (i) a uniform distribution, (ii) the Jeffreys prior, which is a proper Beta(0.5, 0.5) distribution, (iii) the Haldane prior, a Beta(0, 0) distribution (an improper prior), and (iv) an informative prior. The uniform, Jeffreys, and Haldane priors are noninformative in some sense. Although the resulting parameter estimates are similar in all four scenarios, the resulting Bayes factor and posterior probability of H1 vary: the four priors produce very different Bayes factors, with values of 0.09 for the Haldane, 0.6 for the Jeffreys, 0.91 for the uniform, and 1.55 for the informative prior. The corresponding posterior probabilities of H1 are 0.08 (Haldane), 0.38 (Jeffreys), 0.48 (uniform), and 0.61 (informative). In this example, the sensitivity analysis reveals that the effect of the priors on the posterior distribution differs from their effect on the Bayes factor. The authors emphasize that Bayes factors should ideally be calculated for a wide range of plausible priors whenever they are used as a model selection tool. Besides using the Bayes factor based on the prior predictive distribution, they also suggest seeking agreement with other model selection criteria designed to assess local model generalizability (i.e., based on the posterior predictive distribution).
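This kind of comparison is easy to reproduce in outline: for binomial data, the Bayes factor for H1: θ ~ Beta(a, b) against the point null H0: θ = 0.5 follows from the beta-binomial marginal likelihood. The counts below are hypothetical, so the resulting numbers will not match those reported above, and the improper Haldane prior is approximated by a Beta prior with parameters near zero.

```python
from scipy import stats

y, n = 13, 20  # hypothetical data, not the counts used in the cited example

def bf10(a, b):
    """Bayes factor for H1: theta ~ Beta(a, b) vs. H0: theta = 0.5."""
    m1 = stats.betabinom.pmf(y, n, a, b)  # marginal likelihood under H1
    m0 = stats.binom.pmf(y, n, 0.5)       # likelihood under the point null
    return m1 / m0

priors = {"uniform": (1, 1), "Jeffreys": (0.5, 0.5),
          "Haldane (approx.)": (1e-3, 1e-3), "informative": (10, 10)}
for name, (a, b) in priors.items():
    print(f"{name:>18}: BF10 = {bf10(a, b):.3f}")
```

With equal prior model odds, the posterior probability of H1 is BF10 / (1 + BF10), which is how Bayes factors map to posterior probabilities like those quoted above.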
The author in [9] describes several interesting points with regard to prior sensitivity, viewing prior sensitivity analysis in theory testing as an opportunity rather than a burden. They argue that it is an attractive feature of a model evaluation measure when psychological models containing quantitatively instantiated theories are sensitive to priors. Ref. [9] believes that using an informative prior expressing a psychological theory and evaluating models with prior sensitivity measures can serve to advance knowledge. Finally, sensitivity analysis is made accessible through an interactive Shiny application developed by the authors in [10]; the software is designed to help users understand and assess the substantive impact of prior selection in an interactive way.
4. Conclusion
The Bayes factor is only one of many aspects of Bayesian analysis, and it serves as a bridge to Bayesian inference for researchers interested in testing. The Bayes factor can provide evidence in favor of the null hypothesis and is a relatively intuitive approach for communicating statistical evidence with a meaningful interpretation. The relationships between the Bayes factor and other aspects of the posterior distribution, for example, the overlap of Bayesian highest posterior density intervals, form a topic of interest.
This entry is adapted from the peer-reviewed paper 10.3390/e24020161