Given the reproducibility (or replication) crisis, a growing number of psychologists and social-cultural scientists are turning to Bayesian inference. The current article therefore provides a brief overview of the programs (or software) available for Bayesian data analysis in the social sciences and of the steps involved in conducting it.
The persistence of ‘stargazing’, p-hacking, and HARKing has led to a severe reproducibility crisis in which 70% of researchers have failed to reproduce other scientists’ experiments and more than half have failed to reproduce their own [1][2][3][4]. In psychology, over half of published studies could not be replicated [5][6], whereas only 62% of replications of social science experiments published in prestigious journals, such as Nature and Science, showed effects in the same direction as the original papers [7].
To ease the problem, many initiatives have been proposed. The editorial board of Nature Human Behaviour introduced the option of a registered report to neutralize publication bias and improve the validity of scientific research [2]. Some authors suggest redefining statistical significance by lowering the p-value threshold to 0.005 “for claims of new discoveries” [8]. Nevertheless, this proposal has been criticized by Amrhein and Greenland [9] for being overconfident about mathematical results and for ignoring unmodeled uncertainties. They argue that the reliability of results should be assessed regardless of statistical significance and should rest on the combination of “multiple studies and lines of evidence”.
Bayesian analysis seems to offer a solution to the crisis, given its natural properties, such as treating all quantities (including hypotheses and unknown effects) probabilistically and incorporating prior information into estimation alongside the current evidence [10][11][12]. For decades, Bayesian inference was not widely applied in practical research because of the difficulty of estimating the posterior. Nonetheless, the recent development of Markov chain Monte Carlo (MCMC) algorithms and the rapid improvement in computational power have made Bayesian data analysis much more feasible. As a result, more psychologists and social-cultural scientists are becoming involved with Bayesian data analysis [13][14][15][16][17][18][19].
Since the first release of the WinBUGS software [20], an increasing number of programs and software packages implementing Bayesian analysis have been produced; OpenBUGS [21], JAGS [22], MCMCglmm [23], Stan [24][25], brms [26], rethinking [11], and rstanarm [27] are among the most common examples. JAGS relies primarily on Gibbs sampling to estimate Bayesian multilevel models, whereas WinBUGS, OpenBUGS, and the MCMCglmm package combine Metropolis-Hastings updates, Gibbs sampling, and, in some cases, slice sampling.
In contrast, Stan, a probabilistic programming language implemented in C++, uses Hamiltonian Monte Carlo (HMC) [28] and its extension, the No-U-Turn Sampler (NUTS), for MCMC simulation [29]. HMC and NUTS typically converge faster than their Metropolis-Hastings and Gibbs counterparts, especially for high-dimensional models. As a result, Stan has been adopted as the back end of recently developed R packages for Bayesian multilevel analysis using MCMC simulation, such as brms, rstanarm, and rethinking.
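For illustration, the sketch below shows how the same simple multilevel regression might be specified through two of these Stan-backed interfaces; the data frame df and the variables y, x, and group are hypothetical placeholders. Both packages translate the R formula into Stan code behind the scenes and sample from the posterior with NUTS.

# Hypothetical multilevel regression of y on x with varying intercepts by group
library(brms)
fit_brms <- brm(y ~ x + (1 | group), data = df, family = gaussian())

library(rstanarm)
fit_rstanarm <- stan_glmer(y ~ x + (1 | group), data = df, family = gaussian())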
Although many Bayesian programs and software packages are available, Bayesian inference is still not widely used in the social sciences. This might be attributable to three reasons: 1) the fear of mathematical challenges, 2) the fear of writing computer code, and 3) the fear of leaving one's comfort zone [4]. More recently, the bayesvl package has been developed to help social scientists overcome such fears by 1) replacing mathematical formulas with directed acyclic graphs (DAGs, or “relationship trees”), 2) generating Stan code automatically, and 3) offering graphical visualization of models, results, and diagnostic tests [30][31].
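As a rough sketch of that workflow (the function names and arguments below follow the bayesvl documentation as we understand it and may differ between package versions; the variables x and y are hypothetical), the model is assembled as a DAG of nodes and arcs, after which the package generates the corresponding Stan code:

library(bayesvl)

# Build the DAG: two hypothetical observed variables, with x assumed to affect y
model <- bayesvl()
model <- bvl_addNode(model, "y", "norm")
model <- bvl_addNode(model, "x", "norm")
model <- bvl_addArc(model, "x", "y", "slope")

# Inspect the automatically generated Stan code and the network diagram
cat(bvl_model2Stan(model))
bvl_bnPlot(model)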
Bayesian analysis is normally conducted in three steps: model building, model fitting, and model interpretation and improvement [32]. To describe these steps, we focus only on software and programs that use MCMC techniques. Moreover, the illustrative figures are generated with the bayesvl package because of its ability to create eye-catching graphics [33].
The first step is to construct the model based on prior knowledge, experience, or theories; this step also includes selecting the prior probability distributions. When constructing a model, one needs to consider whether the research outcome is intended for prediction or explanation (Ripley, 2004). A predictive model is usually used to identify the predictors or impacts of a particular phenomenon, for example, the factors predicting vaccination intention [34]. Meanwhile, the primary objective of an explanatory model is to seek the most parsimonious explanation for a given phenomenon or process, for example, explaining the mechanism of suicidal ideation or the pathways to book-reading interest [14][35].
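In Stan-based packages such as brms, the priors can be listed and set explicitly. The sketch below (again using the hypothetical df, y, x, and group) prints the default priors for a model and then replaces the priors on the regression coefficients with weakly informative normal priors:

library(brms)

# List the priors brms would use by default for this hypothetical model
get_prior(y ~ x + (1 | group), data = df, family = gaussian())

# Assign weakly informative normal priors to the regression coefficients
my_priors <- set_prior("normal(0, 10)", class = "b")
fit <- brm(y ~ x + (1 | group), data = df, family = gaussian(), prior = my_priors)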
The second step is model fitting: the constructed model is fitted to the data using MCMC algorithms, yielding estimated posterior probability distributions and credible intervals for the studied parameters. During this step, researchers have to determine the technical settings of the Bayesian analysis: the numbers of iterations, warm-up iterations, and Markov chains have to be decided based on the type of data and the MCMC techniques implemented in the employed program.
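Continuing the hypothetical sketches above, these settings are passed when the model is fitted; the bvl_modelFit arguments follow the bayesvl documentation as we recall it and may vary by version.

# bayesvl: fit the DAG model with 4 chains of 5,000 iterations, 2,000 of them warm-up
model <- bvl_modelFit(model, data = df, warmup = 2000, iter = 5000, chains = 4, cores = 4)

# The equivalent settings in brms
fit <- brm(y ~ x + (1 | group), data = df, family = gaussian(),
           chains = 4, iter = 5000, warmup = 2000, cores = 4)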
Figure 1: Trace plot
Figure 2: Gelman plot
Figure 3: Autocorrelation plot
Finally, the estimated results are interpreted and, if necessary, compared with the results simulated by different models in order to identify the optimal model. A posterior result can be deemed qualified for interpretation when the Markov chain central limit theorem holds (i.e., the Markov chains converge). Two primary diagnostic statistics have to be examined: the effective sample size (n_eff) and the Gelman shrink factor (Rhat). For the Markov chains of a model to be considered convergent, the n_eff values of the posterior parameters should be higher than 1,000 and the Rhat values should equal 1; notably, the n_eff threshold can vary across programs and software. Visually, the Markov chain central limit theorem can be diagnosed using the trace plot (see Figure 1), the Gelman plot (see Figure 2), and the autocorrelation plot (see Figure 3). The simulated posterior results can then be interpreted using probability distribution plots, such as the interval plot (see Figure 4), the density plot (see Figure 5), and the pairwise density plot (see Figure 6); a code sketch of these checks follows Figure 6.
Figure 4: Interval plot
Figure 5: Density plot
Figure 6: Pairwise density plot
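One possible way to run these checks in R is sketched below, using generic rstan, coda, and bayesplot calls rather than the bayesvl wrappers; stanfit stands for a hypothetical fitted Stan model, and the parameter names b_x and sigma are placeholders.

library(rstan)
library(coda)
library(bayesplot)

# Numerical checks: n_eff and Rhat for every parameter of the hypothetical stanfit
print(summary(stanfit)$summary[, c("n_eff", "Rhat")])

# Visual convergence checks (cf. Figures 1-3)
posterior <- as.array(stanfit)
mcmc_trace(posterior, pars = "b_x")               # trace plot
gelman.plot(As.mcmc.list(stanfit, pars = "b_x"))  # Gelman plot
mcmc_acf(posterior, pars = "b_x")                 # autocorrelation plot

# Posterior interpretation (cf. Figures 4-6)
mcmc_intervals(posterior, pars = "b_x")           # interval plot
mcmc_dens(posterior, pars = "b_x")                # density plot
mcmc_pairs(posterior, pars = c("b_x", "sigma"))   # pairwise density plot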
The loo package, which performs approximate leave-one-out cross-validation for Bayesian models fitted using MCMC, offers an alternative way to compare models' expected predictive accuracy on new data (or goodness of fit) [36].
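A minimal sketch of such a comparison, assuming fit1 and fit2 are two hypothetical, already-fitted brms models of the same outcome (brms supplies the loo() method for its model objects):

library(loo)

# Approximate leave-one-out cross-validation for each candidate model
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# Compare expected log predictive density (elpd); the model with the higher elpd
# is expected to predict new data more accurately
loo_compare(loo1, loo2)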