Bayesian Statistics and Machine Learning: Why should we pay more attention to this relationship?
Gayathri Delanerolle 1,2, Jian Qing Shi 1,2,3
1Digital Evidence Based Medicine Lab
2Southern Health NHS Foundation Trust
3Southern University of Science and Technology
Bayesian statistics is a mathematical framework for evaluating probabilities in the presence of subjective prior beliefs. The term “Bayesian” derives from the 18th-century British mathematician Thomas Bayes, who sought to quantify the probability of a hypothesis in light of subjective views. Bayes’ contributions to the field were significant, although much of his work was unpublished during his lifetime. His work was brought to light by Richard Price, who edited and published Bayes’ theorem after his death. The theorem provides a mathematical framework for updating probabilities based on new evidence, allowing probability estimates to be revised as observations evolve over time. This makes Bayes’ theorem applicable to real-world and real-time data. For example, symptoms linked to ovarian cancer may become more complex and severe over time; by incorporating this evolving information, Bayes’ theorem could be used to update predictions of a given patient’s progression-free survival.
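To make the updating rule concrete, the sketch below applies Bayes’ theorem to the ovarian cancer scenario with purely illustrative numbers; the probabilities, the `bayes_update` helper and the symptom likelihoods are hypothetical assumptions, not clinical values.

```python
# Minimal sketch of Bayes' theorem with hypothetical numbers: updating the
# probability of disease progression as new symptom evidence arrives.

def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(hypothesis | evidence) via Bayes' theorem."""
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence

# Prior belief that the disease will progress within a year (hypothetical).
posterior = 0.10

# Each tuple: P(symptom | progression), P(symptom | no progression) - hypothetical.
observations = [(0.70, 0.20), (0.80, 0.30)]

# Sequential updating: yesterday's posterior becomes today's prior.
for p_given_prog, p_given_no_prog in observations:
    posterior = bayes_update(posterior, p_given_prog, p_given_no_prog)
    print(f"updated P(progression | evidence so far) = {posterior:.3f}")
```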
The fundamentals of Bayes’ theorem underpin Bayesian statistics, which has found applications across many fields. Its development continued into the late 18th and early 19th centuries through the French mathematician Pierre-Simon Laplace, who advanced Bayesian inference and probability theory. Laplace expanded upon Bayes’ work and formulated Bayesian statistics as a coherent framework. Bayesian approaches gained popularity in the 20th century through applications in spatial epidemiology, political science and marketing. However, Bayesian methods faced criticism and were overshadowed by frequentist statistics, which gained dominance in the early 20th century. Frequentist statistics focus on the long-run behaviour of repeated experiments and rely on concepts such as p-values and confidence intervals. The revival of Bayesian statistics began in the latter half of the 20th century, when advances in computational techniques and the increasing availability of computers made Bayesian methods more practical. Key figures in the resurgence included Harold Jeffreys, Leonard Jimmie Savage and Bruno de Finetti. In the 21st century, Bayesian statistics is enjoying renewed interest among non-mathematicians because of its role in developing and executing machine learning algorithms, offering flexibility to anyone working with big data. Bayesian methodology addresses several issues that arise when developing and implementing machine learning algorithms. The development of computational techniques such as Markov Chain Monte Carlo (MCMC) methods and advances in Bayesian modelling approaches, such as hierarchical models and Bayesian networks, further expanded the scope and applications of Bayesian statistics. Bayesian statistics is now widely used and considered an essential tool for data analysis and inference; its flexibility and ability to handle complex problems make it useful even in situations with limited or missing data. Variational inference is another Bayesian computational approach, used as an alternative to MCMC. It involves minimising the Kullback-Leibler divergence between the approximating distribution and the true posterior. Variational inference is typically faster than MCMC but introduces an approximation error. These techniques, along with computational advancements and software packages, have made Bayesian statistics a powerful and flexible tool for decision making in various fields of research.
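As a rough illustration of the variational idea, the sketch below fits a Gaussian approximation to a hypothetical Beta(8, 4) posterior by minimising a Monte Carlo estimate of the Kullback-Leibler divergence KL(q || p); the target posterior, sample size and optimiser are assumptions made for the example rather than a production implementation.

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical target posterior: Beta(8, 4), e.g. 7 successes and 3 failures under a
# uniform prior. Approximate it with q(theta) = N(mu, sigma^2) by minimising KL(q || p).
posterior = stats.beta(8, 4)
rng = np.random.default_rng(0)
z = rng.standard_normal(5000)  # fixed base draws (reparameterisation trick)

def kl_estimate(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    theta = np.clip(mu + sigma * z, 1e-6, 1 - 1e-6)  # crude safeguard to stay in (0, 1)
    log_q = stats.norm(mu, sigma).logpdf(theta)
    log_p = posterior.logpdf(theta)
    return np.mean(log_q - log_p)                    # Monte Carlo estimate of KL(q || p)

result = optimize.minimize(kl_estimate, x0=[0.5, np.log(0.2)], method="Nelder-Mead")
mu_opt, sigma_opt = result.x[0], np.exp(result.x[1])
print(f"variational approximation: N({mu_opt:.3f}, {sigma_opt:.3f}^2)")
print(f"true posterior mean = {posterior.mean():.3f}, sd = {posterior.std():.3f}")
```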
Machine learning and Bayesian inference
Bayesian inference can be practically applied to machine learning modelling in several ways. A few common approaches include the following:
- Bayesian Parameter Estimation: instead of using point estimates for model parameters, Bayesian inference allows for the estimation of entire probability distributions over parameter values. This is achieved by specifying prior distributions over the parameters and updating them based on observed data using Bayes’ theorem. This results in posterior distributions that reflect the updated beliefs about the parameters given the data. These posterior distributions can then be used for inference, prediction and uncertainty quantification (a worked beta-binomial sketch follows this list).
- Bayesian Model Selection: Bayesian inference provides a principled framework for comparing different models and selecting the most appropriate one for a given dataset. This is done by calculating the posterior probability of each model given the data, taking into account the prior probabilities of the models. The model with the highest posterior probability is chosen as the most suitable model.
- Bayesian Model Averaging: In situations where multiple models are plausible, Bayesian model averaging combines the predictions from multiple models, weighted by their posterior probabilities. This allows for more robust and accurate predictions by incorporating the uncertainty associated with different models.
- Bayesian Neural Networks: In traditional neural networks, the weights and biases are typically treated as fixed values learned from data. In Bayesian neural networks, the weights and biases are treated as random variables with prior distributions. By applying Bayesian inference, the posterior distributions over the weights and biases can be obtained, which provides a measure of uncertainty in the predictions. This uncertainty can be useful in various applications such as active learning, where the model can decide which data points to query for further labelling.
- Hierarchical Bayesian Modelling: Bayesian inference supports the construction of hierarchical models in which parameters are assumed to have their own distributions, governed by hyperparameters. This allows for the modelling of complex relationships and dependencies in the data. Hierarchical Bayesian models can be particularly useful when data are grouped or nested and information can be shared across different levels of the hierarchy.
- Sequential Bayesian Inference: Bayesian inference is well suited to sequential or online learning scenarios where data arrive incrementally. As new data become available, the posterior distribution is sequentially updated using the previously obtained posterior as the prior. This sequential updating allows for adaptive learning, where the model can dynamically update its beliefs based on new information.
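The points above can be made concrete with a conjugate beta-binomial model. The sketch below uses an assumed uniform prior and made-up data batches to show whole-distribution parameter estimation and the sequential updating described in the last bullet.

```python
import numpy as np
from scipy import stats

# Beta-binomial sketch of Bayesian parameter estimation with sequential updating.
# The unknown parameter is a success probability theta; the Beta prior is conjugate
# to the binomial likelihood, so the posterior remains a Beta distribution.
alpha, beta = 1.0, 1.0                      # Beta(1, 1) = uniform prior (assumption)

# Hypothetical data arriving in batches: (successes, failures) per batch.
batches = [(6, 4), (9, 11), (14, 6)]

for successes, failures in batches:
    alpha += successes                      # posterior update: add observed successes
    beta += failures                        # and observed failures
    posterior = stats.beta(alpha, beta)
    lo, hi = posterior.ppf([0.025, 0.975])  # 95% credible interval for theta
    print(f"posterior mean = {posterior.mean():.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```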
Implementing Bayesian inference in machine learning often involves the use of computational techniques such as MCMC methods, variational inference or approximate Bayesian computation to sample from or approximate the posterior distributions. These techniques enable the practical implementation of Bayesian inference in complex models.
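A minimal sketch of one such technique, a random-walk Metropolis-Hastings sampler for the mean of normally distributed data, is given below; the prior, proposal width and synthetic data are assumptions chosen purely for illustration.

```python
import numpy as np

# Random-walk Metropolis-Hastings sketch (illustrative, not a tuned sampler).
# Target: the unnormalised posterior of a mean mu with a N(0, 5^2) prior and
# observations assumed to follow N(mu, 1).
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=50)        # synthetic data with true mean 2.0

def log_posterior(mu):
    log_prior = -0.5 * (mu / 5.0) ** 2      # N(0, 25) prior, up to a constant
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik

samples, mu = [], 0.0
for _ in range(10_000):
    proposal = mu + rng.normal(0, 0.5)      # symmetric random-walk proposal
    # Accept with probability min(1, p(proposal) / p(current)).
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

draws = np.array(samples[2000:])            # discard burn-in
print(f"posterior mean ~ {draws.mean():.3f}, sd ~ {draws.std():.3f}")
```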
Machine learning and Bayesian Modelling
Bayesian model selection is an important aspect to understand, especially when comparing different candidate models for a dataset is a key task a machine learning workflow must perform. This approach involves comparing different models using their posterior probabilities. The Bayes factor is often used as a measure of the relative support for one model over another. Criteria such as the Bayesian Information Criterion (BIC) and the Deviance Information Criterion (DIC) provide practical approximations for selecting the most appropriate model.
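The sketch below illustrates this with a hypothetical BIC comparison of two simple models fitted to synthetic data; because BIC approximates the log marginal likelihood, the BIC difference serves as a rough proxy for the Bayes factor. The data, candidate models and fitting choices are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical comparison of two models for the same data using BIC:
# M1: constant mean; M2: straight line. Lower BIC indicates stronger support.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 1.5 + 0.4 * x + rng.normal(0, 1.0, x.size)   # synthetic data (assumed)

def bic(log_likelihood, n_params, n_obs):
    return n_params * np.log(n_obs) - 2.0 * log_likelihood

# Model 1: y ~ N(mean, sigma^2), two parameters.
resid1 = y - y.mean()
loglik1 = stats.norm(0, resid1.std()).logpdf(resid1).sum()

# Model 2: y ~ N(a + b*x, sigma^2), three parameters.
b, a = np.polyfit(x, y, 1)
resid2 = y - (a + b * x)
loglik2 = stats.norm(0, resid2.std()).logpdf(resid2).sum()

print(f"BIC(constant) = {bic(loglik1, 2, y.size):.1f}")
print(f"BIC(linear)   = {bic(loglik2, 3, y.size):.1f}")
```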
A few common steps when applying Bayesian modelling to machine learning are as follows:
- Define the Model: start by specifying a probabilistic model that captures the relationship between the input features and the target variable. This involves selecting an appropriate likelihood function that models the data-generation process given the parameters of interest. In addition, specify prior distributions over the model parameters to incorporate prior knowledge or beliefs (a minimal end-to-end sketch follows this list).
- Bayesian inference: Apply Bayesian inference to estimate the posterior distribution over the model parameters given observed data. This is done by combining the prior distributions with the likelihood function using Bayes’ theorem. However, obtaining the exact posterior distribution is often analytically intractable, especially for complex models.
- Approximate inference: Use MCMC methods or variational inference to obtain approximate samples from the posterior distribution. MCMC methods, such as the Metropolis-Hastings algorithm or Gibbs sampling, iteratively sample from the parameter space based on acceptance probabilities. Variational inference approximates the posterior distribution by minimising the Kullback-Leibler divergence to a simpler distribution.
- Model assessment and selection: Assess the fit of the model by evaluating goodness-of-fit measures such as posterior predictive checks or DIC. Compare different models using techniques like Bayes factors or model averaging to select the most appropriate model for the given data.
- Prediction and Uncertainty Quantification: Bayesian modelling provides a natural way to make predictions and quantify uncertainty. Predictions can be made by averaging over the posterior distribution of the model parameters, resulting in probabilistic predictions that reflect the uncertainty in the model. The posterior distribution also allows for uncertainty quantification through credible intervals or posterior predictive intervals.
- Iterative learning: Bayesian modelling supports iterative learning, where new data can be sequentially incorporated to update the posterior distribution. This is particularly useful in online learning scenarios where data arrive incrementally. The posterior from previous iterations serves as the prior for the next iteration, allowing the model to adapt and update its beliefs as new data become available.
- Model Interpretation: Bayesian modelling provides a rich framework for interpreting models. Posterior distributions can be examined to understand the uncertainty in the parameter estimates and identify important features. Additionally, exploring the relationships between parameters and making inferences about the underlying processes can provide deeper insights into the data.
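The short sketch below walks through several of these stages (model definition, exact inference with a conjugate prior, a posterior predictive check and probabilistic prediction) on a deliberately simple, hypothetical normal-mean problem; the prior, the known noise level and the synthetic data are assumptions for illustration, and approximate inference is skipped because the conjugate model has a closed-form posterior.

```python
import numpy as np

# Define the model: measurements assumed N(mu, 1) with a conjugate N(0, 10^2) prior on mu.
rng = np.random.default_rng(3)
data = rng.normal(2.5, 1.0, size=30)        # synthetic observations (assumed)

# Bayesian inference: exact conjugate normal-normal update for the posterior of mu.
prior_mean, prior_var, sigma2 = 0.0, 100.0, 1.0
post_var = 1.0 / (1.0 / prior_var + data.size / sigma2)
post_mean = post_var * (prior_mean / prior_var + data.sum() / sigma2)
print(f"posterior for mu: N({post_mean:.3f}, {post_var:.3f})")

# Model assessment: a simple posterior predictive check on the sample mean.
mu_draws = rng.normal(post_mean, np.sqrt(post_var), 1000)
sim_means = np.array([rng.normal(mu, 1.0, data.size).mean() for mu in mu_draws])
print(f"P(simulated mean > observed mean) = {(sim_means > data.mean()).mean():.2f}")

# Prediction and uncertainty quantification: 95% posterior predictive interval.
pred_draws = rng.normal(mu_draws, 1.0)
lo, hi = np.percentile(pred_draws, [2.5, 97.5])
print(f"95% posterior predictive interval for a new observation: ({lo:.2f}, {hi:.2f})")
```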
It is vital to note that implementing Bayesian modelling in machine learning requires computational resources and efficient algorithms because of the potentially high-dimensional parameter space and complex models, and it is important to understand these aspects in context when applying the method to communicable and non-communicable diseases. However, the Bayesian approach offers advantages such as incorporating prior knowledge, handling small datasets and providing uncertainty estimates, making it a powerful tool in machine learning applications.