We used Bayesian learning for a house price prediction project; the notebook is available at [0]. The purpose of this blog post is to give a quick summary of the concepts involved.
- Bayesian learning gives us a distribution over each parameter rather than a single point estimate.
- We keep sampling parameter values, say 10,000 times. The histogram of the last 6,000 samples approximates the posterior of the parameter; the earlier samples are discarded as burn-in.
- Every parameter, latent variable, and output has a distribution associated with it.
- The top parameters in the hierarchy have priors associated with them.
- We sample values for these top parameters.
- We calculate the corresponding values of all latent variables further down the tree.
- Finally, at the output, we calculate the likelihood against the observed data.
- Different MCMC algorithms differ in two steps:
    - How to jump to the next sample
    - How to decide whether the next sample is acceptable
- Metropolis algorithm:
    - Jump: propose a new value from a normal distribution centred on the previous parameter value, with a fixed standard deviation
    - Acceptance test: accept if random(0,1) < Pnew/Pold
        - When Pnew is higher than Pold, the ratio exceeds 1, so we always accept
        - Otherwise the acceptance probability equals Pnew/Pold, so it shrinks as Pold grows relative to Pnew
        - If Pold = 2 * Pnew, it is 50%
    - Pnew = likelihood_new * prior_new
    - Pold = likelihood_old * prior_old
- Examples of other MCMC algorithms:
    - Metropolis-Hastings
    - The Gibbs Sampler
    - Hamiltonian MCMC
    - The No-U-Turn Sampler (NUTS)
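
The Metropolis steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code from the notebook: the toy model (a single `slope` parameter with a Normal(0, 10) prior, a noise standard deviation of 1, and synthetic data generated with a true slope of 2) and all function names are assumptions made up for the example. The acceptance test is done in log space, which is equivalent to `random(0,1) < Pnew/Pold` but avoids numerical underflow when many likelihood terms are multiplied.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_posterior(slope, xs, ys):
    """Unnormalised log posterior, i.e. log(likelihood * prior)."""
    # Prior on the top-level parameter (assumed for this sketch: Normal(0, 10)).
    log_prior = normal_logpdf(slope, 0.0, 10.0)
    # Latent values further down the tree: predicted mean for each observation.
    mus = [slope * x for x in xs]
    # Likelihood at the output against the observed data (assumed noise std of 1).
    log_lik = sum(normal_logpdf(y, mu, 1.0) for y, mu in zip(ys, mus))
    return log_prior + log_lik


def metropolis(xs, ys, n_samples=10000, proposal_std=0.1, start=0.0, seed=0):
    """Draw n_samples values of the slope with the Metropolis algorithm."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, xs, ys)
    samples = []
    for _ in range(n_samples):
        # Jump: normal proposal centred on the previous value, fixed std.
        proposal = rng.gauss(current, proposal_std)
        proposal_lp = log_posterior(proposal, xs, ys)
        # Accept if random(0,1) < Pnew / Pold, compared in log space.
        log_ratio = proposal_lp - current_lp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
        # A rejected proposal repeats the current value in the chain.
        samples.append(current)
    return samples


# Synthetic observations for the sketch: y = 2 * x + noise.
data_rng = random.Random(1)
xs = [i / 10.0 for i in range(1, 51)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

samples = metropolis(xs, ys)
posterior = samples[-6000:]  # keep the last 6,000 samples, drop the rest as burn-in
posterior_mean = sum(posterior) / len(posterior)
print("posterior mean of slope:", posterior_mean)
```

A histogram of `posterior` would approximate the posterior distribution of the slope. The fixed `proposal_std` is the main tuning knob here: too small and the chain moves slowly, too large and most proposals are rejected.
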
Reference
[1] : http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/
