Probability Distribution

We learned various probability distributions in high school and engineering courses, but they are easy to forget. Here I provide a simple, practical scenario for each distribution, with very little theory involved.

Bernoulli Distribution

  • When the random variable has just two outcomes
  • Example: the probability that a drug/medicine will be approved by the government is p = 0.65
    • The probability that it will not be approved is 1 - p = 0.35
  • The formulas below assume the probability is known; in real life we estimate it from data :
    • Mean = p
    • Variance (Sigma Square) = p*(1-p)
  • Parameters : p
  • Probability evaluation P(x|params) = p if x = 1, (1-p) if x = 0
  • MLE : p = n/N, where n = number of times 1 was observed, N = number of experiments
  • MLE = Maximum Likelihood Estimation
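
Here is a small sketch of the Bernoulli formulas above, using only the Python standard library. The function names, the sample size and the random seed are my own choices for illustration; the data is simulated with p = 0.65 from the approval example.

```python
import random

def bernoulli_pmf(x, p):
    """P(x | p): p if x = 1, (1 - p) if x = 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1.0 - p

def bernoulli_mle(samples):
    """MLE of p: number of 1s observed divided by number of experiments."""
    return sum(samples) / len(samples)

p_true = 0.65            # approval probability from the example above
rng = random.Random(0)   # fixed seed, an arbitrary choice for repeatability
samples = [1 if rng.random() < p_true else 0 for _ in range(10_000)]

p_hat = bernoulli_mle(samples)
mean = p_hat                      # Mean = p
variance = p_hat * (1.0 - p_hat)  # Variance = p*(1-p)
print(f"p_hat = {p_hat:.3f}, mean = {mean:.3f}, variance = {variance:.3f}")
```

With more simulated experiments, p_hat should settle closer to 0.65.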

Binomial Distribution

  • When you perform the Bernoulli experiment multiple times and want to know how many times a certain outcome appears.
  • For example, you flip a coin (fair or biased) 10 times and want the probability that heads appears x times (x = 0, 1, 2, ..., 10).
  • Another more practical example :
    • Suppose the oil price can increase by 3 bucks or decrease by 1 buck each day
    • Probability of an increase p = 0.65, and of a decrease 1 - p = 0.35
    • What price can we expect after three days?
    • Note (Increase, Increase, Decrease) and (Increase, Decrease, Increase) will give the same price.
      • Both are (2,1) -> 2 successes, 1 failure
  • From another point of view, it counts the number of successes in an experiment :
    • Number of patients responding to a treatment
    • Binary classification problem (this does not seem correct now; a single label is Bernoulli, which we model with logit and sigmoid)
  • The formulas below assume the probability is known; in real life we estimate it from data :
    • n = number of times the experiment is performed
    • Mean = n*p
    • Variance (Sigma Square) = n*p*(1-p)
  • Example of binomial used in modeling :
  • Parameters : n, p
  • Probability evaluation P(x|params) = nCx * p^x * (1-p)^(n-x)
  • MLE
    • n = number of trials (known from the experiment, not estimated)
    • p = x/n where x = number of successes
    • Interestingly MLE for binomial and multinomial distribution is very simple
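
The oil price question can be worked out directly from the binomial probabilities. Below is a minimal sketch, assuming the price moves on different days are independent. Since no starting price is given, it reports the change in price rather than the price itself; the variable names are mine.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

n, p = 3, 0.65         # three days, probability of an increase
up, down = 3.0, -1.0   # daily price moves from the example above

# Each number of "increase" days k maps to exactly one price change,
# which is why the order of increases and decreases does not matter.
expected_change = 0.0
for k in range(n + 1):
    prob = binomial_pmf(k, n, p)
    change = k * up + (n - k) * down
    expected_change += prob * change
    print(f"{k} increases: P = {prob:.4f}, price change = {change:+.0f}")

print(f"expected change after {n} days = {expected_change:.2f}")
```

The same answer follows from the mean formula: the expected number of increases is n*p = 1.95, so the expected change is 1.95*3 - 1.05*1 = 4.8 bucks.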

Normal Distribution

  • Very popular distribution
  • Observed very often because of central limit theorem (CLT)
  • Example :
    • % change in a stock price of google from a previous day
    • Heights and weights of persons
    • Exam scores
  • It is good to remember the empirical rule for the normal distribution :
    • 68 % of values lie within one standard deviation of the mean
    • 95 % lie within two standard deviations
    • 99.7 % lie within three standard deviations
  • The Z score measures the distance from the mean in units of standard deviation: z = (x - μ)/σ
  • Parameters : μ, σ
  • Probability density p(x|params) = 1/sqrt(2*pi*σ^2) * exp(-(x-μ)^2/(2*σ^2))
  • MLE :
    • μ = average (x)
    • σ = sqrt(sum((x - μ)^2)/N); dividing by N-1 instead gives the usual unbiased sample variance, not the MLE
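
To make the numbers above concrete, here is a rough sketch that checks the 68/95/99.7 rule with the error function and estimates μ and σ from data. The heights are simulated (a hypothetical mean of 170 cm and standard deviation of 8 cm), and the estimate of σ divides by N, which is the maximum likelihood form.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density 1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def normal_mle(xs):
    """MLE of (mu, sigma): sample mean and the divide-by-N standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

def prob_within(k):
    """P(|Z| <= k) for a standard normal, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# Hypothetical data: 5000 simulated heights in cm.
rng = random.Random(1)
heights = [rng.gauss(170.0, 8.0) for _ in range(5000)]

mu_hat, sigma_hat = normal_mle(heights)
z = (186.0 - mu_hat) / sigma_hat  # Z score of a 186 cm person, roughly 2

for k in (1, 2, 3):
    print(f"within {k} std. deviations: {prob_within(k):.4f}")
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}, z = {z:.2f}")
```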

Poisson Distribution

Multi-nomial Distribution

  • Binomial distribution has two parameters : n, p
    • Multi-nomial distribution has n, p1, p2, p3, p4 (, . . . . pk), one probability per outcome
    • It's like throwing a biased coin vs throwing a biased die
  • Probability estimation :
    • P(x1, x2, x3 | n, p1, p2, p3) = n!/(x1! x2! x3!)  * p1^x1 * p2^x2 * p3^x3
    • The above is the probability of observing event1 x1 times, event2 x2 times and event3 x3 times, where x1 + x2 + x3 = n
    • In the binomial case we evaluate the probability of x successes
      • From that we can easily determine the number of failures, n - x
  • Parameter estimation
    • n = No of samples (easy)
    • p1 = x1/n, p2=x2/n, p3 = x3/n
    • Interestingly MLE for binomial and multinomial distribution is very simple
    • Derivation of above is constrained optimisation problem, solved using Lagrangian [2]
  • Wikipedia
    • For example, it models the probability of counts for rolling a k-sided die n times.
    • When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution.
    • When k is 2 and n is bigger than 1, it is the binomial distribution.
    • When k is bigger than 2 and n is 1, it is the categorical distribution.
  • It is a k-dimensional distribution
    • It is the joint distribution of the k counts x1, . . . , xk
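
A short sketch of the multinomial probability and its parameter estimates, again with the standard library only. The counts [2, 3, 5] are made up for illustration, and the function names are mine.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n!/(x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk, with n = sum of counts."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)  # stays an exact integer at every step
    return coeff * prod(p**x for p, x in zip(probs, counts))

def multinomial_mle(counts):
    """MLE of each probability: p_i = x_i / n."""
    n = sum(counts)
    return [x / n for x in counts]

counts = [2, 3, 5]               # hypothetical: outcome 1 seen 2 times, etc.
p_hat = multinomial_mle(counts)  # estimated p1, p2, p3
print(p_hat, multinomial_pmf(counts, p_hat))

# With k = 2 outcomes the formula reduces to the binomial one.
print(multinomial_pmf([2, 1], [0.65, 0.35]))
```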

T Distribution

  • It has just one parameter called df (Degrees of Freedom)
  • mean = 0 (for df > 1)
  • std. deviation = sqrt(df/(df-2)) (for df > 2)
  • As df increases it moves closer and closer to the standard normal curve
    • A common rule of thumb is that for n > 30 (or n > 50) it is practically the same as the standard normal distribution
  • In general it is wider than the bell curve, with heavier tails.
    • The reason: from the formula above, the std. deviation is always greater than 1
    • For the standard bell curve std. deviation = 1
  • Area under t distribution is 1
  • Parameter : df
  • Probability Estimation P(x | params) = check Wikipedia
  • Choosing df : when working with the mean of N samples, df = N - 1 (set by the sample size rather than found by MLE)
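
The sketch below shows how the std. deviation formula behaves as df grows, and compares the tail of the t density with the standard normal. The density used here is the standard textbook one, Γ((df+1)/2) / (sqrt(df*pi) * Γ(df/2)) * (1 + x^2/df)^(-(df+1)/2); it is my addition, not something derived in this post.

```python
import math

def t_std(df):
    """Standard deviation sqrt(df/(df-2)); finite only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is finite only for df > 2")
    return math.sqrt(df / (df - 2))

def t_pdf(x, df):
    """Student t density; lgamma avoids overflow for large df."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# The std. deviation shrinks toward 1 as df increases.
for df in (3, 5, 10, 30, 50, 1000):
    print(f"df = {df:4d}: std. deviation = {t_std(df):.4f}")

# Heavier tails: at x = 3 the t density is larger than the normal density.
for df in (5, 30, 1000):
    print(f"df = {df:4d}: t_pdf(3) = {t_pdf(3.0, df):.5f}, normal = {std_normal_pdf(3.0):.5f}")
```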

Fitting the Distribution?

Fitting a distribution means that we use some distribution as the model and estimate its parameters from data. In the case of Gaussian/Normal we estimate μ and σ; in the case of Poisson we estimate λ.

This allows us to reason about the data using only a few parameters. The other extreme would be to store a probability for each possible value.
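
As a small illustration of the two extremes, the sketch below fits a Poisson model to some made-up counts and compares it with storing one probability per observed value. It relies on two standard facts that are not derived in this post: the Poisson probability is exp(-λ) * λ^k / k!, and the MLE of λ is the sample mean.

```python
from collections import Counter
from math import exp, factorial

# Hypothetical data, e.g. number of calls received per minute.
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3]

def poisson_pmf(k, lam):
    """P(k | lambda) = exp(-lambda) * lambda^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

# Fitted model: a single parameter summarises the data.
lam = sum(counts) / len(counts)

# Other extreme: one stored probability per observed value.
empirical = {k: c / len(counts) for k, c in sorted(Counter(counts).items())}

print(f"lambda = {lam:.3f}")
for k, freq in empirical.items():
    print(f"k = {k}: empirical = {freq:.3f}, fitted Poisson = {poisson_pmf(k, lam):.3f}")
```

The fitted model also gives a probability for values that never appeared in the data, such as k = 6, which the stored table cannot do.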

What are probabilistic models?

Models that propagate the uncertainty of the inputs to the target variables are probabilistic models. Examples are :

  • Regression
  • Probability Trees
  • Monte Carlo Simulations
  • Markov chains
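
To show what propagating uncertainty looks like in practice, here is a minimal Monte Carlo sketch. It reuses the oil price scenario (up 3 with probability 0.65, down 1 otherwise) and assumes the days are independent; the number of runs and the seed are arbitrary choices of mine.

```python
import random
from collections import Counter

def simulate_change(days, p_up, up, down, rng):
    """One simulated path: total price change after the given number of days."""
    return sum(up if rng.random() < p_up else down for _ in range(days))

rng = random.Random(42)  # fixed seed for repeatability
runs = 100_000

# The uncertain input (daily move) becomes a distribution over the target
# (price change after three days) instead of a single number.
outcomes = Counter(simulate_change(3, 0.65, 3, -1, rng) for _ in range(runs))
mean_change = sum(value * count for value, count in outcomes.items()) / runs

for value in sorted(outcomes):
    print(f"change {value:+d}: estimated probability {outcomes[value] / runs:.3f}")
print(f"average change = {mean_change:.2f}")
```

The average should land close to 4.8, the value given by the binomial mean formula, but the simulation also shows how likely each individual outcome is.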

Further Reference :

[0] : stats.stackexchange

[1] : MLE for various distributions : https://onlinecourses.science.psu.edu/stat504/node/28/

[2] : https://math.stackexchange.com/questions/421105/maximum-likelihood-estimator-of-parameters-of-multinomial-distribution
