We learn various probability distributions during high school and engineering courses. However, at times we forget them, so here I am providing simple practical scenarios for each distribution, with very little theory involved.
Bernoulli Distribution
- When the random variable has just two outcomes
- Example: the probability that a drug/medicine will be approved by the government is p = 0.65
- The probability that it will not be approved is 1 - p = 0.35
- The formulas below assume the probability is known; in real life we estimate it from data :
- Mean = p
- Variance (Sigma Square) = p*(1-p)
- Parameters : p
- Probability evaluation P(x|params) = p if x = 1, (1-p) if x = 0
- MLE : p = n/N, where n = number of times 1 is observed, N = number of experiments
- MLE = Maximum Likelihood Estimation
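As a minimal sketch of the formulas above, the snippet below evaluates the Bernoulli probability and estimates p from a list of 0/1 outcomes. The sample data and the function names `bernoulli_pmf` and `bernoulli_mle` are made up for illustration.

```python
def bernoulli_pmf(x, p):
    """P(x | p): p if x == 1, (1 - p) if x == 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1.0 - p


def bernoulli_mle(samples):
    """MLE of p: number of 1s divided by number of experiments."""
    return sum(samples) / len(samples)


# Hypothetical outcomes of 20 approval decisions (1 = approved).
samples = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

p_hat = bernoulli_mle(samples)      # estimated p
mean = p_hat                        # mean = p
variance = p_hat * (1 - p_hat)      # variance = p * (1 - p)
print(p_hat, mean, variance)
```

With real data the estimate is only as good as the sample size; a handful of observations gives a very noisy p.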
Binomial Distribution
- When you perform the Bernoulli experiment multiple times and want to see how many times a certain outcome appears.
- For example, you flip a coin (fair or biased) 10 times and ask for the probability that heads appears x times (x = 0, 1, 2, ..., 10).
- Another more practical example :
- Suppose the oil price can increase by 3 bucks or decrease by 1 buck each day
- Probability of an increase p = 0.65, and of a decrease 1 - p = 0.35
- What price can we expect after three days?
- Note that (Increase, Increase, Decrease) and (Increase, Decrease, Increase) give the same price.
- Both are (2,1) outcomes -> 2 successes, 1 failure
- From another point of view, it counts the number of successes in an experiment :
- Number of patients responding to a treatment
- Binary classification problem (on reflection this does not seem correct: a single label is Bernoulli, and we use the logit and sigmoid)
- The formulas below assume the probability is known; in real life we estimate it from data :
- n = number of times the experiment is performed
- Mean = n*p
- Variance (Sigma Square) = n*p*(1-p)
- Example of binomial used in modeling :
- Parameters : n, p
- Probability evaluation P(x|params) = nCx * p^x * (1-p)^(n-x)
- MLE
- n = number of trials (known from the experiment, not estimated)
- p = x/n, where x = number of successes observed
- Interestingly MLE for binomial and multinomial distribution is very simple
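The oil price example can be worked through with a short sketch. It assumes the daily moves are independent, so the number of increases in three days is binomial with n = 3 and p = 0.65; the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x | n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 3, 0.65          # three days, probability of an increase
up, down = 3, -1        # price moves by +3 or -1 each day

expected_change = 0.0
for k in range(n + 1):  # k = number of days with an increase
    change = k * up + (n - k) * down
    prob = binomial_pmf(k, n, p)
    expected_change += prob * change
    print(f"{k} increases: change {change:+d}, probability {prob:.4f}")

print(f"expected change after {n} days: {expected_change:.2f}")
```

The same expected change follows from the mean formula: each day contributes 0.65 * 3 - 0.35 * 1 = 1.6 on average, so three days give 4.8.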
Normal Distribution
- Very popular distribution
- Observed very often because of central limit theorem (CLT)
- Example :
- % change in a stock price of google from a previous day
- Heights and weights of persons
- Exam scores
- It is good to remember empirical numbers for normal distribution :
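- 68 % of values lie within one standard deviation of the mean
- 95 % of values lie within two standard deviations of the mean
- 99.7 % of values lie within three standard deviations of the mean
- We use the Z score as the distance from the mean in units of standard deviation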
- Parameters : μ, σ
- Probability density P(x|params) = 1/sqrt(2*pi*σ^2) * exp(-(x-μ)^2/(2*σ^2))
- MLE :
- μ = average (x)
- σ = sqrt(sum((x - μ)^2)/N) (dividing by N-1 instead gives the unbiased sample variance)
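The sketch below checks the empirical percentages and estimates μ and σ from data using only the Python standard library. The heights are synthetic, generated with assumed true values μ = 170 and σ = 10. Note that `pstdev` divides by N (the maximum likelihood estimate), while `stdev` divides by N - 1; for large N the two nearly coincide.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Empirical rule: probability mass within k standard deviations of the mean.
std_normal = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sigma: {within:.4f}")

# Parameter estimation on synthetic data (assumed mu = 170, sigma = 10).
random.seed(0)
heights = [random.gauss(170, 10) for _ in range(10_000)]
mu_hat = fmean(heights)        # MLE of mu: the sample average
sigma_hat = pstdev(heights)    # MLE of sigma: divides by N

# Z score: distance from the mean in units of standard deviation.
z = (185 - mu_hat) / sigma_hat
print(mu_hat, sigma_hat, z)
```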
Poisson Distribution
- Number of events occurring in an interval
- The interval can be time, distance or area
- An individual keeping track of the amount of mail they receive each day may notice that they receive an average of 4 letters per day.
- Number of phone calls received by a call centre per hour
- Number of decay events per second from a radioactive source
- Number of file server virus infections at a data center during a 24-hour period
- https://www.quora.com/What-is-the-real-life-example-of-Poisson-distribution
- mean = variance = lambda (average number of events)
- For a fixed region, if we know the average number of events, we can compute the probability of any given number of events.
- PMF (Probability Mass Function) is right-skewed, especially for small lambda
- There is just one parameter (lambda)
- While normal has two parameters (μ and σ)
- Bernoulli has just one parameter (p)
- Binomial has two parameters (n and p)
- Poisson regression example :
- Poisson distribution can be derived as a limiting case of the binomial distribution as n -> ∞ and p -> 0 with n*p = λ held fixed
- Parameters : λ
- Probability evaluation P(x|params) = λ^x * e^(-λ) / x!
- Which is the probability of observing x successes
- Or the probability of observing x events
- MLE : λ = average (x)
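A short sketch of the Poisson formulas, using the mail example with λ = 4. The list of daily counts is invented for illustration, and the last lines numerically check the binomial limit with a large n and p = λ/n.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """P(x | lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 4  # average of 4 letters per day

# Probability of receiving exactly 0..8 letters on a given day.
for x in range(9):
    print(x, round(poisson_pmf(x, lam), 4))

# MLE of lambda is the sample average (hypothetical daily counts).
daily_counts = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]
lam_hat = sum(daily_counts) / len(daily_counts)

# A binomial with large n and p = lambda / n approaches the Poisson pmf.
n = 10_000
binom_at_4 = comb(n, 4) * (lam / n) ** 4 * (1 - lam / n) ** (n - 4)
print(lam_hat, binom_at_4, poisson_pmf(4, lam))
```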
Multi-nomial Distribution
- Binomial distribution has two parameters : n, p
- Multi-nomial distribution has n, p1, p2, ..., pk (with p1 + p2 + ... + pk = 1)
- It's like tossing a biased coin vs throwing a biased die
- Probability estimation :
- P(x1, x2, x3 | n, p1, p2, p3) = n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3
- Above is the probability of observing event1 x1 times, event2 x2 times and event3 x3 times (x1 + x2 + x3 = n)
- In binomial we estimate probability of success x times
- From that we can easily determine probability of failure
- Parameter estimation
- n = No of samples (easy)
- p1 = x1/n, p2=x2/n, p3 = x3/n
- Interestingly MLE for binomial and multinomial distribution is very simple
- Derivation of the above is a constrained optimisation problem, solved using a Lagrangian [2]
- From Wikipedia :
- For example, it models the probability of counts for rolling a k-sided die n times.
- When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution.
- When k is 2 and n is bigger than 1, it is the binomial distribution.
- When k is bigger than 2 and n is 1, it is the categorical distribution.
- It is a k-dimensional distribution
- It is the joint distribution of k variables
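The multinomial probability and its parameter estimates can be sketched as follows. The three-sided "die", its probabilities and the counts are hypothetical, and the function names are illustrative. The last line shows that with k = 2 the formula reduces to the binomial case.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(x1..xk | n, p1..pk) = n! / (x1! ... xk!) * p1^x1 * ... * pk^xk."""
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)  # exact integer division at every step
    return coefficient * prod(p**x for p, x in zip(probs, counts))


def multinomial_mle(counts):
    """MLE of each p_i: x_i / n."""
    n = sum(counts)
    return [x / n for x in counts]


# Hypothetical biased three-sided die rolled 10 times.
probs = [0.5, 0.3, 0.2]
counts = [5, 3, 2]
print(multinomial_pmf(counts, probs))
print(multinomial_mle(counts))

# With k = 2 this is the binomial pmf: 2 successes, 1 failure, p = 0.65.
print(multinomial_pmf([2, 1], [0.65, 0.35]))
```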
T Distribution
- It has just one parameter called df (Degrees of Freedom)
- mean = 0 (for df > 1)
- std. deviation = sqrt(df/(df-2)) (for df > 2)
- As df increases it moves closer and closer to the standard normal curve
- A common rule of thumb is that for n > 30 (or n > 50) it is practically indistinguishable from the standard normal distribution
- In general it is wider than the standard bell curve.
- The reason is that, from the formula above, the std. deviation is always greater than 1
- For standard bell curve std. deviation = 1
- Area under t distribution is 1
- Parameter : df
- Probability Estimation P(x | params) = check Wikipedia
- df = N - 1, where N is the number of samples (in the usual t-test setting df follows from the sample size rather than from an MLE)
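The sketch below tabulates the standard deviation for a few values of df and compares the tail density of the t distribution with the standard normal. The density formula in `t_pdf` is the standard Student's t density, written here from the usual textbook definition; the function names are illustrative. Note that sqrt(df/(df-2)) is defined only for df > 2.

```python
from math import exp, gamma, pi, sqrt


def t_std(df):
    """Standard deviation sqrt(df / (df - 2)); defined only for df > 2."""
    if df <= 2:
        raise ValueError("standard deviation is defined only for df > 2")
    return sqrt(df / (df - 2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)


# As df grows, the std. deviation approaches 1 and the tail density at
# x = 3 approaches the standard normal value.
for df in (3, 5, 10, 30, 100):
    print(df, round(t_std(df), 4), t_pdf(3.0, df), std_normal_pdf(3.0))
```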
Fitting the Distribution?
Fitting a distribution means we use some distribution as the model and estimate its parameters from data. For a Gaussian/Normal we estimate μ and σ; for a Poisson we estimate λ.
This allows us to reason about the data using a few parameters. The other extreme would be to store a probability for each possible value.
What are probabilistic models?
Models that propagate the uncertainty of inputs to target variables are probabilistic models. Examples are :
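To make the contrast concrete, here is a small sketch that fits a one-parameter Poisson model to some counts and, as the other extreme, stores an empirical probability for every observed value. The data are hypothetical (think of calls per hour).

```python
from collections import Counter
from math import exp, factorial

# Hypothetical observed counts.
data = [2, 3, 1, 4, 2, 3, 5, 2, 3, 3, 1, 2, 4, 3, 2, 0, 3, 2, 4, 3]


def poisson_pmf(x, lam):
    """P(x | lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)


# Option 1: fit a Poisson model, which needs a single parameter.
lam_hat = sum(data) / len(data)

# Option 2: store an empirical probability for every observed value.
empirical = {value: count / len(data) for value, count in Counter(data).items()}

for x in sorted(empirical):
    print(x, round(empirical[x], 3), round(poisson_pmf(x, lam_hat), 3))

# The fitted model also assigns probability to values never observed.
print(6, poisson_pmf(6, lam_hat))
```

The table of empirical probabilities grows with the number of distinct values and says nothing about unseen ones, whereas the fitted model summarises everything in one number.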
- Regression
- Probability Trees
- Monte Carlo Simulations
- Markov chains
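As one illustration of propagating input uncertainty to a target variable, here is a minimal Monte Carlo sketch that reuses the oil price example from the binomial section. It assumes independent daily moves; the function name and the number of trials are arbitrary choices.

```python
import random
from statistics import fmean, pstdev


def simulate_price_change(days, p_up, up=3, down=-1):
    """Total price change after `days` independent up/down moves."""
    return sum(up if random.random() < p_up else down for _ in range(days))


random.seed(42)
trials = [simulate_price_change(days=3, p_up=0.65) for _ in range(100_000)]

# Analytic mean for comparison: 3 * (0.65 * 3 - 0.35 * 1) = 4.8
print("simulated mean change:", fmean(trials))
print("simulated spread (std):", pstdev(trials))
```

The simulation gives not only the expected change but also its spread, which is the uncertainty carried from the daily moves to the three-day outcome.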
Further Reference :
[1] : MLE for various distributions : https://onlinecourses.science.psu.edu/stat504/node/28/

