Probability Distribution

We have learned various probability distributions in high school and engineering courses. However, at times we forget them, so here I provide simple practical scenarios for each distribution, with no theory involved.

Bernoulli Distribution

When the random variable has just two outcomes
Example: the probability that a drug/medicine will be approved by the government is p = 0.65
- The probability that it will not be approved is 1 - p = 0.35
The formulas below assume p is known; in real life we estimate it from data:
- Mean = p
- Variance (Sigma Square) = p*(1-p)
Parameters : p
Probability evaluation P(x|params) = p if x = 1, (1-p) if x = 0
MLE : p = n/N, where n = number of times 1 is observed, N = number of experiments
MLE = Maximum Likelihood Estimation
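To make the formulas concrete, here is a minimal Python sketch using only the standard library. The function names `bernoulli_pmf` and `bernoulli_mle` are my own, and the 10,000 "decisions" are simulated from p = 0.65, not real approval data.

```python
import random

def bernoulli_pmf(x: int, p: float) -> float:
    """P(x | p): p if x == 1, (1 - p) if x == 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1 - p

def bernoulli_mle(samples: list) -> float:
    """MLE of p: number of times 1 is observed / number of experiments."""
    return sum(samples) / len(samples)

p = 0.65  # probability the drug is approved, from the example above
mean = p
variance = p * (1 - p)
print(f"P(approved) = {bernoulli_pmf(1, p)}, P(not approved) = {bernoulli_pmf(0, p):.2f}")
print(f"mean = {mean}, variance = {variance:.4f}")

# Simulated (not real) decisions: draw 10,000 outcomes and recover p by MLE.
random.seed(0)
samples = [1 if random.random() < p else 0 for _ in range(10_000)]
print(f"estimated p = {bernoulli_mle(samples):.3f}")
```

With this many simulated outcomes, the estimate should land close to 0.65. With only a handful of observations it can be far off, which is the usual caveat for MLE on small samples.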

Binomial Distribution

When you perform the Bernoulli experiment multiple times and want to see how many times a certain outcome appears.
For example, you flip a coin (fair or biased) 10 times and want the probability that heads appears x times (x = 0, 1, ..., 10).
A more practical example:
- Suppose the oil price can increase by 3 bucks or decrease by 1 buck each day
- The probability of an increase is p = 0.65, and that of a decrease is 0.35
- What price can we expect after three days?
- Note that (Increase, Increase, Decrease) and (Increase, Decrease, Increase) give the same price
  - Both are (2, 1): 2 successes and 1 failure
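The oil example can be worked out by treating the number of increases as binomial. The post does not give a starting price, so this sketch only computes the price *change* after three days. It assumes the days are independent, which the example implies but does not state.

```python
from math import comb

p_up = 0.65        # probability the price increases on a given day
up, down = 3, -1   # price change in bucks for an increase / a decrease
days = 3

# Only the number of increases k matters, not their order, so the change
# after `days` days is k*up + (days - k)*down, with binomial probability.
distribution = {}
for k in range(days + 1):
    prob = comb(days, k) * p_up**k * (1 - p_up) ** (days - k)
    change = k * up + (days - k) * down
    distribution[change] = prob
    print(f"{k} increases, {days - k} decreases: change {change:+d}, probability {prob:.4f}")

expected_change = sum(change * prob for change, prob in distribution.items())
print(f"expected change after {days} days: {expected_change:.2f}")
```

As a sanity check, the expected change also equals 3 * (0.65 * 3 - 0.35 * 1) = 4.8 bucks, because expectation is linear across days.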
From another point of view, it counts the number of successes in a fixed number of trials:
- Number of patients responding to a treatment
- Binary classification problem (this does not seem correct now: each label is Bernoulli, and we model p with the logit/sigmoid)
The formulas below assume p is known; in real life we estimate it from data:
- n = number of times the experiment is performed
- Mean = n*p
- Variance (Sigma Square) = n*p*(1-p)
An example of the binomial used in modeling:
- https://people.duke.edu/~ccc14/sta-663/PyStan.html#estimating-parameters-of-a-logistic-model
Parameters : n, p
Probability evaluation P(x|params) = nCx * p^x * (1-p)^(n-x)
MLE
- n = number of trials N (known from how the experiment was run)
- p = k/N, where k = number of successes
- Interestingly, the MLEs for the binomial and multinomial distributions are very simple
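Here is a small Python check of the binomial pmf, with exponent (n - x) on the failure term, and its MLE. The coin example uses n = 10 and p = 0.5. The "7 heads in 10 flips" case is just an illustrative input. Mean and variance are computed directly from the pmf so they can be compared with n*p and n*p*(1-p).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x | n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_mle(successes: int, trials: int) -> float:
    """MLE of p when the number of trials is known."""
    return successes / trials

n, p = 10, 0.5  # a fair coin flipped 10 times
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))
print(f"P(5 heads) = {pmf[5]:.4f}")  # comb(10, 5) / 2**10
print(f"mean = {mean:.2f} (n*p = {n * p})")
print(f"variance = {variance:.2f} (n*p*(1-p) = {n * p * (1 - p)})")
print(f"p estimated from 7 heads in 10 flips = {binomial_mle(7, 10)}")
```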

Normal Distribution

A very popular distribution
Observed very often because of the central limit theorem (CLT): sums and averages of many independent effects tend toward a normal distribution
Examples:
- % change in Google's stock price from the previous day
- Heights and weights of persons
- Exam scores
It is good to remember the empirical rule for the normal distribution (share of values within k standard deviations of the mean):
- 68 % within one standard deviation
- 95 % within two standard deviations
- 99.7 % within three standard deviations
We use the Z score, z = (x - μ)/σ, as the distance from the mean in units of standard deviation
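The empirical numbers can be recovered from the error function, since P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z. This sketch uses only Python's `math` module. The exam figures (mean 70, standard deviation 10, score 85) are made up for illustration.

```python
from math import erf, sqrt

def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.1%}")

# Hypothetical exam: mean 70, standard deviation 10.
print(f"z score of 85: {z_score(85, 70, 10)}")
```

Two standard deviations actually cover about 95.4%, which the rule of thumb rounds to 95%.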
Parameters : μ, σ
Probability density P(x|params) = 1/sqrt(2*π*σ^2) * exp(-(x-μ)^2/(2*σ^2))
MLE :
- μ = average(x)
- σ = sqrt(Σ(x - μ)^2 / N); dividing by N - 1 instead of N gives the unbiased estimate of σ^2 that is usually reported
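Here is a minimal sketch of the normal density and both versions of σ (divide by N for the MLE, by N - 1 for the unbiased variance). The "heights" are simulated from N(170, 8^2) with `random.gauss`, so they are illustrative, not measured data.

```python
import random
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def normal_fit(xs):
    """Return (mu_hat, sigma_mle, sigma_unbiased) for the sample xs."""
    n = len(xs)
    mu = sum(xs) / n
    ss = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations
    return mu, sqrt(ss / n), sqrt(ss / (n - 1))

# Simulated heights in cm (hypothetical), drawn from N(170, 8^2).
random.seed(1)
heights = [random.gauss(170, 8) for _ in range(5_000)]
mu_hat, sigma_mle, sigma_unbiased = normal_fit(heights)
print(f"mu = {mu_hat:.1f}, sigma (N) = {sigma_mle:.2f}, sigma (N-1) = {sigma_unbiased:.2f}")
print(f"density at the mean: {normal_pdf(mu_hat, mu_hat, sigma_mle):.4f}")
```

For large N the two σ values are almost identical. The distinction only matters for small samples.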

Poisson Distribution

Number of events occurring in a fixed interval
- The interval can be time, distance or area
- An individual keeping track of the mail they receive each day may notice that they receive an average of 4 letters per day
- Number of phone calls received by a call centre per hour
- Number of decay events per second from a radioactive source
- Number of file server virus infections at a data center during a 24-hour period
- https://www.quora.com/What-is-the-real-life-example-of-Poisson-distribution
mean = variance = λ (average number of events)
For a fixed region, if we know the average number of events, we can compute the probability of observing any given number of events.
Its PMF (probability mass function, since the variable is a discrete count) is a right-skewed curve, especially for small λ
There is just one parameter (λ)
- The normal distribution has two parameters (μ and σ)
- Bernoulli has just one parameter (p)
- Binomial has two parameters (n and p)
Poisson regression example :
- http://www.clayford.net/statistics/poisson-regression-ch-6-of-gelman-and-hill/
- https://docs.pymc.io/notebooks/GLM-poisson-regression.html
The Poisson distribution can be derived as a limiting case of the binomial distribution as n -> ∞ with n*p = λ held fixed
- https://medium.com/@andrew.chamberlain/deriving-the-poisson-distribution-from-the-binomial-distribution-840cc1668239
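The limit can also be checked numerically. Keep n*p = λ fixed, let n grow, and compare the binomial probability with the Poisson one. λ = 4 and x = 2 are illustrative choices (the "4 letters per day" example), not values taken from the linked derivation.

```python
from math import comb, exp, factorial

lam, x = 4.0, 2  # average of 4 letters per day; probability of exactly 2 letters

poisson = lam**x * exp(-lam) / factorial(x)
errors = []
for n in (10, 100, 1_000, 10_000):
    p = lam / n  # keep n * p = lambda fixed while n grows
    binom = comb(n, x) * p**x * (1 - p) ** (n - x)
    errors.append(abs(binom - poisson))
    print(f"n = {n:>6}: binomial {binom:.5f} vs Poisson {poisson:.5f}")
```

The gap shrinks roughly in proportion to 1/n, which is why the Poisson is a good approximation for many trials with a small success probability.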
Parameters : λ
Probability evaluation P(x|params) = λ^x * e^(-λ) / x!
- This is the probability of observing x events (or x successes) in the interval
MLE : λ = average (x)
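Here is a standard-library sketch of the Poisson pmf and its MLE, plus a numerical check that mean = variance = λ. The letter counts are a hypothetical two weeks of data chosen for illustration. The check truncates the infinite sum at x = 100, where the remaining tail is negligible for λ = 4.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(x | lambda) = lambda^x * e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)

def poisson_mle(counts) -> float:
    """MLE of lambda: the average observed count."""
    return sum(counts) / len(counts)

# Hypothetical two weeks of daily letter counts (illustrative, not real data).
letters = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4, 3, 6, 4, 3]
lam = poisson_mle(letters)
print(f"lambda = {lam}")
for x in range(9):
    print(f"P({x} letters) = {poisson_pmf(x, lam):.3f}")

# Numerical check that mean = variance = lambda (sum truncated at x = 100).
xs = range(100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```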

Multi-nomial Distribution

The binomial distribution has two parameters : n, p
- The multinomial distribution has n, p1, p2, p3, ..., pk (with p1 + ... + pk = 1)
- It's like throwing a biased coin vs throwing a biased die
Probability evaluation :
- P(x1, x2, x3 | n, p1, p2, p3) = n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3
- Above is the probability of observing event1 x1 times, event2 x2 times and event3 x3 times, where x1 + x2 + x3 = n
- In the binomial case we only evaluate the probability of x successes
  - The number of failures, n - x, follows from that
Parameter estimation
- n = number of trials (easy)
- p1 = x1/n, p2=x2/n, p3 = x3/n
- Interestingly, the MLEs for the binomial and multinomial distributions are very simple
- The derivation is a constrained optimisation problem (the p's must sum to 1), solved using a Lagrange multiplier [2]
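Here is a minimal sketch of the multinomial pmf and the MLE p_i = x_i / n. The die counts are hypothetical. The final lines compare the likelihood under a fair die with the likelihood under the MLE, which by definition should be at least as large.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs) -> float:
    """n!/(x1! ... xk!) * p1^x1 * ... * pk^xk, with n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # the multinomial coefficient stays an exact integer
    return coef * prod(p**x for p, x in zip(probs, counts))

def multinomial_mle(counts) -> list:
    """MLE p_i = x_i / n (the solution of the Lagrangian problem)."""
    n = sum(counts)
    return [x / n for x in counts]

# Hypothetical: 60 rolls of a possibly biased die, counts for faces 1..6.
rolls = [8, 9, 10, 9, 10, 14]
p_hat = multinomial_mle(rolls)
print([round(p, 3) for p in p_hat])
print(f"likelihood, fair die: {multinomial_pmf(rolls, [1 / 6] * 6):.3e}")
print(f"likelihood, MLE:      {multinomial_pmf(rolls, p_hat):.3e}")
```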
From Wikipedia:
- For example, it models the probability of counts for rolling a k-sided die n times.
- When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution.
- When k is 2 and n is bigger than 1, it is the binomial distribution.
- When k is bigger than 2 and n is 1, it is the categorical distribution.
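The special cases above can be confirmed numerically with a small standalone pmf helper (my own, not from a library). The success probability 0.3 and the die probabilities are arbitrary illustrative values.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs) -> float:
    """n!/(x1! ... xk!) * p1^x1 * ... * pk^xk, with n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    return coef * prod(p**x for p, x in zip(probs, counts))

p = 0.3  # illustrative success probability

# k = 2, n = 1: Bernoulli, P(x = 1) = p
bernoulli = multinomial_pmf([1, 0], [p, 1 - p])

# k = 2, n = 10: binomial, P(4 successes) = 10C4 * p^4 * (1 - p)^6
binomial = multinomial_pmf([4, 6], [p, 1 - p])
binomial_direct = comb(10, 4) * p**4 * (1 - p) ** 6

# k = 6, n = 1: categorical, a single roll of a biased die landing on face 3
die = [0.1, 0.1, 0.4, 0.1, 0.1, 0.2]
categorical = multinomial_pmf([0, 0, 1, 0, 0, 0], die)

print(bernoulli, binomial, binomial_direct, categorical)
```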
It is a k-dimensional distribution
- It is the joint distribution of the k counts

T Distribution

It has just one parameter, df (degrees of freedom)
mean = 0 (for df > 1)
std. deviation = sqrt(df/(df-2)) (for df > 2)
As df increases, it moves closer and closer to the standard normal curve
- A common rule of thumb says that for n > 30 (or n > 50) it is practically the standard normal distribution
In general it is wider than the bell curve (it has heavier tails).
- The reason: from the formula above, the std. deviation is always greater than 1
- For the standard bell curve, std. deviation = 1
The area under the t distribution curve is 1
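Here is a standard-library sketch that puts these claims side by side: the standard deviation sqrt(df/(df-2)) for growing df, the density at 0 and at 3 compared with the standard normal (showing the heavier tails), and a crude midpoint-rule check that the area is 1. The density formula is the standard Student's t pdf. The integration range and step size are my own choices.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x: float, df: float) -> float:
    """Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)) * (1 + x^2/df)^(-(df+1)/2)."""
    # lgamma avoids overflowing Gamma for large df
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Standard normal density, for comparison."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def t_std(df: float) -> float:
    """sqrt(df / (df - 2)); only defined for df > 2."""
    if df <= 2:
        raise ValueError("standard deviation is undefined for df <= 2")
    return sqrt(df / (df - 2))

for df in (3, 5, 10, 30, 50, 100):
    print(f"df = {df:>3}: std = {t_std(df):.3f}, "
          f"pdf(0) = {t_pdf(0, df):.4f}, pdf(3) = {t_pdf(3, df):.5f}")
print(f"normal  : std = 1.000, pdf(0) = {normal_pdf(0):.4f}, pdf(3) = {normal_pdf(3):.5f}")

# Crude midpoint-rule check that the area under the curve is 1 (df = 5, x in [-200, 200]).
h = 0.01
area = sum(t_pdf(-200 + (i + 0.5) * h, 5) for i in range(40_000)) * h
print(f"area = {area:.5f}")
```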
Parameter : df
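Probability evaluation P(x | params) = Γ((df+1)/2) / (sqrt(df*π) * Γ(df/2)) * (1 + x^2/df)^(-(df+1)/2) (see Wikipedia)
MLE : in practice df is not estimated by maximum likelihood but set by the sample size, e.g. df = N - 1 for a one-sample t-test, where N is the number of samples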

Fitting the Distribution?

Fitting a distribution means we use some distribution as the model and estimate its parameters. For the Gaussian/normal we estimate μ and σ; for the Poisson we estimate λ.

This allows us to reason about data using only a few parameters. The other extreme would be to store a probability for every possible value.
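To contrast the two extremes, this sketch stores an empirical probability for every observed value and, separately, fits a Poisson model that keeps a single number, λ. The hourly call counts are hypothetical. The Poisson model is an assumption about the data, not something the counts prove.

```python
from collections import Counter
from math import exp, factorial

# Hypothetical hourly call counts at a call centre (illustrative, not real data).
calls = [2, 3, 1, 4, 2, 2, 5, 3, 0, 2, 3, 4, 1, 2, 3, 2, 6, 1, 3, 2]

# Extreme 1: store a probability for every observed value (one number per value).
counts = Counter(calls)
empirical = {x: c / len(calls) for x, c in sorted(counts.items())}

# Extreme 2: fit a Poisson model; one number (lambda) summarises the data.
lam = sum(calls) / len(calls)  # MLE of lambda

def poisson_pmf(x: int, lam: float) -> float:
    return lam**x * exp(-lam) / factorial(x)

print(f"lambda = {lam}")
for x in range(max(calls) + 2):
    print(f"{x} calls: empirical {empirical.get(x, 0):.2f}, Poisson {poisson_pmf(x, lam):.2f}")
```

The table gives probability 0 to any value it has not seen (7 calls here), while the fitted model still assigns it a small probability. That is one practical benefit of reasoning with a few parameters.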

What are probabilistic models?

Models that propagate the uncertainty in their inputs through to the target variables are probabilistic models. Examples include:

Regression
Probability Trees
Monte Carlo Simulations
Markov chains

Further References :

[0] : stats.stackexchange

[1] : MLE for various distributions : https://onlinecourses.science.psu.edu/stat504/node/28/

[2] : Maximum likelihood estimator of parameters of multinomial distribution : https://math.stackexchange.com/questions/421105/maximum-likelihood-estimator-of-parameters-of-multinomial-distribution

Data Stories