In standard linear regression we make two assumptions:
- P(Y|X) is a normal distribution
- The mean is a linear function of the parameters: µ = β*X
- Together: P(Y|X) = N(µ, σ^2 * I), where σ is the standard deviation and I is the identity matrix
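As a rough illustration of these two assumptions, the sketch below simulates data from a normal distribution whose mean is linear in a single input, then recovers the coefficients with the closed-form least squares formula. The function name `ols_fit` and all numeric values (true coefficients, σ, sample size) are made up for the example.

```python
import random

def ols_fit(xs, ys):
    """Closed-form least squares for y = b0 + b1*x with a single input."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative values only (assumptions, not taken from the text).
rng = random.Random(0)
true_b0, true_b1, sigma = 2.0, 3.0, 0.5
xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
# Each y is drawn from N(mu, sigma^2) with mu = b0 + b1*x.
ys = [rng.gauss(true_b0 + true_b1 * x, sigma) for x in xs]

b0_hat, b1_hat = ols_fit(xs, ys)
print(b0_hat, b1_hat)  # should land near the true values 2.0 and 3.0
```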
In a GLM we relax both assumptions:
- P(Y|X) can be any distribution from the exponential family
- The mean is some function of β*X, not necessarily β*X itself
  - µ = f(β*X)
  - g(µ) = β*X
  - g = f^(-1)
  - g is called the link function
Examples of link functions:
- log link
- reciprocal link
- logit link
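To make the relation g = f^(-1) concrete, here is a small sketch that pairs each of these links with its inverse and checks that applying g and then f returns the original mean. The dictionary `LINKS` and the helper `round_trip` are names invented for this example; the third entry is the logit link, whose inverse is the logistic (sigmoid) function.

```python
import math

# Each entry maps a link name to (g, f), where g is the link and f = g^(-1).
LINKS = {
    "log": (math.log, math.exp),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

def round_trip(name, mu):
    """Apply the link g, then its inverse f; the result should equal mu."""
    g, f = LINKS[name]
    return f(g(mu))

for name, mu in [("log", 4.0), ("reciprocal", 4.0), ("logit", 0.3)]:
    print(name, mu, round_trip(name, mu))
```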
The log-likelihood is derived in the same way as for the normal distribution. However, a closed-form solution generally does not exist, so the parameters are found numerically, typically by iteratively reweighted least squares or another convex optimization method.
Here is one example from the MIT course mentioned in the references [1].

Logistic
- In Gaussian regression we predict μ for each sample
  - This μ comes from β0, β1, β2, which are the same for every sample
- For binomial regression we want to predict p for each sample
  - This p comes from β0, β1, β2, which are the same for every sample
- Option one:
  - p = β0 + β1*x1 + β2*x2
- Option two:
  - p = sigmoid(β0 + β1*x1 + β2*x2)
  - g(p) = log(p/(1-p)) = β0 + β1*x1 + β2*x2
  - This is the logit link function
- What are the other options apart from sigmoid?
  - step function (not differentiable, which is why we use sigmoid)
  - tanh is sometimes used in deep learning
- What if we go with option one?
  - The binomial distribution requires p to be in (0,1), but β0 + β1*x1 + β2*x2 can take any real value
- Example:
  - How many fish survive (alive/dead) given food and water
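The difference between using the raw linear predictor as p and passing it through a sigmoid can be seen in a short sketch. The coefficients and the (food, water) inputs below are hypothetical values chosen so that the linear predictor leaves the (0,1) range.

```python
import math

def linear_predictor(x1, x2, b0, b1, b2):
    return b0 + b1 * x1 + b2 * x2

def sigmoid(z):
    # Two branches so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Logit link: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for the fish survival example.
b0, b1, b2 = -3.0, 0.8, 0.5
for food, water in [(0, 0), (2, 2), (10, 10)]:
    eta = linear_predictor(food, water, b0, b1, b2)
    # eta itself can be negative or larger than 1, so it is not a valid p;
    # sigmoid(eta) always lies strictly between 0 and 1.
    print(food, water, eta, sigmoid(eta))
```

Applying `logit` to the sigmoid output returns the linear predictor, which is the sense in which the logit is the link function for this model.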
Poisson
- The Poisson distribution models the probability of observing a count k
  - P(k) = exp(-λ) * (λ^k) / k!
  - Parameter λ > 0
- Option one:
  - λ = β0 + β1*x1 + β2*x2
- Option two:
  - λ = exp(β0 + β1*x1 + β2*x2)
  - g(λ) = log(λ) = β0 + β1*x1 + β2*x2
  - This is the log link function
- What if we go with option one?
  - We want λ > 0, but β0 + β1*x1 + β2*x2 can be negative
  - The relationship between inputs and output is often multiplicative rather than additive
  - Suppose enough water makes 1.5 times as many seeds germinate, and enough fertilizer makes 1.2 times as many germinate. When you give both enough water and enough fertilizer, would 1.5 + 1.2 = 2.7 times as many seeds germinate? Of course not. The estimated value would be 1.5 * 1.2 = 1.8 times. [3]
- Example:
  - How many seeds will germinate given water and fertilizer
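The multiplicative behaviour of the log link can be checked numerically. In the sketch below the baseline of 10 seeds and the 0/1 encoding of water and fertilizer are assumptions made for illustration; the factors 1.5 and 1.2 are the ones from the seed example above.

```python
import math

def poisson_pmf(k, lam):
    """P(k) = exp(-lam) * lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def rate(x1, x2, b0, b1, b2):
    """Log link: lam = exp(b0 + b1*x1 + b2*x2), which is always positive."""
    return math.exp(b0 + b1 * x1 + b2 * x2)

# Assumed setup: x1 = 1 means enough water, x2 = 1 means enough fertilizer.
b0 = math.log(10.0)  # hypothetical baseline of 10 germinated seeds
b1 = math.log(1.5)   # water multiplies the rate by 1.5
b2 = math.log(1.2)   # fertilizer multiplies the rate by 1.2

base = rate(0, 0, b0, b1, b2)
both = rate(1, 1, b0, b1, b2)
# Adding effects on the log scale multiplies them on the rate scale,
# so the ratio is about 1.5 * 1.2 = 1.8 rather than 2.7.
print(both / base)
print(poisson_pmf(18, both))  # probability of exactly 18 seeds at that rate
```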
Parameter Estimation
- We can use maximum likelihood estimation to find the parameters β0, β1, β2
- Deriving the maximum likelihood for binomial:
  - max_lh = Multiply ( likelihood of y_actual of each sample under the predicted distribution )
  - max_lh = Multiply ( Binomial(p) )
  - max_lh = Multiply ( p if y=1 else (1-p) )
  - log(max_lh) = Summation ( y*log(p) + (1-y)*log(1-p) )
- Deriving the maximum likelihood for Poisson:
  - max_lh = Multiply ( likelihood of y_actual of each sample under the predicted distribution )
  - max_lh = Multiply ( Poisson(u) )
  - max_lh = Multiply ( exp(-u) * u^y / y! )
  - log(max_lh) = Summation ( -u + y*log(u) - log(y!) )
- The two derivations above are rough, but they convey the idea
- For Gaussian, maximizing the likelihood turns out to be OLS (Ordinary Least Squares), which has a closed-form solution
- For the others, we solve it via gradient or Newton's method
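The two log-likelihoods above translate almost line for line into code. The following is a minimal sketch, not a production fitter: it evaluates both log-likelihoods and fits the binomial (0/1 outcome) case with plain gradient ascent. The function names, the toy data, the learning rate and the step count are all assumptions made for the example, and each row of X starts with a 1 so that the first coefficient acts as β0.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def bernoulli_log_lik(beta, X, y):
    """Summation( y*log(p) + (1-y)*log(1-p) ) with p = sigmoid(beta . x)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(dot(beta, xi))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def poisson_log_lik(beta, X, y):
    """Summation( -u + y*log(u) - log(y!) ) with u = exp(beta . x)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = dot(beta, xi)  # log(u)
        total += -math.exp(eta) + yi * eta - math.lgamma(yi + 1)
    return total

def fit_logistic(X, y, lr=0.1, steps=5000):
    """Gradient ascent; the gradient of the log-likelihood is sum((y - p) * x)."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = sigmoid(dot(beta, xi))
            for j, v in enumerate(xi):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data (made up): one input, outcomes that are not perfectly separable.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = fit_logistic(X, y)
print(beta_hat)
print(bernoulli_log_lik([0.0, 0.0], X, y), bernoulli_log_lik(beta_hat, X, y))
```

Newton's method would use the second derivative as well and usually needs far fewer iterations; gradient ascent is used here only because it is the shortest to write down.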
References:
[1] Wonderful MIT lecture: https://www.youtube.com/watch?v=X-ix97pw0xY
[2] https://onlinecourses.science.psu.edu/stat504/node/216/






















