exponential family

In standard linear regression we make two assumptions:

P(Y|X) is a normal distribution
The mean is a linear function of the parameters: µ = β*X
P(Y|X) = N(µ, σ^2 * I), where σ is the standard deviation and I is the identity matrix

In a GLM we relax both assumptions:

P(Y|X) can come from any exponential family distribution
The mean is some function of β*X
1. µ = f(β*X)
2. g(µ) = β*X
3. g = f^(-1)
4. g is called the link function

Example of link functions:

log link: g(µ) = log(µ)
reciprocal link: g(µ) = 1/µ
logit link: g(µ) = log(µ/(1-µ))
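
As a minimal sketch of the log, reciprocal, and logit (logistic) links, each link g can be written next to its inverse f, so that f(g(µ)) = µ. The function names are illustrative and are not taken from any library.

```python
import math

# Each pair is a link g (mean -> linear predictor) and its inverse f.
def log_link(mu):
    return math.log(mu)

def inv_log_link(eta):
    return math.exp(eta)

def reciprocal_link(mu):
    return 1.0 / mu

def inv_reciprocal_link(eta):
    return 1.0 / eta

def logit_link(mu):
    return math.log(mu / (1.0 - mu))

def inv_logit_link(eta):
    # This inverse is the sigmoid (logistic) function.
    return 1.0 / (1.0 + math.exp(-eta))

# Round trip: applying g and then f should return the original mean.
mu = 0.3
for g, f in [(log_link, inv_log_link),
             (reciprocal_link, inv_reciprocal_link),
             (logit_link, inv_logit_link)]:
    assert abs(f(g(mu)) - mu) < 1e-12
```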

The log-likelihood is derived in the same way as for the normal distribution. However, a closed-form solution generally does not exist, so the parameters are usually found with iterative least squares or other convex optimization methods.

Here is one example from the MIT course mentioned in the references.

poisson

Logistic

In Gaussian regression we predict μ for each sample
- This μ comes from β0, β1, β2, which are the same for every sample
For binomial regression we want to predict p for each sample
- This p comes from β0, β1, β2, which are the same for every sample
Option one:
- p = β0 + β1*x1 + β2*x2
Option two:
- p = sigmoid(β0 + β1*x1 + β2*x2)
- g(p) = log(p/(1-p)) = β0 + β1*x1 + β2*x2
- This g is the logit link function
What are the other options apart from sigmoid?
- Step function (not differentiable, which is why we use sigmoid)
- tanh is sometimes used in deep learning
What if we go with option one?
- The binomial distribution requires p to be in (0,1), but a linear function of the inputs is unbounded
Example:
- How many fish survive (alive/dead) given food and water
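
To make the comparison between the two options concrete, the sketch below evaluates both on the same inputs. The coefficient values and the inputs are made up for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = -1.0, 0.8, 0.5

def linear_predictor(x1, x2):
    return b0 + b1 * x1 + b2 * x2

for x1, x2 in [(0.0, 0.0), (1.0, 1.0), (3.0, 2.0)]:
    eta = linear_predictor(x1, x2)
    p_option1 = eta            # option one: can fall outside (0, 1)
    p_option2 = sigmoid(eta)   # option two: always inside (0, 1)
    print(f"eta={eta:+.2f}  option1={p_option1:+.2f}  option2={p_option2:.3f}")
```

With these values the first and last inputs give a linear predictor below 0 and above 1, which is not a valid probability, while the sigmoid output stays inside (0, 1).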

Poisson

The Poisson distribution models the probability of observing a count k
- P(k) = exp(-λ) * (λ^k) / k!
Parameter λ > 0
Option one:
- λ = β0 + β1*x1 + β2*x2
Option two:
- λ = exp(β0 + β1*x1 + β2*x2)
- g(λ) = log(λ) = β0 + β1*x1 + β2*x2
- This g is the log link function
What if we go with option one?
- We want λ > 0, but a linear function of the inputs can be negative
- The relationship between inputs and output is multiplicative, not additive
  - Suppose enough water multiplies the number of germinated seeds by 1.5, and enough fertilizer multiplies it by 1.2. When you give both enough water and enough fertilizer, would the seeds germinate 1.5 + 1.2 = 2.7 times as much?
    Of course not. The estimated value would be 1.5 * 1.2 = 1.8 times. [3]
Example:
- How many seeds will germinate given water and fertilizer
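
The multiplicative behavior of the log link follows from exp(a + b) = exp(a) * exp(b). Here is a small sketch using the 1.5 and 1.2 factors from the seed example. The baseline rate and the 0/1 encoding of water and fertilizer are assumptions made for illustration.

```python
import math

# Assumed setup: x1 = 1 if enough water, x2 = 1 if enough fertilizer.
base_rate = 10.0          # hypothetical expected count with neither
b0 = math.log(base_rate)
b1 = math.log(1.5)        # water multiplies the rate by 1.5
b2 = math.log(1.2)        # fertilizer multiplies the rate by 1.2

def rate(x1, x2):
    # Log link: lambda = exp(b0 + b1*x1 + b2*x2)
    return math.exp(b0 + b1 * x1 + b2 * x2)

ratio_both = rate(1, 1) / rate(0, 0)
print(round(ratio_both, 6))  # 1.8
```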

Parameter Estimation

We can use maximum likelihood estimation to find the parameters β0, β1, β2
Deriving the likelihood for binomial:
- likelihood = Multiply (likelihood of y_actual of each sample under the predicted distribution)
- likelihood = Multiply ( Binomial(p) )
- likelihood = Multiply ( p if y=1 else (1-p) )
- log(likelihood) = Summation ( y*log(p) + (1-y)*log(1-p) )
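
The binomial steps can be checked numerically. This sketch, with made-up labels and predicted probabilities, confirms that the log of the product form equals the summation form. The function name is illustrative.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Sum of y*log(p) + (1-y)*log(1-p) over all samples."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Made-up labels and predicted probabilities.
y = [1, 0, 1]
p = [0.9, 0.2, 0.7]

# Product form: p if y == 1 else (1 - p), multiplied over samples.
product = 1.0
for yi, pi in zip(y, p):
    product *= pi if yi == 1 else (1 - pi)

# The product form and the summation form agree after taking the log.
assert abs(math.log(product) - bernoulli_log_likelihood(y, p)) < 1e-12
```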
Deriving the likelihood for Poisson:
- likelihood = Multiply (likelihood of y_actual of each sample under the predicted distribution)
- likelihood = Multiply ( Poisson(λ) )
- likelihood = Multiply ( exp(-λ) * λ^y / y! )
- log(likelihood) = Summation ( -λ + y*log(λ) - log(y!) )
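
The Poisson steps can be checked the same way. The counts and rates below are made up, and the function name is illustrative. `math.lgamma(y + 1)` is used for log(y!).

```python
import math

def poisson_log_likelihood(y, lam):
    """Sum of -lam + y*log(lam) - log(y!) over all samples."""
    return sum(-li + yi * math.log(li) - math.lgamma(yi + 1)
               for yi, li in zip(y, lam))

# Made-up observed counts and predicted rates.
y = [2, 0, 5]
lam = [1.5, 0.5, 4.0]

# Product form: exp(-lam) * lam^y / y!, multiplied over samples.
product = 1.0
for yi, li in zip(y, lam):
    product *= math.exp(-li) * li ** yi / math.factorial(yi)

# The product form and the summation form agree after taking the log.
assert abs(math.log(product) - poisson_log_likelihood(y, lam)) < 1e-9
```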
The two derivations above are rough, but they convey the idea
For the Gaussian case this reduces to OLS (Ordinary Least Squares), which has a closed-form solution
For the other cases we solve it numerically with gradient-based methods or Newton's method.
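
As a rough illustration of the numerical route, here is a minimal sketch of Newton's method for a Poisson regression with one feature and the log link. The data, the function name `fit_poisson_newton`, and the fixed number of steps are assumptions. Library implementations use more careful convergence checks.

```python
import math

def fit_poisson_newton(xs, ys, steps=25):
    """Fit lambda = exp(b0 + b1*x) by Newton's method on the log-likelihood."""
    # Start the intercept at log(mean count) so the first steps are stable.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(steps):
        lam = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood: sum of (y - lambda) * [1, x]
        g0 = sum(y - l for y, l in zip(ys, lam))
        g1 = sum((y - l) * x for x, y, l in zip(xs, ys, lam))
        # Negative Hessian: sum of lambda * [1, x][1, x]^T (2x2, symmetric)
        h00 = sum(lam)
        h01 = sum(l * x for x, l in zip(xs, lam))
        h11 = sum(l * x * x for x, l in zip(xs, lam))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (negative Hessian)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: counts that roughly double for each unit of x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 9, 15]
b0, b1 = fit_poisson_newton(xs, ys)
print(b0, b1)
```

At the optimum the gradient is zero, so the fitted rates sum to the observed counts. That gives a quick sanity check on the result.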