Generalized Linear Models (GLM)

In standard linear regression we make two assumptions:

  1. P(Y|X) is a normal distribution
  2. The mean is a linear function of the inputs: µ = β*X

Together: P(Y|X) = N(µ, σ^2 * I)       # σ is the standard deviation and I is the identity matrix
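
As a minimal sketch of these assumptions, the Python snippet below simulates data whose mean is linear in a single input with normal noise around it, then recovers the coefficients with the closed-form least-squares formulas. The parameter values, the sample size, and the helper name ols_fit are illustrative assumptions, not taken from the references.

```python
import random

def ols_fit(xs, ys):
    """Closed-form least squares for one input: returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical "true" parameters, chosen only for illustration.
TRUE_B0, TRUE_B1, SIGMA = 2.0, 3.0, 0.5

rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
# Both assumptions together: y is normal around the linear mean b0 + b1*x.
ys = [rng.gauss(TRUE_B0 + TRUE_B1 * x, SIGMA) for x in xs]

b0_hat, b1_hat = ols_fit(xs, ys)
print(b0_hat, b1_hat)  # estimates should land close to TRUE_B0 and TRUE_B1
```

With noise-free data the same function returns the generating coefficients exactly, which is a quick way to check it.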

 

In GLM we relax both assumptions:

  1. P(Y|X) can be any distribution from the exponential family
  2. The mean is some function of β*X
    1. µ = f(β*X)
    2. g(µ) = β*X
    3. g = f^(-1)
    4. g is called the link function

 

Examples of link functions:

  1. log link
  2. reciprocal link
  3. logit link
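
To make the pairing of a link g with its inverse f concrete, here is a small Python sketch of the three links listed above. The dictionary name LINKS and the test value mu = 0.3 are assumptions made for illustration only.

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def sigmoid(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def reciprocal(v):
    """Reciprocal link; it is its own inverse."""
    return 1.0 / v

# name -> (link g, mean function f = g^(-1))
LINKS = {
    "log": (math.log, math.exp),
    "reciprocal": (reciprocal, reciprocal),
    "logit": (logit, sigmoid),
}

mu = 0.3  # a mean value that is valid for all three links
for name, (g, f) in LINKS.items():
    eta = g(mu)               # value on the linear-predictor scale
    print(name, eta, f(eta))  # f(g(mu)) gives back mu
```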

 

The log-likelihood is derived the same way as for the normal distribution. However, there is generally no closed-form solution, so the parameters are found numerically, typically by iteratively reweighted least squares or another convex optimization method.

Here is one example from the MIT course mentioned in the references.

[Figure: Poisson example]

Logistic

  • In Gaussian regression we predict μ for each sample
    • This μ comes from β0, β1, β2, which are the same for every sample
  • In binomial regression we want to predict p for each sample
    • This p comes from β0, β1, β2, which are the same for every sample
  • Option one:
    • p = β0 + β1*x1 + β2*x2
  • Option two:
    • p = sigmoid(β0 + β1*x1 + β2*x2)
    • g(p) = log(p/(1-p)) = β0 + β1*x1 + β2*x2
    • This g is the logit link function
  • What are the other options apart from sigmoid?
    • step function (not differentiable, which is why we use sigmoid)
    • tanh is sometimes used in deep learning (its range is (-1,1), so it would need rescaling to give a probability)
  • What if we go with option one?
    • The binomial distribution requires p to be in (0,1), but a linear function of the inputs can take any real value
  • Example:
    • How many fish survive (alive/dead) given food and water
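
The difference between the two options can be sketched in a few lines of Python. The coefficients in BETA and the three samples are made up for illustration; they are not fitted to any data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (b0, b1, b2), shared by every sample.
BETA = (-1.0, 0.8, 0.5)

def linear_predictor(x1, x2):
    b0, b1, b2 = BETA
    return b0 + b1 * x1 + b2 * x2

# Three made-up samples (x1 = food, x2 = water).
samples = [(0.0, 0.0), (2.0, 1.0), (5.0, 4.0)]

# Option one: use the linear predictor directly as p.
option_one = [linear_predictor(x1, x2) for x1, x2 in samples]
# Option two: pass the linear predictor through the sigmoid.
option_two = [sigmoid(eta) for eta in option_one]

for eta, p in zip(option_one, option_two):
    print(f"linear: {eta:+.2f}   sigmoid: {p:.4f}")
```

The linear values fall outside (0,1) for some samples, while the sigmoid values always stay inside it.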

 

Poisson

  • The Poisson distribution models the probability of observing a count k
      • P(k) = exp(-λ) * (λ^k) / k!

    Parameter λ > 0

  • Option one:
    • λ = β0 + β1*x1 + β2*x2
  • Option two:
    • λ = exp(β0 + β1*x1 + β2*x2)
    • g(λ) = log(λ) = β0 + β1*x1 + β2*x2
    • This g is the log link function
  • What if we go with option one?
    • We need λ > 0, but a linear function of the inputs can be negative
    • The relationship between inputs and output is often multiplicative rather than additive
      • Suppose enough water makes 1.5 times as many seeds germinate, and enough fertilizer makes 1.2 times as many germinate. With both enough water and enough fertilizer, would 1.5 + 1.2 = 2.7 times as many seeds germinate?
        Of course not. The estimated value would be 1.5 * 1.2 = 1.8 times. [3]
  • Example:
    • How many seeds will germinate given water and fertilizer
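
The multiplicative behaviour of the log link can be checked numerically. In this Python sketch the coefficients are chosen (as an assumption, for illustration) so that water alone multiplies the rate by 1.5 and fertilizer alone by 1.2; the baseline of 10 seeds and the 0/1 coding of the inputs are also hypothetical.

```python
import math

# Hypothetical coefficients on the log scale.
BETA0 = math.log(10.0)       # baseline rate: 10 seeds with neither input
BETA_WATER = math.log(1.5)   # enough water multiplies the rate by 1.5
BETA_FERT = math.log(1.2)    # enough fertilizer multiplies the rate by 1.2

def rate(water, fert):
    """Poisson rate under the log link; water and fert are 0 or 1."""
    return math.exp(BETA0 + BETA_WATER * water + BETA_FERT * fert)

baseline = rate(0, 0)
ratio_water = rate(1, 0) / baseline
ratio_fert = rate(0, 1) / baseline
ratio_both = rate(1, 1) / baseline

# Adding on the log scale is the same as multiplying on the rate scale.
print(ratio_water, ratio_fert, ratio_both)
```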

 

Parameter Estimation

  • We can find the parameters β0, β1, β2 by maximum likelihood estimation
  • Deriving the likelihood for binomial:
    • likelihood = Product (likelihood of y_actual of each sample under the predicted distribution)
    • likelihood = Product ( Binomial(p) )
    • likelihood = Product ( p if y=1 else (1-p) )
    • log(likelihood) = Summation ( y*log(p) + (1-y)*log(1-p) )
  • Deriving the likelihood for Poisson:
    • likelihood = Product (likelihood of y_actual of each sample under the predicted distribution)
    • likelihood = Product ( Poisson(λ) )
    • likelihood = Product ( exp(-λ) * λ^y / y! )
    • log(likelihood) = Summation ( -λ + y*log(λ) - log(y!) )
  • The two derivations above are rough, but they convey the idea
  • For Gaussian, maximizing the likelihood turns out to be OLS (Ordinary Least Squares), which has a closed-form solution
  • For the others we maximize the log-likelihood numerically, e.g. with gradient ascent or Newton's method
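
The log-likelihoods above translate almost directly into code. The Python sketch below writes both of them out and fits a one-input Poisson regression with Newton's method. The tiny dataset, the function names, and the step-halving safeguard are assumptions added for illustration; the step halving is not part of the derivation above, it only keeps the first Newton steps from overshooting.

```python
import math

def bernoulli_loglik(ps, ys):
    """Sum of y*log(p) + (1-y)*log(1-p) over samples."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(ps, ys))

def poisson_loglik(beta, xs, ys):
    """Sum of -lam + y*log(lam) - log(y!) with lam = exp(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        total += -math.exp(eta) + y * eta - math.lgamma(y + 1)
    return total

def poisson_score(beta, xs, ys):
    """Gradient of the Poisson log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        resid = y - math.exp(beta[0] + beta[1] * x)
        g0 += resid
        g1 += x * resid
    return g0, g1

def fit_poisson_newton(xs, ys, max_iter=50, tol=1e-8):
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        g0, g1 = poisson_score(beta, xs, ys)
        if max(abs(g0), abs(g1)) < tol:
            break
        # Negative Hessian: sum over samples of lam * [[1, x], [x, x^2]].
        h00 = h01 = h11 = 0.0
        for x in xs:
            lam = math.exp(beta[0] + beta[1] * x)
            h00 += lam
            h01 += lam * x
            h11 += lam * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not get worse.
        old = poisson_loglik(beta, xs, ys)
        scale = 1.0
        while scale > 1e-8:
            trial = [beta[0] + scale * d0, beta[1] + scale * d1]
            if poisson_loglik(trial, xs, ys) >= old - 1e-12:
                break
            scale /= 2.0
        beta = trial
    return beta

# Made-up counts that grow roughly exponentially with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 7, 12]
beta_hat = fit_poisson_newton(xs, ys)
print(beta_hat, poisson_score(beta_hat, xs, ys))
```

At the fitted parameters the gradient is numerically zero, which is the condition the maximum likelihood estimate has to satisfy.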

 

References:

[0] https://ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/lecture-slides/MIT18_650F16_GLM.pdf

[1] Wonderful MIT lecture: https://www.youtube.com/watch?v=X-ix97pw0xY

[2] https://onlinecourses.science.psu.edu/stat504/node/216/

[3] https://tsmatz.wordpress.com/2017/08/30/glm-regression-logistic-poisson-gaussian-gamma-tutorial-with-r/

 

Exponential Family

Here is the basic concept:

[Figure: basic form of an exponential family distribution]

  • θ are the parameters and X is the data; both can be multidimensional
  • We want to restrict the term inside the exponential to the form θ*X

 

Formal Definition:

[Figure: formal definition of the exponential family]

  • The functions η and T also handle the case where the dimensions of θ and X do not match.
  • g(θ) from the basic concept above has been moved inside the exponential as B(θ).
    • Loosely, it serves as the normalization factor.
  • h(x) serves as a base distribution, and the exponential term transforms this base distribution
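
As a worked example, assume the formal definition has the usual one-parameter form p_θ(x) = h(x) * exp(η(θ)*T(x) - B(θ)). The symbols η, T, B and h follow the bullets above, but the exact layout of the original formula is an assumption. Under that form, the Poisson and Bernoulli distributions used earlier can be rewritten as:

```latex
\begin{align*}
\text{Poisson: } P(k) &= \frac{e^{-\lambda}\lambda^{k}}{k!}
  = \frac{1}{k!}\exp\bigl(k\log\lambda - \lambda\bigr),
  & \eta(\lambda) &= \log\lambda,\; T(k) = k,\; B(\lambda) = \lambda,\; h(k) = \tfrac{1}{k!} \\
\text{Bernoulli: } P(y) &= p^{y}(1-p)^{1-y}
  = \exp\Bigl(y\log\tfrac{p}{1-p} + \log(1-p)\Bigr),
  & \eta(p) &= \log\tfrac{p}{1-p},\; T(y) = y,\; B(p) = -\log(1-p),\; h(y) = 1
\end{align*}
```

In both cases η is the link discussed earlier: the log link for Poisson and the logit link for binomial.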

 

[Figure: examples of exponential family distributions]

 

Further Reading

  • chapter8.pdf

  • lecture12.pdf