On LDA, QDA

{In practice it is used more for classification than for regression}

This resembles Gaussian mixture models in that you fit one Gaussian for each class.
Don't forget one important difference though: LDA is supervised, mixture models are unsupervised.

Linear Discriminant Analysis (LDA)

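In logistic regression (LR), we model the posterior probability of the class directly. In LDA we instead model the likelihood (the density of x within each class) and then apply Bayes' theorem to obtain the posterior. Computing the posterior with Bayes' theorem is easy in classification because the hypothesis space is limited to the K classes, so the normalizing sum has only K terms.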

[Equations 1 to 4]

Equation 2 gives the probability of class k given x. This is a full posterior probability rather than just a point estimate.

Equation 4 is derived from equation 3 alone, by taking the logarithm and dropping the terms that are the same for every class. The posterior probability is therefore highest for the class whose discriminant δ_k(x) is highest.

LDA estimates the class means and the common variance from the data and uses equation 4 for classification.
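For reference, the one-predictor derivation described above can be written out as follows. This is the usual textbook form under the assumptions of normal class densities and a shared variance; it is a sketch and not necessarily identical, in notation or numbering, to the equations referred to in the text.

```latex
% Class density: Gaussian with class mean \mu_k and shared variance \sigma^2
f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
         \exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right)

% Bayes' theorem: posterior probability of class k given x
\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}

% Take the log of the numerator and drop terms that do not depend on k
% (the denominator, the normalizing constant, and -x^2 / (2\sigma^2)):
\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k
```

The discriminant is linear in x, which is where the "linear" in LDA comes from.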

[Equation 5]

We also need to estimate the prior π_k; the usual estimate is n_k/N, the fraction of training samples that belong to class k.
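The estimation and classification steps for a single predictor can be sketched in a few lines of NumPy. The function names `fit_lda_1d` and `predict_lda_1d` are made up for this illustration, and the pooled variance uses the common n - K denominator, which is an assumption about the estimator rather than something stated in the post.

```python
import numpy as np

def fit_lda_1d(x, y):
    """Estimate class means, pooled variance and priors for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = len(x), len(classes)
    # Per-class sample means
    means = np.array([x[y == c].mean() for c in classes])
    # Priors estimated as n_k / N
    priors = np.array([(y == c).sum() / n for c in classes])
    # Pooled within-class variance (assumed n - K denominator)
    sq = sum(((x[y == c] - m) ** 2).sum() for c, m in zip(classes, means))
    var = sq / (n - k)
    return classes, means, var, priors

def predict_lda_1d(x_new, classes, means, var, priors):
    """Assign each point to the class with the largest linear discriminant."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # delta[i, j] = x_i * mu_j / var - mu_j^2 / (2 var) + log(pi_j)
    delta = x_new[:, None] * means / var - means ** 2 / (2 * var) + np.log(priors)
    return classes[delta.argmax(axis=1)]
```

With two well-separated classes and equal priors, the decision boundary ends up halfway between the two class means.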

 

Assumptions made:

  • f_k(x), the density of x within each class, is normal
  • The variance (sigma squared) is the same for all classes

 

When there is more than one predictor, we use a multivariate Gaussian with a class-specific mean vector and a covariance matrix common to all classes.

[Equations 6 and 7]
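In the usual textbook notation, the multivariate case with p predictors looks like the sketch below. The symbols Σ (common covariance matrix) and μ_k (class mean vector) are the standard ones and are assumed here; they may differ from the notation of the original equations.

```latex
% Multivariate Gaussian density for class k with common covariance \Sigma
f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
         \exp\!\left(-\tfrac{1}{2}(x-\mu_k)^{T}\Sigma^{-1}(x-\mu_k)\right)

% Resulting discriminant, still linear in x
\delta_k(x) = x^{T}\Sigma^{-1}\mu_k
              - \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k
```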

Some comparisons

  • Compare this with mixture models, where there is a responsibility vector for each sample
    • There the labels are not available (unsupervised learning), so the model is fit by EM (Expectation Maximization)
  • Compare this with naive Bayes, where the assumption is that the features are independent given the class
    • Here we have a mean for each (class, feature) pair plus a common covariance matrix; there (Gaussian naive Bayes) we have a mean and a variance for each (class, feature) pair and no covariance terms
    • In both, f_k(x) models the density of x given class k, and Bayes' rule then gives the probability of class k given x
    • The independence assumption is the "naive" part, hence the name naive Bayes
    • Here we have a joint distribution (multivariate Gaussian); there it is an independent distribution for each feature
    • Both LDA and naive Bayes model the class densities and obtain the posterior through Bayes' theorem, while logistic regression models the posterior directly and fits it by maximizing the likelihood

 

Quadratic Discriminant Analysis (QDA)

Unlike LDA, QDA assumes that each class has its own covariance matrix. It is called quadratic because the discriminant function below is quadratic in x.

[Equation 8]
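A minimal NumPy sketch of the QDA rule, assuming the standard form of the quadratic discriminant (log-determinant term, Mahalanobis term, log prior). The function names are hypothetical, and each class needs enough samples for its covariance estimate to be invertible.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a mean vector, covariance matrix and prior for each class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    params = []
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        # Class-specific covariance: this is what separates QDA from LDA
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        params.append((mu, cov, len(Xc) / len(X)))
    return classes, params

def qda_discriminants(x, params):
    """delta_k(x) = -0.5 log|S_k| - 0.5 (x - mu_k)^T S_k^{-1} (x - mu_k) + log pi_k"""
    x = np.asarray(x, dtype=float)
    scores = []
    for mu, cov, prior in params:
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)  # quadratic in x
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return np.array(scores)

def predict_qda(x, classes, params):
    """Pick the class with the largest quadratic discriminant."""
    return classes[np.argmax(qda_discriminants(x, params))]
```

Replacing the per-class `cov` with a single pooled covariance would make the quadratic terms cancel across classes and recover the linear rule of LDA.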

When to use LDA, QDA

  • This is related to the bias-variance trade-off
  • For p predictors and K classes
    • LDA estimates K*p class-mean parameters plus one shared covariance matrix with p*(p+1)/2 parameters
    • QDA estimates a separate covariance matrix for each class, i.e. K*p*(p+1)/2 covariance parameters
  • So LDA has much lower variance, but the resulting classifier can suffer from high bias
  • LDA should be used when the number of training samples is small, because we want to avoid the high-variance problem
  • QDA has high variance, so it should be used when the number of training samples is large
    • Another scenario is when a common covariance matrix for the K classes is untenable
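To get a feel for how quickly the gap grows, the counts can be computed directly. This is a rough sketch that counts class means plus the free entries of the symmetric covariance matrices and leaves out the priors; that counting convention is an assumption of this example, and the function names are made up.

```python
def lda_param_count(p, k):
    """K mean vectors of length p plus one shared symmetric p x p covariance."""
    return k * p + p * (p + 1) // 2

def qda_param_count(p, k):
    """K mean vectors of length p plus K symmetric p x p covariances."""
    return k * p + k * p * (p + 1) // 2

# Example: 50 predictors and 3 classes
p, k = 50, 3
print(lda_param_count(p, k), qda_param_count(p, k))
```

Under this convention QDA needs k times as many covariance parameters as LDA, so with many predictors and few samples per class the QDA estimates become unstable.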

 

A note on Fisher’s Linear Discriminant Analysis

  • It is simply LDA in the case of two classes.
  • We can derive this equivalence mathematically.
  • In the literature it is usually presented from the perspective that it projects the data onto a line that achieves maximum class separation
  • More generally, LDA also provides a low-dimensional view of the data

 

Math

  • We want to project 2-D data onto a line which
    • maximizes the difference between the projected class means
    • minimizes the within-class variance of the projected data
  • Such a direction (w) can be found by maximizing the Fisher criterion (J)

[Equations fisher1 to fisher7]
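The two goals listed above can be sketched numerically. The code assumes the standard form of the Fisher criterion, J(w) = (difference of projected means)^2 / (sum of projected within-class scatters), whose maximizer is proportional to S_W^{-1}(m2 - m1); the function names are hypothetical and the original equations may use different notation.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector w proportional to S_W^{-1} (m2 - m1)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix: sum of the two class scatter matrices
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """Squared distance between projected means over projected within-class scatter."""
    p1 = np.asarray(X1, dtype=float) @ w
    p2 = np.asarray(X2, dtype=float) @ w
    between = (p2.mean() - p1.mean()) ** 2
    within = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()
    return between / within
```

Evaluating `fisher_criterion` for the returned direction and for any other unit vector is a quick way to check that the returned w scores at least as high.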

 

 
