On LDA, QDA

{In practice, discriminant analysis is used more for classification than for regression}

This resembles Gaussian mixture models in that you fit one Gaussian for each class.
Don't forget one important difference though: LDA is supervised, mixture models are unsupervised.

Linear Discriminant Analysis (LDA)

In logistic regression (LR), we estimate the posterior probability directly. In LDA we estimate the likelihood of x within each class and then use Bayes' theorem to get the posterior. Calculating the posterior with Bayes' theorem is easy in classification because the hypothesis space is limited to K classes, so the normalizing term is just a finite sum.

Equation 2 computes the probability of class k given x. This is a full posterior over the classes rather than just a point estimate.

Equation 4 is derived from equation 3 by taking the log and dropping the terms that do not depend on k. The posterior p_k(x) is highest for the class for which the discriminant δ_k(x) is highest.
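The numbered equations referred to above did not survive in this text. As a sketch, assuming the standard single-predictor setup (priors π_k, class-conditional densities f_k that are Gaussian with means μ_k and one shared variance σ²; the symbols are my assumption and may differ from the original notation), the posterior, the density, and the resulting discriminant can be written as:

```latex
\begin{align*}
p_k(x) &= \Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}, \\
f_k(x) &= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu_k)^2}{2\sigma^2}\right), \\
\delta_k(x) &= x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k .
\end{align*}
```

Taking the log of the numerator π_k f_k(x) and discarding the pieces that are identical for every class (the x² term and the normalizing constant) leaves δ_k(x), which is linear in x.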

LDA estimates the class means and the shared variance from the data and uses equation 4 for classification.

We also need to estimate π_k, the prior of class k. The usual estimate is n_k/N, the fraction of training samples that belong to class k.
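The estimation step can be sketched in a few lines of NumPy. This is a minimal illustration for one predictor, not a reference implementation: the function names `fit_lda_1d` and `predict_lda_1d` are hypothetical, the toy data is made up, and the shared variance is assumed to be the pooled within-class estimate with an N - K denominator.

```python
import numpy as np

def fit_lda_1d(x, y):
    """Estimate priors, class means and the pooled variance for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = len(x), len(classes)
    priors = np.array([np.mean(y == c) for c in classes])   # pi_k = n_k / N
    means = np.array([x[y == c].mean() for c in classes])   # mu_k
    # Pooled (shared) variance: within-class squared deviations over N - K
    ss = sum(((x[y == c] - m) ** 2).sum() for c, m in zip(classes, means))
    var = ss / (n - k)
    return classes, priors, means, var

def predict_lda_1d(x, classes, priors, means, var):
    """Assign each x to the class with the largest discriminant delta_k(x)."""
    x = np.asarray(x, dtype=float)[:, None]                 # shape (n, 1)
    delta = x * means / var - means ** 2 / (2 * var) + np.log(priors)
    return classes[np.argmax(delta, axis=1)]

# Toy data, made up for illustration
x = np.array([1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
classes, priors, means, var = fit_lda_1d(x, y)
pred = predict_lda_1d([2.2, 6.8], classes, priors, means, var)
```

With equal priors the decision boundary sits halfway between the two class means, so points on either side of that midpoint get different labels.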

Assumptions made:

f_k(x), the density of x within class k, is normal
The variance (σ²) is the same for all classes

When there is more than one predictor, we use a multivariate Gaussian with a class-specific mean vector and a covariance matrix common to all classes
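For reference, a sketch of the multivariate case, assuming p predictors, class mean vectors μ_k, and a covariance matrix Σ shared by all classes (notation assumed, not taken from the original):

```latex
\begin{align*}
f_k(x) &= \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
          \exp\!\left(-\tfrac{1}{2}\,(x - \mu_k)^{\top} \Sigma^{-1} (x - \mu_k)\right), \\
\delta_k(x) &= x^{\top} \Sigma^{-1} \mu_k
              - \tfrac{1}{2}\,\mu_k^{\top} \Sigma^{-1} \mu_k + \log \pi_k .
\end{align*}
```

Because Σ is shared, the quadratic term in x is the same for every class and drops out, so δ_k(x) stays linear in x.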

Some comparisons

Compare this with mixture models, where there is a responsibility vector for each sample
- There the labels are not available (unsupervised learning), so the model is fitted with EM (Expectation Maximization)
Compare this with naive Bayes, where the assumption is that the features are independent given the class
- Here we estimate a mean for each (class, feature) plus one shared covariance matrix; Gaussian naive Bayes estimates a mean and a variance for each (class, feature) and no covariances
- In both, f_k(x) captures the probability of x given class k, and Bayes' rule then gives the probability of class k given x
- The independence assumption is what makes naive Bayes "naive"
- Here we have a joint distribution (multivariate Gaussian); there it is an independent distribution for each feature
- Both LDA and naive Bayes model the class-conditional density and get the posterior through Bayes' rule, while logistic regression models the posterior directly and fits it by maximizing the conditional likelihood

Quadratic Discriminant Analysis (QDA)

Unlike LDA, QDA assumes that each class has its own covariance matrix. It is called quadratic because the resulting discriminant function is quadratic in x.
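A sketch of the QDA discriminant, assuming class-specific covariance matrices Σ_k and the same priors π_k and means μ_k as in LDA (notation assumed):

```latex
\begin{equation*}
\delta_k(x) = -\tfrac{1}{2}\,(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)
              - \tfrac{1}{2}\,\log |\Sigma_k| + \log \pi_k .
\end{equation*}
```

The term x^T Σ_k^{-1} x now differs from class to class, so it no longer cancels when classes are compared, and the decision boundaries become quadratic in x.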

When to use LDA, QDA

This is related to the bias-variance trade-off
For p predictors and K classes
- LDA estimates K*p class-mean parameters plus one shared covariance matrix with p*(p+1)/2 entries
- QDA estimates a separate covariance matrix for each class, K*p*(p+1)/2 covariance parameters in total
So LDA has much lower variance, but the classifier it builds can suffer from high bias when the shared-covariance assumption is badly off
LDA should be used when there are few training samples, because we want to avoid the high-variance problem
QDA has higher variance, so it should be used when there are many training samples
- Another scenario for QDA is when a common covariance matrix among the K classes is clearly untenable
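To get a feel for how quickly the gap grows, here is a small counting sketch. The helper `n_params` is hypothetical, and it counts only means and covariance entries (the K - 1 free priors are left out, which is my simplification).

```python
def n_params(p, k):
    """Count mean and covariance parameters for LDA and QDA.

    p: number of predictors, k: number of classes.
    """
    means = k * p                     # one mean per (class, feature)
    cov = p * (p + 1) // 2            # free entries of a symmetric p x p matrix
    return {
        "lda": means + cov,           # one covariance matrix shared by all classes
        "qda": means + k * cov,       # one covariance matrix per class
    }

counts = n_params(p=50, k=2)
print(counts)
```

With 50 predictors each covariance matrix already has 1275 free entries, so every extra class costs QDA another 1275 parameters while LDA only adds 50 means.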

A note on Fisher’s Linear Discriminant Analysis

For two classes it gives the same direction as LDA.
We can derive this equivalence mathematically.
In the literature it is usually presented as projecting the data onto a line that achieves maximum class separation.
More generally, LDA also provides a low-dimensional view of the data.

Math

We want to project 2-D data onto a line which
- maximizes the difference between the projected class means
- minimizes the within-class variance of the projected data
Such a direction can be found by maximizing the Fisher criterion (J)
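A sketch of the criterion for two classes, assuming class means m_1 and m_2, projection direction w, and scatter matrices S_B and S_W as defined below (this is the textbook form; the original notation may differ):

```latex
\begin{align*}
J(w) &= \frac{(\tilde m_2 - \tilde m_1)^2}{\tilde s_1^{\,2} + \tilde s_2^{\,2}}
      = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}, \\
\tilde m_k &= w^{\top} m_k, \qquad
\tilde s_k^{\,2} = \sum_{i \in C_k} \left(w^{\top} x_i - \tilde m_k\right)^2, \\
S_B &= (m_2 - m_1)(m_2 - m_1)^{\top}, \qquad
S_W = \sum_{k=1}^{2} \sum_{i \in C_k} (x_i - m_k)(x_i - m_k)^{\top}, \\
w^{*} &\propto S_W^{-1}\,(m_2 - m_1).
\end{align*}
```

The numerator rewards directions that pull the projected means apart, and the denominator penalizes directions along which each class is spread out.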

[Images: fisher1, fisher2]
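The projection can be tried out numerically. The sketch below assumes the textbook two-class solution, a direction proportional to S_W^{-1}(m_2 - m_1), and uses synthetic data; the function names `fisher_direction` and `fisher_criterion` are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m2 - m1) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w): squared gap between projected means over projected within-class scatter."""
    z1, z2 = X1 @ w, X2 @ w
    between = (z2.mean() - z1.mean()) ** 2
    within = ((z1 - z1.mean()) ** 2).sum() + ((z2 - z2.mean()) ** 2).sum()
    return between / within

# Synthetic 2-D data: correlated features, means shifted along the first axis
rng = np.random.default_rng(0)
cov = [[3.0, 2.0], [2.0, 3.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X1, X2)
j_fisher = fisher_criterion(w, X1, X2)
# Baseline: project onto the first axis, i.e. the direction of the mean shift
j_naive = fisher_criterion(np.array([1.0, 0.0]), X1, X2)
```

Because the features are correlated, projecting along the line that joins the means is not the best choice; the Fisher direction tilts away from it to reduce the within-class spread, so `j_fisher` is never smaller than `j_naive` on the same sample.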
