Central limit theorem

What does the CLT say?

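  • The sum of many independent random samples from the same distribution (with finite variance) is approximately normally distributed
    • These samples need not come from a normal distribution
  • The sample mean is just the sum divided by n, so if the sum is approximately normal, the mean is approximately normal too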
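
The claim above can be checked with a small simulation. This is a minimal sketch, assuming Exponential(1) draws as the non-normal source; the function name, sample size, and number of trials are arbitrary choices made for illustration. If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their center.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return `trials` sample means, each over `n` Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

means = sample_means(n=50, trials=5000)
center = statistics.fmean(means)   # should be close to the true mean, 1.0
spread = statistics.stdev(means)   # should be close to 1 / sqrt(50)

# Fraction of sample means within one standard deviation of their center.
# For a normal distribution this fraction is about 0.68.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```

The individual draws are strongly skewed, yet the means cluster symmetrically around 1.0 with a spread near sigma / sqrt(n).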

Straight facts

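  • The central limit theorem lets us build confidence intervals for parameters such as the mean
  • As a rule of thumb, the normal approximation is usually good enough when n > 30, provided the distribution has finite variance; heavily skewed or heavy-tailed distributions may need a larger n
  • If the data are themselves normally distributed, the sample mean is exactly normal for any n, even n < 30
  • Why do we need to know the distribution?
    • To make variance estimation stable
    • We want to have just one unknown, the mean
    • With small samples, the t-test assumes the data are approximately normal, so we should check normality before applying it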
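
As an illustration of the first point, here is a minimal sketch of a CLT-based confidence interval for a mean. It assumes a large enough sample for the normal approximation to hold and uses the sample standard deviation in place of the unknown sigma. The helper name `mean_confidence_interval` and the uniform test data are hypothetical.

```python
import math
import random
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation (CLT) confidence interval for the mean."""
    n = len(data)
    mean = statistics.fmean(data)
    # The sample standard deviation stands in for the unknown sigma.
    std_err = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

rng = random.Random(1)
data = [rng.uniform(0, 10) for _ in range(200)]  # hypothetical non-normal data
low, high = mean_confidence_interval(data)
print(round(low, 2), round(high, 2))
```

For small samples from roughly normal data, a t critical value would replace `z`; that case is not covered by this sketch.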

Slide from MIT course : https://ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/

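sigma / sqrt(n) is the standard error of the mean. The CLT says that the standardized sample mean, sqrt(n) * (sample mean - mu) / sigma, converges in distribution to the standard normal distribution N(0, 1) as n grows.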
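
Written out in symbols, the statement reads as follows. The notation here is an assumption for illustration, not copied from the slide: X_1, ..., X_n are taken to be i.i.d. with mean mu and finite variance sigma^2.

```latex
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
  \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, 1),
\qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
```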

Law of large numbers

  • As the sample size grows, the sample mean gets closer to the mean of the whole population, because a larger sample is more representative of the population.
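
A quick way to see this is to roll a simulated fair die and watch the gap between the sample mean and the population mean (3.5). This is a minimal sketch; the helper name `mean_gap` and the sample sizes are hypothetical choices.

```python
import random
import statistics

POPULATION_MEAN = 3.5  # expected value of a fair six-sided die

def mean_gap(n, rng):
    """Absolute gap between the mean of n die rolls and the population mean."""
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return abs(statistics.fmean(rolls) - POPULATION_MEAN)

rng = random.Random(42)
for n in (10, 1_000, 100_000):
    print(n, round(mean_gap(n, rng), 4))
```

The gap does not have to shrink at every step, but it tends toward zero as n grows.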

Example:

  • During significance testing we calculate the left-hand side of the CLT equation, i.e. the standardized statistic sqrt(n) * (observed mean - hypothesized mean) / sigma. For example, when testing the fairness of a coin that number comes out to be 3.54. For a standard normal, 3*sigma = 3*1 = 3 covers about 99.7% of the area. We are further out than that, so we can reject the null hypothesis. [1]
  • The thing to understand is that the estimate of the Bernoulli parameter p (the sample proportion), not a single coin flip, is approximately normal.
  • We are not saying how far the observed mean is from 0.5 on the scale of the Bernoulli distribution itself. If we were doing that, we would not have used sqrt(n).
    • More importantly, a Bernoulli variable can take only two values, 0 and 1, so from that perspective the comparison would not make sense either.
    • See the central limit theorem equation in the slide [0]: the standardized statistic follows N(0, 1).
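
The coin test can be sketched as below. The counts are hypothetical: these notes do not record the number of flips or heads behind the 3.54 figure, so 125 heads in 200 flips was picked only because it gives a statistic of similar size. Under the null hypothesis of a fair coin, p0 = 0.5 and sigma = sqrt(p0 * (1 - p0)) = 0.5.

```python
import math
import statistics

def coin_z_statistic(heads, n, p0=0.5):
    """Standardized distance of the observed proportion from p0."""
    p_hat = heads / n
    sigma = math.sqrt(p0 * (1 - p0))  # Bernoulli standard deviation under H0
    return math.sqrt(n) * (p_hat - p0) / sigma

z = coin_z_statistic(heads=125, n=200)  # hypothetical counts
# Two-sided p-value from the standard normal N(0, 1).
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
print(round(z, 2), p_value < 0.01)
```

The statistic is compared against N(0, 1), not against the Bernoulli distribution itself, which is why the sqrt(n) scaling appears.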

References

[0] : Slide from MIT course : https://ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/

[1] : https://ocw.mit.edu/courses/18-650-statistics-for-applications-fall-2016/resources/mit18_650f16_parametric_ht/
