Chi Square Test

The chi-square independence test is a procedure for testing whether two categorical variables are related in some population.

Here is a handwritten example: https://github.com/arcarchit/datastories/blob/master/notes/chi2.pdf

Chi square distribution

  • Chi-square distribution
    • Obtained by squaring (and summing) samples from the standard normal distribution [0]
    • The distribution changes with the degrees of freedom (DoF)
    • When DoF = 1 it is most concentrated around 0
  • The chi-square statistic is a sum of squares
    • When a die is biased, the squared differences between observed and expected counts are larger, so the sum is higher and the result is more significant.
    • When the die is fair, the sum stays close to zero. The differences are always measured against the expected values.
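A small simulation can make the first bullets concrete. This is a sketch that builds chi-square draws directly as sums of squared standard-normal samples, using only Python's standard library; the seed, the sample size `N`, and the 0.5 cutoff are arbitrary choices of mine. The mean of each simulated distribution should land near its DoF, and the share of draws close to 0 should be largest for DoF = 1.

```python
import random

def chi_square_draw(dof, rng):
    """One draw from chi-square(dof): the sum of `dof` squared standard-normal samples."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))

rng = random.Random(42)  # fixed seed so the run is reproducible
N = 100_000              # number of simulated draws per DoF (arbitrary)
summary = {}
for dof in (1, 2, 5):
    draws = [chi_square_draw(dof, rng) for _ in range(N)]
    mean = sum(draws) / N                        # theory: the mean equals dof
    near_zero = sum(d < 0.5 for d in draws) / N  # fraction of draws close to 0
    summary[dof] = (mean, near_zero)
    print(f"DoF={dof}: mean={mean:.2f}, P(X < 0.5)={near_zero:.3f}")
```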

Chi Square Test for Equality of Proportions


Chi square vs T test

  • When to use which one
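  • A t-test is used to compare the means of two groups on a numerical variable
  • A chi-square test is used with categorical data: it checks whether the observed counts match the counts expected under an assumption (a claimed distribution, or independence of two variables)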

Chi Square for goodness of fit testing

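  • Chi-square goodness of fit
    • Restaurant example
    • H0 = The claimed percentages for the customer distribution are correct
  • We calculate the expected count for each cell (total count × claimed percentage), then chi^2 = sum over cells of (observed - expected)^2 / expected, and compare it with the chi-square distribution with (number of cells - 1) degrees of freedom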
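The restaurant data is not reproduced here, so this sketch applies the same calculation to hypothetical die counts (my own numbers), tying back to the biased-die intuition above. The cutoff 11.07 is the standard 5% critical value of the chi-square distribution with 6 - 1 = 5 degrees of freedom; a statistics library could compute an exact p-value instead.

```python
def chi_square_statistic(observed, claimed_proportions):
    """Pearson's statistic: sum over cells of (observed - expected)^2 / expected."""
    total = sum(observed)
    expected = [total * p for p in claimed_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

FAIR = [1 / 6] * 6        # H0: every face is equally likely
CRITICAL_5_DOF = 11.07    # chi-square 95th percentile for 5 degrees of freedom

# Hypothetical counts from 120 rolls of each of two dice.
fair_counts = [22, 18, 20, 19, 21, 20]
biased_counts = [10, 12, 15, 18, 25, 40]

for name, counts in (("fair", fair_counts), ("biased", biased_counts)):
    stat = chi_square_statistic(counts, FAIR)
    verdict = "reject H0" if stat > CRITICAL_5_DOF else "fail to reject H0"
    print(f"{name}: chi2 = {stat:.2f} -> {verdict}")
```

The fair die's statistic stays close to zero, while the biased die's is far above the cutoff.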

Chi Square for relationship testing

  • H0 : Variables are independent of each other
  • It helps testing if two categorical variables are related
  • Calculate the chi-square statistic by summing (observed - expected)^2 / expected over all cells, and check it against the chi-square distribution with the appropriate degrees of freedom
  • Examples
    • Hypothesis testing:
      • H0 = Herb 1, Herb 2 and placebo have the same effect
      • H0 = Herbs do nothing
      • We can't say the herbs do nothing
        • We are working with accumulated counts here
        • Whereas ANOVA is about variance
    • Homogeneity testing:
      • H0 = Left- and right-handed people have the same preferences for arts and science
      • H0 = Preference for arts/science is independent of handedness (left/right)
      • H0 = The variables are independent
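      • Filling in the expected table
        • Under H0, P(STEM | right) = P(STEM)
        • x / 60 = 40 / 100 => x = 40 * 60 / 100 = 24
        • Equivalently, the expected value of each cell is the product of its row and column totals divided by the grand total
      • Degrees of freedom = (r-1)*(c-1) = 2 * 1 = 2 (three preference categories by two hands)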
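The table-filling steps above can be sketched as follows. The counts are hypothetical (my own, not the original example's data), chosen so that 40 of 100 people prefer STEM and 60 are right-handed, which reproduces the expected count of 24. The row labels, including "Equal", are assumptions consistent with DoF = 2; 5.991 is the standard 5% critical value for 2 degrees of freedom.

```python
# Hypothetical observed counts; columns are (right-handed, left-handed).
observed = {
    "STEM":       (30, 10),
    "Humanities": (15, 25),
    "Equal":      (15, 5),
}

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(v[j] for v in observed.values()) for j in range(2)]
grand = sum(row_totals.values())

# Under H0 (independence): expected cell = row total * column total / grand total
expected = {k: tuple(row_totals[k] * col_totals[j] / grand for j in range(2)) for k in observed}

chi2 = sum((o - e) ** 2 / e for k in observed for o, e in zip(observed[k], expected[k]))
dof = (len(observed) - 1) * (len(col_totals) - 1)
CRITICAL_2_DOF = 5.991  # chi-square 95th percentile for 2 degrees of freedom

print("expected STEM/right:", expected["STEM"][0])
print("chi2 =", chi2, "dof =", dof, "reject H0:", chi2 > CRITICAL_2_DOF)
```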

References

[0] : https://www.khanacademy.org/math/statistics-probability/inference-categorical-data-chi-square-tests#chi-square-goodness-of-fit-tests

[1] : https://www.khanacademy.org/math/statistics-probability/inference-categorical-data-chi-square-tests#chi-square-goodness-of-fit-tests

[2] : https://biology.stackexchange.com/questions/13486/deciding-between-chi-square-and-t-test

[3] : https://fhssrsc.byu.edu/SitePages/ANOVA,%20t-tests,%20Regression,%20and%20Chi%20Square.aspx

Probability Rules and Tricky Questions

Introduction:

Understanding probability rules and solving tricky probability questions can be challenging. In this blog post, we will explore key probability rules and discuss solutions to some intriguing questions.

Probability Rules:

  1. Joint Distribution: The probability of events X and Y occurring together is denoted as p(X, Y) and is known as the joint distribution.
  2. Conditional Distribution: The probability of event X given event Y is denoted as p(X|Y) and is known as the conditional distribution.
  3. Marginal Distribution: The probability of event X, with event Y marginalized out, is denoted as p(X) and is known as the marginal distribution.

Operations:

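  • Making a Conditional Distribution: keep the entries of the joint that match the observed value of Y, then normalize them so they sum to 1 (divide by p(Y)).
  • Marginalization: summing (or integrating) the joint over the other variable already gives a distribution that sums to 1, so no normalization is required.

Note: Integration (summation) alone only turns the joint distribution into a marginal one. To get the conditional distribution you must also divide by the marginal of the conditioning variable: p(X|Y) = p(X, Y) / p(Y).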

There are just two rules of probability: the sum rule and the product rule. Then there is Bayes' theorem, which can be derived from the product rule and the fact that P(x, y) = P(y, x).
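Written out, the two rules and the derivation of Bayes' theorem from them look like this (standard results, using the same p(·) notation as the rules above):

```latex
\begin{aligned}
\text{Sum rule:}\quad & p(X) = \sum_{Y} p(X, Y) \\
\text{Product rule:}\quad & p(X, Y) = p(Y \mid X)\, p(X) \\
\text{Symmetry:}\quad & p(X, Y) = p(Y, X) = p(X \mid Y)\, p(Y) \\
\text{Bayes:}\quad & p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
  \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
\end{aligned}
```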


Given a table of joint probabilities like the one below, we might want to compute conditional distributions and marginalize out one of the variables. [1]

[Figure: table of joint probabilities [1]]
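A minimal sketch of those operations on a small joint table. The table (weather versus carrying an umbrella) and its numbers are hypothetical, not the table from [1]; `Fraction` keeps the arithmetic exact so the results can be compared directly.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(X, Y): X = weather, Y = whether an umbrella is carried.
joint = {
    ("rain", "umbrella"): F(3, 10),
    ("rain", "none"): F(1, 10),
    ("dry", "umbrella"): F(1, 10),
    ("dry", "none"): F(1, 2),
}

def marginal(joint, axis):
    """Sum rule: add up the joint over the other variable (no normalization needed)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

def conditional_x_given_y(joint, y):
    """Keep the entries with Y == y, then normalize them by p(Y == y)."""
    slice_ = {x: p for (x, yy), p in joint.items() if yy == y}
    z = sum(slice_.values())
    return {x: p / z for x, p in slice_.items()}

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
p_x_given_umbrella = conditional_x_given_y(joint, "umbrella")

# Bayes' theorem: p(rain | umbrella) = p(umbrella | rain) p(rain) / p(umbrella)
p_umbrella_given_rain = joint[("rain", "umbrella")] / p_x["rain"]
bayes = p_umbrella_given_rain * p_x["rain"] / p_y["umbrella"]
print(p_x, p_x_given_umbrella, bayes)
```

Computing p(rain | umbrella) by normalizing the joint directly and by Bayes' theorem gives the same value, 3/4.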

Tricky Probability Questions

These questions are taken from [2]. One key to solving them is to write down the sample space and keep eliminating outcomes. Don't jump to conclusions.

Q1 :  A man comes up to you on the street and says: I have two children. At least one of them is a boy. What is the probability that the other child is also a boy?

Q2 : I have two kids, what are the odds I have 2 boys?

Q3 : A man comes up to you on the street and says: I have two children. The older one is a boy. What is the probability that the other child is also a boy?

Q4 : A man comes up to you on the street and says: I have two children. One is the boy standing here next to me. What is the probability that the other child is also a boy?

Q5 : A man comes up to you on the street and says: I have two children. One of them is a boy who was born in the summer. What is the probability that the other child is also a boy? (There are four seasons: spring, summer, fall, winter.) [0]

Ans1 : (1/3)

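  • In the unconditioned sample space {BB, BG, GB, GG}, P(one boy and one girl) = 1/2 and P(BB) = 1/4. Knowing that at least one child is a boy removes GG, so P(BB) = (1/4) / (3/4) = 1/3.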

Ans2 : (1/4)

Ans3 : (1/2)

Ans4 : (1/2)

Ans5 : (7/15) [0]

Compare Q1 and Q5: the probability rises from 1/3 to 7/15. Being a boy born in summer is a rarer condition, and a family with two boys has two chances to meet it, so once that rare thing has occurred, having two boys becomes more likely.
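Following the advice to write down the sample space, here is a sketch that enumerates it and answers Q1 to Q5 by counting. Assumptions (mine, not stated in [2]): each child is independently a boy or a girl with probability 1/2 and is born in one of the four seasons with equal probability; for Q4, the father is assumed to have brought one of his two children along at random.

```python
from fractions import Fraction
from itertools import product

SEXES = ("B", "G")
SEASONS = ("spring", "summer", "fall", "winter")

# Each child is a (sex, season) pair; all 8 kinds are assumed equally likely.
CHILD = list(product(SEXES, SEASONS))
# Ordered (older, younger) pairs: 64 equally likely families.
FAMILIES = list(product(CHILD, CHILD))

def prob(event, given=lambda fam: True):
    """P(event | given), by counting equally likely families."""
    space = [fam for fam in FAMILIES if given(fam)]
    hits = [fam for fam in space if event(fam)]
    return Fraction(len(hits), len(space))

def both_boys(fam):
    return all(sex == "B" for sex, _ in fam)

def at_least_one_boy(fam):
    return any(sex == "B" for sex, _ in fam)

def older_is_boy(fam):
    return fam[0][0] == "B"

def summer_boy(fam):
    return any(child == ("B", "summer") for child in fam)

q1 = prob(both_boys, at_least_one_boy)
q2 = prob(both_boys)
q3 = prob(both_boys, older_is_boy)
q5 = prob(both_boys, summer_boy)

# Q4: extend each family with which child (index 0 or 1) is standing next to the father,
# then condition on that particular child being a boy.
with_choice = [(fam, i) for fam in FAMILIES for i in (0, 1)]
space = [(fam, i) for fam, i in with_choice if fam[i][0] == "B"]
q4 = Fraction(sum(both_boys(fam) for fam, _ in space), len(space))

print(q1, q2, q3, q4, q5)
```

Identifying one specific boy (Q3, Q4) leaves the other child unconstrained, while "at least one boy" (Q1) only removes GG; Q5 sits in between.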

A bag contains x one-rupee coins and y 50-paise coins. Four coins are taken from the bag and put away.
If a coin is now taken at random from the bag, what is the probability that it is a one-rupee coin?

The answer is x/(x+y). It stays the same whether 1, 2, 3, 4 or 5 coins are removed, because we don't know which coins were withdrawn. Averaging over every possible removal, each weighted by its probability, gives back the original proportion. [4]
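The averaging argument can be checked exactly. This sketch conditions on how many one-rupee coins were among the k removed (a hypergeometric count), then averages the chance of drawing a one-rupee coin from what is left; the function name `p_rupee_after_removal` is my own.

```python
from fractions import Fraction
from math import comb

def p_rupee_after_removal(x, y, k):
    """P(next coin is one rupee) after k unseen coins are removed from x rupees + y 50-paise coins."""
    n = x + y
    total = Fraction(0)
    for j in range(k + 1):  # j = number of one-rupee coins among the removed k
        if j > x or k - j > y:
            continue  # impossible split of the removed coins
        p_removed = Fraction(comb(x, j) * comb(y, k - j), comb(n, k))
        total += p_removed * Fraction(x - j, n - k)
    return total

print(p_rupee_after_removal(7, 5, 4))  # 7/12, the same as 7 / (7 + 5)
```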

The probability of a car passing a certain intersection in a 20-minute window is 0.9. What is the probability of a car passing the intersection in a 5-minute window? (Assume a constant probability throughout.)

Ans : 0.4377 [5]
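One way to reach this number, assuming the 20 minutes split into four independent 5-minute windows that each have the same chance of seeing a car: "no car in 20 minutes" means "no car in any of the four windows", so 1 - 0.9 = (1 - p)^4.

```python
p_20 = 0.9  # P(at least one car in 20 minutes)

# Assumption: four independent 5-minute windows, each with the same probability p_5.
#   P(no car in 20 min) = P(no car in 5 min) ** 4  =>  1 - p_20 = (1 - p_5) ** 4
p_5 = 1 - (1 - p_20) ** (1 / 4)
print(f"{p_5:.4f}")  # 0.4377
```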

Independent Events

  • Mutually exclusive events (each with non-zero probability) are dependent events
  • For independent events, P(A|B) = P(A)
  • For mutually exclusive events, if we know B has occurred, A can never occur: P(A|B) = 0, which differs from P(A)

If two random variables, X and Y, are independent, they satisfy the following conditions.

  • P(x|y) = P(x), for all values of X and Y (with P(y) > 0).
  • P(x, y) = P(x) * P(y), for all values of X and Y.

Here is an example from [6]. The answer is that X and Y are independent, while A and B are not.

[Figure: example joint probability tables for X, Y and for A, B [6]]
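The second condition above suggests a mechanical check: compute both marginals from the joint table and compare every cell with their product. The tables below are hypothetical stand-ins, not the ones from [6]; exact `Fraction` values are used because comparing floating-point products for equality is fragile.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint):
    """joint maps (x, y) -> P(x, y). True if P(x, y) == P(x) * P(y) for every cell."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(xs, ys))

# Hypothetical tables: the first is built as a product of marginals, the second is not.
independent_table = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 8), (1, 1): F(3, 8)}
dependent_table = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

print(is_independent(independent_table), is_independent(dependent_table))
```

In the second table, knowing one variable pins down the other, so a single cell failing P(x, y) = P(x) * P(y) is enough to rule out independence.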

Further reading : 

[0] : https://math.stackexchange.com/questions/198713/why-is-the-probability-of-having-2-boys-7-15

[1] : https://www.coursera.org/learn/probabilistic-graphical-models/lecture/slSLb/distributions

[2] : http://adit.io/posts/2017-12-05-A-Mind-Boggling-Probability-Problem.html

[3] : https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

[4] : http://moorthythanu.blogspot.com/2016/03/probability-of-getting-one-rupee-coin.html

[5] : https://math.stackexchange.com/questions/1016268/probability-of-crossing-a-point-in-a-given-time-window

[6] : https://stattrek.com/random-variable/independence.aspx