Time series week 1

March 10, 2018March 30, 2018Archit Vora Leave a comment

Plotting in R
Linear regression properly fitted or not
- Residue are important thing to observed
- Q-Q plots for normality test
- Residues over time
  - Zoomed in residues over time
Hypothesis test
- One, two sided t test
- Confidence interval
  - Where we think mean lies
  - If it dose not contain 0 we tend to reject null hypothesis (Very broad statement, but I think you got the concept)
Correlation function
- Which quarter data false

Ref : https://www.coursera.org/learn/practical-time-series-analysis/home/welcome

Parametric and Nonparametric tests

July 24, 2017October 25, 2020Archit Vora Leave a comment

We rarely heard of nonparametric tests while reading standard statistical books. However there are some scenarios where they should be used instead of parametric tests. [1] has beautiful blog about it, I am putting just a summary from that.

Different Tests

Table below displays various tests, I have verified that all of these tests are available in python stats package.

tests_1

When to Use Parametric Tests

Parametric tests can perform well with skewed and nonnormal distributions
- It is important to follow guidance in the sample size of data as shown in table below
Parametric tests can perform well when the spread/variance of each group is different
It has Statistical power

tests_2

Reasons to Use Nonparametric Tests

Your area of study is better represented by the median
- Income distribution is skewed and median is more useful than mean
- Few billionaires can boost up the mean significantly
You have a very small sample size
- Even less than what is mentioned in table above
You have ordinal data, ranked data, or outliers that you can’t remove

References:

[1] : http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test

On multivariate Gaussian

June 23, 2017May 16, 2023Archit Vora Leave a comment

Formulas

Formula for multivariate gaussian distribution

Formula of univariate gaussian distribution

Notes:

There is normality constant in both equations
Σ being a positive definite ensure quadratic bowl is downwards
σ2 also being positive ensure that parabola is downwards

On Covariance Matrix

Definition of covariance between two vectors:

When we have more than two variable we present them in matrix form. So covariance matrix will look like

Above is very similar to how we compute sigma^2 in 1-D = (x – mu)^2
Formula of multivariate gaussian distribution demands Σ to be singular and symmetric positive semidefinite, which in terms means sigma will be symmetric positive semidefinite.
For some data above demands might not meet

Side Note

Covariance is directional measure
Correlation is scaled measure
- We normalise by individual variance

Derivations

Following derivations are available at [0]:

We can prove[0] that when covariance matrix is diagonal (i.e there is variables are independent) multivariate gaussian distribution is simply multiplication of single gaussian distribution of each variable.
It was derived that shape of isocontours (figure 1) is elliptical and axis length is proportional to individual variance of that variable
Above is true even when covariance matrix is not diagonal and for dimension n>2 (ellipsoids)

Notes and example of bi-variant Gaussian

https://github.com/arcarchit/datastories/blob/master/notes/bivariant_gaussian.pdf

First part above says that bi-variant destitution can be generated from two standard normal distribution z = N(0,1).

For any given k-variant Gaussian we can represent it as linear combination of k standard normal distribution. One simpler way to find these coefficient is Cholesky decomposition. Theorem 1 below stats the same thing.

This has a reference from [1].

Linear Transformation Interpretation

This was proved in two steps [0]:

Step-1 : Factorizing covariance matrix

Step-2 : Change of variables, which we apply to density function

On Practical Example

Height, wight and waist size of men in US (Of course it weight can be negative, so it is approximately normal)

References

[0] http://cs229.stanford.edu/section/gaussians.pdf

[1] https://www2.stat.duke.edu/courses/Spring12/sta104.1/Lectures/Lec22.pdf

Probability Rules and Tricky Questions

June 21, 2017June 13, 2026Archit Vora Leave a comment

Introduction:

Understanding probability rules and solving tricky probability questions can be challenging. In this blog post, we will explore key probability rules and discuss solutions to some intriguing questions.

Probability Rules:

Joint Distribution: The probability of events X and Y occurring together is denoted as p(X, Y) and is known as the joint distribution.
Conditional Distribution: The probability of event X given event Y is denoted as p(X/Y) and is known as the conditional distribution.
Marginal Distribution: The probability of event X, with event Y marginalized out, is denoted as p(X) and is known as the marginal distribution.

Operations:

Making Conditional Distribution: To obtain the conditional distribution, normalization is required.
Marginalization: Marginalization does not require normalization.

Note: It is not possible to derive the conditional distribution from the joint distribution solely through integration. There is no direct relationship between them.

There are just two rules for probability. Sum rule and product rules. And then there is Bayes theorem. Bayes theorem can be derived from product rule and the fact that P(x,y) = P(y,x)

We might want to look at a table like below and calculate joint and conditional distribution and marginalized out one of the variable. [1]

Probability Tricky Question

This questions are taken from [2]. One key to solve this question is write down the sample space and keep eliminating choices. Don’t conclude in hurry.

Q1 : A man comes up to you on the street and says: I have two children. At least one of them is a boy. What is the probability that the other child is also a boy?

Q2 : I have two kids, what are the odds I have 2 boys?

Q3 : A man comes up to you on the street and says: I have two children. The older one is a boy. What is the probability that the other child is also a boy?

Q4 : A man comes up to you on the street and says: I have two children. One is the boy standing here next to me. What is the probability that the other child is also a boy?

Q5 : Q. A man comes up to you on the street and says: I have two children. One of them is a boy who was born in the summer. What is the probability that the other child is also a boy? (There are four seasons : spring, summer, fall, winter)[0]

Ans1 : (1/3)

P(BG) is 1/2 and p(BB) = 1/4 in the universe

Ans2 : (1/4)

Ans3 : (1/2)

Ans4 : (1/2)

Ans5 : (7/15) [0]

Compare Q1 and Q5. Odd increases. Being born in summer is rare thing. If that rare thing has occurred there are higher chances of having two boys.

A bag contains (x) one rupee coins and (y) 50 paise coins. Four coins are taken from the bag and put away.
If a coin is now taken at random from the bag, what is the probability that it is a one rupee coin?

Ans is x/(x+y). It will remain same if we take either 1/2/3/4/5 coins because we don’t know which coin has been withdrawn. It is like trying out all possibilities and when we sum, it would come out as 1 only. [4]

The probability of a car passing a certain intersection in a 20 minute windows is 0.9. What is the probability of a car passing the intersection in a 5 minute window? (Assuming a constant probability throughout)

Ans : 0.4377 [5]

Independent Events

Mutually exclusive events means dependent event
For independent event = P(A/B) = P(A)
For mutually exclusive event if we know B has occurred, A will never occur.

If two random variables, X and Y, are independent, they satisfy the following conditions.

P(x|y) = P(x), for all values of X and Y.

P(X, Y) = P(x ∩ y) = P(x) * P(y), for all values of X and Y.

Here is an example from [6]. Ans is that X and Y are independent, A and B are not.

Further reading :

[0] : https://math.stackexchange.com/questions/198713/why-is-the-probability-of-having-2-boys-7-15

[1]:https://www.coursera.org/learn/probabilistic-graphical-models/lecture/slSLb/distributions

[2] : http://adit.io/posts/2017-12-05-A-Mind-Boggling-Probability-Problem.html

[3] : https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

[4] : http://moorthythanu.blogspot.com/2016/03/probability-of-getting-one-rupee-coin.html

[5] : https://math.stackexchange.com/questions/1016268/probability-of-crossing-a-point-in-a-given-time-window

[6] : https://stattrek.com/random-variable/independence.aspx

Hypothesis and T-Distribution

March 14, 2017May 17, 2023Archit Vora Leave a comment

We calculate the t-score using hypothesis data, which also provides us with the degrees of freedom. This value is then supplied to a function that gives us the probability of the hypothesis being true.

The t-test can be seen as a ratio, similar to a signal-to-noise ratio. The numerator allows us to center it around zero, while the denominator represents the standard error of the mean (SEM) calculated as s/sqrt(n), where s is the standard deviation of the samples.

The t-score indicates how many SEM the current mean is away from the mean given in the hypothesis. If it is far away, it suggests a low probability of the null-hypothesis-mean being true, leading us to reject the null hypothesis.

In engineering, we typically assume that the mean and standard deviation are given and true, and we compute the probability of observing the sample. However, in hypothesis testing with a small number of samples, we are testing whether the given mean is true or not.

To address this, we need a distribution that adjusts itself based on the number of observations, widening when there are fewer samples. The t-distribution serves this purpose, as it is dependent on the sample size.

There are different types of t-tests:

One-sample t-test: Compares the mean of a sample with a known population mean.
- Discussion so far is for one sample test
Two-sample t-test: Compares the means of two independent groups.
- To compare means of two independent groups
- Scores of student who get 8 hour sleep vs four hour sleep
- Question we want to answer is are there any significant difference in there scores?
- In one sample test (In numerator of t-score) we are comparing sample mean with population mean
- In two sample test it compares means of two independently drawn sample
- And in denominator as well SEM formula is modified
- Example
  - A/B testing on e-commerce site where you compare CTR before and after
    - This is two sample because you don’t have standard value of CTR before the feature
    - Even you will see some difference in AA test
Paired t-test: Compares the means of two conditions using the same samples.
- This is essentially a one-sample t-test on the differences between values at two conditions.
- Same samples are used in two different conditions
- 10 people before medication and same 10 people after medication
  - We want to check if medication has any effect
- Different time points are used for market calculation
- This essentially is a one sample T-test on the differences of value at two different conditions
- Example
  - Interleaving test in e-commence search system
  - For each search page you will assign some score to control and variant
One-sided t-test: Tests a hypothesis in one direction (e.g., weight of dairy milk is less than 100g).
Two-sided t-test: Tests a hypothesis in both directions (e.g., weight of dairy milk is not equal to 100g).

P-values represent the probability of finding the observed or more extreme results when the null hypothesis is true. It is described in terms of rejecting the null hypothesis when it is actually true, but it is not a direct probability of this state.

For further examples and details, you can refer to the following link: Example Link

Q – Q Plots

February 4, 2017August 21, 2020Archit Vora Leave a comment

update :

Q-Q plots can be used to test goodness of fit for any distribution. There are p-p plots and other method as well.

https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best

There is also a P-P plot.

https://en.wikipedia.org/wiki/P%E2%80%93P_plot

Since few days I was coming across to Q-Q plots very often and thought to learn more about it.

Full form is Quartile-Quantile plot.

Many a times we want our data to be normal, this is because we normality is an assumption behind many statistical models. Now how to test normality. Wikipedia has an article about this which lists many method, one of them is Q-Q plots.

Here is how to create Q-Q plot manually (This steps will show the theory behind it):

Sort your samples (Call it Y). Let n be no of sample. n = len(Y)
Find n values(quartiles) from standard normal distribution to divide it in (n+1) equal areas
1. Standard normal distribution is a distribution with mean = 0 and standard deviation = 1
Call above X
Plot Y against X
For normal distribution it would be approximately straight line
1. However considering probability, outliers and no of smaples have role to play

Here is the example code and plots in pythons:

maxresdefault SnLhAm

Reference :

You tube video

Probability Distribution

February 4, 2017August 29, 2020Archit Vora Leave a comment

We have learned various probability distribution during high school and engineering courses. However at times we forget them, so here I am providing simple practical scenarios for each distribution with no theories involved.

Bernoulli Distribution

When the random variable has just two outcomes
Probability of Drug/Medicine will be approved by government is p = 0.65
- Probability that it will not approve is 0.35
Below formula works when we have probability available, in real life we estimate them from data :
- Mean = p
- Variance (Sigma Square) = p*(1-p)
Parameters : p
Probability evaluation P(x|params) = p if x = 1, (1-p) if x = 0
MLE : p = n/N, where n = no of time 1 observed , N = no of experiments
MLE = Maximum Likelihood Estimation

Binomial Distribution

When you perform the Bernoulli experiment multiple times and want to see how many times certain outcome appears.
For example you flip a coin(fair/biased) 10 time and probability that head will appear for x (1, 2, …..10) times.
Another more practical example :
- Suppose oil price can increase by 3 bucks or decreased by 1 buck each day
- Probability of increasing p = 0.65, and that of decreasing = 0.35
- What price can we expect after three days
- Note (Increase, Increase, Decrease) and (Increase, Decrease, Increase) will give same price.
  - (2,1) success -> 2 success 1 failure
From another point of view it count no of successes in an experiment :
- No of patient responding to treatment
- Binary classification problem (Does not seem correct now, it should be Bernoulli, we take logit and sigmoid)
Below formula works when we have probability available, in real life we estimate them from data :
- n = no of times experience is performed
- Mean = n*p
- Variance (Sigma Square) = n*p*(1-p)
Example of binomial used in modeling :
- https://people.duke.edu/~ccc14/sta-663/PyStan.html#estimating-parameters-of-a-logistic-model
Parameters : n, p
Probability evaluation P(x|params) = nCx * p^x * (1-p)^(1-x)
MLE
- n = no of samples = N
- p = n/N where n = no of successes
- Interestingly MLE for binomial and multinomial distribution is very simple

Continue reading “Probability Distribution” →

Interpreting Statistical Values

January 15, 2017May 21, 2023Archit Vora 1 Comment

In this post, we will explore the values in the summary(model) output in R and understand their significance.

Here is a screenshot illustrating the summary:

Significance of Residue

We desire our residues to be normally distributed and centered around zero.
It’s similar to aiming at the bullseye on a dartboard.
- If the residues are biased in one direction, there is room for improvement.
- If the residues are equally biased in all directions, we can attempt to reduce the standard deviation.
- Irreducible error should be observed in all directions simultaneously.
Residues quantile provides an initial insight into symmetry.
R also provides the standard deviation of residuals, known as RSE (residue standard error).

The Relationship between t-value and p-value in the Coefficient Section

The values test if a variable has a relationship with the output.
This is a preset statistical question (null hypothesis) that cannot be changed.
If the coefficient is zero, it does not contribute; otherwise, it does.
The t-value indicates the number of standard deviations the mean is away from zero.
A larger t-value signifies a more significant variable.

Calculating p-values

Incorrect thinking: Taking samples from a larger population.
Each sample yields a different coefficient, which can be zero for some samples.
The variance of the estimated parameter can be mathematically derived using (X^T * X)-1 with σ2.
σ2 can be obtained from the residue error.
Bayesian view helps appreciate the distribution of coefficients rather than point estimation.
- P-values can be calculated naturally using the T-distribution, as there are no assumptions.
In the R result display, we have a mean and standard deviation.
- The coefficient is a probabilistic variable centered at the mean (Estimate in R summary).
- The mean is t standard deviations away from zero.
- The p-value represents the probability of observing a coefficient beyond t standard deviations from the mean.

Role of R^2

R^2 indicates how much of the variance is explained by the model. Refer to formulas above for a better understanding.
R^2 has an advantage over RSE as it always falls between 0 and 1.

Determining a Good Value of R^2

A good value of R^2 depends on the problem setting.
When we make perfect predictions, RSS = 0 and hence R^2 = 1
In physics, if we are confident the data follows a linear model, R^2 close to 1 is desirable.
In marketing, a small proportion of the variance can be explained by predictors, so R^2 = 0.1 can be realistic.

Difference between Absolute and Adjusted R^2

R^2 always increases with the number of variables, while adjusted R^2 decreases if the added variable is not significant.
The formula of adjusted R^2 incorporates the number of variables, so when a non-significant variable is added, the result decreases.
The formulas below illustrate that RSE may increase while RSS decreases, but they are not directly related to R^2.

Significance of F Statistics

The F-test determines if a group of variables is jointly significant, whereas the t-test examines the significance of individual variables.
F-statistics also have associated p-values.
The null hypothesis for the F-test is that the intercept-only model and your model are equal.
While R-squared provides an estimate of the relationship strength between the model and response variable, it does not offer a formal hypothesis test. This test is provided by the F-test.

Why Use F Statistics when Individual Coefficient p-values are Available?

It may seem that if one coefficient is significant (good p-value), the overall model will also be significant.
However, this assumption breaks down when the number of variables with poor p-values is large.

Determining Good Values of F-statistics

It depends on the values of n (number of observations in the training set) and p (number of independent variables).
When n is large, an F-value slightly greater than 1 is sufficient to reject the null hypothesis.
It is advisable to base decisions on corresponding p-values, which consider both n and p.

Degrees of Freedom:

Suppose you have two features, x1 and x2, and a target variable y.
The line equation is y = a1x1 + a2x2 + a3.
In a 3D space, three points define a unique line.
With n points, p(2) features, and 1 target, three points will always lie on the line, while (n-p-1) points can deviate from it. This difference represents the degrees of freedom.
Degrees of freedom are the difference between n and the number of non-zero coefficients, including the intercept.

Significance Score “***” in the Coefficient Section

R indicates the significance of a p-value by displaying stars.
The calculation of this value is likely done through bootstrapping.
Bootstrapping allows assigning measures of accuracy to sample estimates, such as bias, variance, confidence intervals, or prediction error.
In Bayesian inference, parameter distributions are obtained, allowing the calculation of p-values.

References

Found the formula for adjusted R2 here

Data Stories

Category Statistics

Time series week 1

Parametric and Nonparametric tests

Different Tests

When to Use Parametric Tests

Reasons to Use Nonparametric Tests

References:

On multivariate Gaussian

Formulas

On Covariance Matrix

Derivations

Notes and example of bi-variant Gaussian

Linear Transformation Interpretation

On Practical Example

References

Probability Rules and Tricky Questions

Probability Tricky Question

Independent Events

Hypothesis and T-Distribution

Q – Q Plots

Probability Distribution

Bernoulli Distribution

Binomial Distribution

Interpreting Statistical Values

Significance of Residue

The Relationship between t-value and p-value in the Coefficient Section

Calculating p-values

Role of R^2

Determining a Good Value of R^2

Difference between Absolute and Adjusted R^2

Significance of F Statistics

Why Use F Statistics when Individual Coefficient p-values are Available?

Determining Good Values of F-statistics

Degrees of Freedom:

Significance Score “***” in the Coefficient Section

References