Interpreting Statistical Values

In this post, we will explore the values in the summary(model) output in R and understand their significance.

Here is a screenshot illustrating the summary:

Significance of Residuals

We want the residuals to be normally distributed and centered around zero.
It’s similar to aiming at the bullseye on a dartboard.
- If the residuals are biased in one direction, the model is systematically off and there is room for improvement.
- If the residuals are spread evenly in all directions, we can attempt to reduce their standard deviation.
- Irreducible error shows up as scatter in all directions at once.
The residual quantiles (Min, 1Q, Median, 3Q, Max) provide an initial insight into symmetry.
R also reports the residual standard error (RSE), an estimate of the standard deviation of the residuals.
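The residual quantiles and the RSE can be reproduced by hand. The following is a minimal sketch in Python (standard library only) rather than R; the toy data and variable names are made up for illustration, and the quartile method is an assumption chosen to be comparable to what R prints.

```python
import math
import statistics

# Made-up toy data (not from the post): y is roughly 2*x + 1 plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1]

n = len(x)
p = 1  # number of predictors (the intercept is not counted)

# Ordinary least squares with one predictor: slope = Sxy / Sxx.
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Quartiles comparable to the 1Q / Median / 3Q entries of the R summary.
q1, median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")

# Residual standard error: sqrt(RSS / (n - p - 1)).
rss = sum(r ** 2 for r in residuals)
rse = math.sqrt(rss / (n - p - 1))

print(f"min={min(residuals):.3f} 1Q={q1:.3f} median={median:.3f} "
      f"3Q={q3:.3f} max={max(residuals):.3f}")
print(f"RSE={rse:.3f} on {n - p - 1} degrees of freedom")
```

If the median is close to zero and 1Q and 3Q are roughly equal in magnitude, the residuals look symmetric, which is the bullseye picture described above.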

The Relationship between t-value and p-value in the Coefficient Section

The t-value and p-value test whether a variable has a relationship with the output.
The null hypothesis is preset and cannot be changed: the coefficient is zero.
If the coefficient is zero, the variable does not contribute; otherwise, it does.
The t-value is the number of standard errors the estimated coefficient is away from zero (Estimate divided by Std. Error).
A larger absolute t-value gives stronger evidence that the coefficient is not zero.
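As a rough illustration of how a t-value arises, here is a minimal Python sketch for a regression with a single predictor. The function name and the five data points are hypothetical; the standard error uses the textbook formula sqrt(sigma^2 / Sxx) for the slope.

```python
import math

def slope_t_value(x, y):
    """Return (slope, standard error, t-value) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Estimate the error variance from the residuals (n - 2 degrees of freedom).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    std_error = math.sqrt(sigma2 / sxx)
    return slope, std_error, slope / std_error

# Made-up data for illustration only.
slope, std_error, t_value = slope_t_value([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Estimate={slope:.3f} Std.Error={std_error:.3f} t={t_value:.3f}")
```

The three printed numbers correspond to the Estimate, Std. Error and t value columns of the coefficient table.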

Calculating p-values

It is not necessary to actually take repeated samples from a larger population.
Each sample would yield a different coefficient, which could be close to zero for some samples.
Instead, the variance of the estimated parameters can be derived mathematically as σ^2 (X^T X)^-1.
σ^2 is estimated from the residual error (the square of the RSE).
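The variance formula can be checked numerically. Below is a minimal Python sketch for a design matrix with an intercept column and one predictor, so that X^T X is 2x2 and can be inverted by hand. The function name and data are hypothetical.

```python
def coefficient_covariance(x, y):
    """Fit y = b0 + b1*x and return ((b0, b1), sigma^2 * (X^T X)^-1)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # X^T X = [[n, sx], [sx, sxx]]; invert the 2x2 matrix directly.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]

    # Normal equations: beta = (X^T X)^-1 X^T y, with X^T y = [sy, sxy].
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # sigma^2 estimated from the residuals with n - 2 degrees of freedom.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    cov = [[sigma2 * v for v in row] for row in inv]
    return (b0, b1), cov

# Made-up data for illustration only.
(b0, b1), cov = coefficient_covariance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"intercept={b0:.2f} slope={b1:.2f}")
print(f"Var(intercept)={cov[0][0]:.3f} Var(slope)={cov[1][1]:.3f}")
```

The square roots of the diagonal entries are the standard errors of the coefficients.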
A Bayesian view helps in appreciating the distribution of a coefficient rather than a single point estimate.
- P-values are calculated from the t-distribution with n-p-1 degrees of freedom (n observations, p predictors), which assumes normally distributed errors.
In the R result display, we have a mean (Estimate) and a standard deviation (Std. Error) for each coefficient.
- The coefficient is a random variable centered at the mean (Estimate in R summary).
- The mean is t standard deviations away from zero.
- The p-value is the probability, if the true coefficient were zero, of observing an estimate at least |t| standard deviations away from zero in either direction.
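To make the link between t and p concrete, here is a minimal Python sketch that integrates the Student t density numerically (Simpson's rule) to get a two-sided p-value. The function names are hypothetical, and this is only an approximation for illustration; `math.gamma` overflows for very large degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=10_000):
    """Approximate P(|T| >= |t|); steps must be even for Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3        # area between 0 and |t|
    return 1.0 - 2.0 * central     # remove the symmetric central area, keep both tails

# With 10 degrees of freedom, |t| of about 2.228 is the usual 5% cutoff.
print(f"{two_sided_p_value(2.228, 10):.4f}")
```

The same t-value gives a larger p-value when there are fewer degrees of freedom, because the t-distribution has heavier tails.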

Role of R^2

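R^2 indicates how much of the variance is explained by the model: R^2 = 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares of the response around its mean.
R^2 has an advantage over RSE: it is unitless and, for a least-squares fit with an intercept, always falls between 0 and 1, whereas RSE is in the units of the response.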
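A minimal Python sketch of the definition R^2 = 1 - RSS/TSS; the function name and the numbers are made up for illustration.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1.0 - rss / tss

y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]      # least-squares fit on x = 1..5

print(r_squared(y, fitted))             # part of the variance explained
print(r_squared(y, y))                  # perfect predictions: RSS = 0
print(r_squared(y, [4, 4, 4, 4, 4]))    # predicting the mean explains nothing
```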

Determining a Good Value of R^2

A good value of R^2 depends on the problem setting.
When we make perfect predictions, RSS = 0 and hence R^2 = 1.
In physics, if we are confident the data follows a linear model, R^2 close to 1 is desirable.
In marketing, often only a small proportion of the variance can be explained by the predictors, so R^2 = 0.1 can be realistic.

Difference between Absolute and Adjusted R^2

R^2 never decreases when a variable is added, while adjusted R^2 decreases if the added variable does not improve the fit enough.
The formula for adjusted R^2 incorporates the number of variables p: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), so when a non-significant variable is added, the result can decrease.
Similarly, since RSE = sqrt(RSS / (n - p - 1)), RSE may increase while RSS decreases; plain R^2 depends only on RSS and TSS and carries no such penalty.
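The effect of adding a weak variable can be seen with a few numbers. This is a minimal Python sketch using the standard formulas adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) and RSE = sqrt(RSS / (n - p - 1)); the RSS and TSS values are invented for illustration, not taken from a real fit.

```python
import math

def r2(rss, tss):
    return 1.0 - rss / tss

def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 penalises the number of predictors p."""
    return 1.0 - (1.0 - r2(rss, tss)) * (n - 1) / (n - p - 1)

def rse(rss, n, p):
    """Residual standard error on n - p - 1 degrees of freedom."""
    return math.sqrt(rss / (n - p - 1))

n, tss = 20, 100.0
# Hypothetical fits: adding a second, nearly useless variable lowers RSS only slightly.
for p, rss in ((1, 50.0), (2, 49.5)):
    print(f"p={p} RSS={rss:.1f} R^2={r2(rss, tss):.3f} "
          f"adj R^2={adjusted_r2(rss, tss, n, p):.3f} RSE={rse(rss, n, p):.3f}")
```

In this example RSS goes down and R^2 goes up, yet adjusted R^2 goes down and RSE goes up, because one more degree of freedom was spent for very little gain.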

Significance of F Statistics

The F-test determines if a group of variables is jointly significant, whereas the t-test examines the significance of individual variables.
The F-statistic also has an associated p-value.
The null hypothesis for the F-test is that your model fits no better than the intercept-only model, i.e. all coefficients other than the intercept are zero.
While R-squared provides an estimate of the relationship strength between the model and response variable, it does not offer a formal hypothesis test. This test is provided by the F-test.
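For reference, the overall F-statistic compares the variation explained per predictor with the unexplained variation per residual degree of freedom. A minimal Python sketch with invented sums of squares (the function names are hypothetical):

```python
def f_statistic(tss, rss, n, p):
    """F = ((TSS - RSS) / p) / (RSS / (n - p - 1))."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (n - p - 1)
    return explained_per_predictor / unexplained_per_df

def f_from_r2(r2, n, p):
    """The same statistic written in terms of R^2."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# Made-up numbers: 23 observations, 2 predictors, R^2 = 0.6.
print(f_statistic(tss=100.0, rss=40.0, n=23, p=2))
print(f_from_r2(0.6, n=23, p=2))
```

Both forms give the same value, which shows how the F-test turns R^2 into a formal hypothesis test once n and p are taken into account.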

Why Use F Statistics when Individual Coefficient p-values are Available?

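It may seem that if one coefficient is significant (good p-value), the overall model will also be significant.
However, this assumption breaks down when the number of variables is large: at the 0.05 level, about 5% of unrelated variables will show a good p-value by chance alone, whereas the F-test accounts for the number of variables.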
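A small calculation shows why individual p-values are misleading with many variables. This Python sketch assumes, for simplicity, that the tests are independent and that none of the variables is truly related to the response; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(num_variables, alpha=0.05):
    """Chance that at least one unrelated variable looks significant."""
    return 1.0 - (1.0 - alpha) ** num_variables

for k in (1, 10, 100):
    print(f"{k:>3} variables: expected false positives = {0.05 * k:.1f}, "
          f"P(at least one) = {prob_at_least_one_false_positive(k):.3f}")
```

With 100 unrelated variables, seeing a few starred coefficients is almost guaranteed, so a single good p-value says little about the model as a whole.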

Determining Good Values of F-statistics

It depends on the values of n (number of observations in the training set) and p (number of independent variables).
When n is large, an F-value only slightly greater than 1 can be sufficient to reject the null hypothesis; when n is small, a larger F-value is needed.
It is advisable to base decisions on the corresponding p-value, which accounts for both n and p.

Degrees of Freedom:

Suppose you have two features, x1 and x2, and a target variable y.
The fitted equation is y = a1x1 + a2x2 + a3, which describes a plane.
In a 3D space, three points (not on a common line) define a unique plane.
With n points, p = 2 features, and 1 target, three points can always be fitted exactly, while the remaining (n-p-1) points can deviate from the fit. This difference represents the degrees of freedom.
Degrees of freedom are the difference between n and the number of estimated coefficients, including the intercept.
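The geometric argument can be checked directly: with two features and an intercept there are three coefficients, so any three points (in general position) are fitted with zero residual. A minimal Python sketch using Cramer's rule; the points and function names are made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_through_three_points(points):
    """Solve y = a1*x1 + a2*x2 + a3 exactly for three (x1, x2, y) points."""
    A = [[x1, x2, 1.0] for x1, x2, _ in points]
    ys = [y for _, _, y in points]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear in (x1, x2); no unique solution")
    coeffs = []
    for col in range(3):
        Ai = [row[:] for row in A]      # copy A, then swap one column for y
        for r in range(3):
            Ai[r][col] = ys[r]
        coeffs.append(det3(Ai) / d)
    return coeffs

def residual_degrees_of_freedom(n, p):
    return n - p - 1

points = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 0.0)]
a1, a2, a3 = fit_through_three_points(points)
residuals = [y - (a1 * x1 + a2 * x2 + a3) for x1, x2, y in points]

print(a1, a2, a3)                            # exact fit
print(residuals)                             # all zero: nothing left to vary
print(residual_degrees_of_freedom(3, 2))     # 0
print(residual_degrees_of_freedom(10, 2))    # 7 points are free to deviate
```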

Significance Score “***” in the Coefficient Section

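R indicates the significance of a p-value by displaying stars.
The stars are not computed separately: they mark which fixed cutoff the p-value falls below, as listed in the "Signif. codes" legend under the coefficient table (*** below 0.001, ** below 0.01, * below 0.05, . below 0.1).
The p-value itself comes from the t-distribution, not from bootstrapping.
Bootstrapping is an alternative that allows assigning measures of accuracy to sample estimates, such as bias, variance, confidence intervals, or prediction error.
In Bayesian inference, posterior distributions of the parameters are obtained, from which comparable tail probabilities can be calculated.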
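The star labels can be reproduced from a p-value with a simple lookup. This is a minimal Python sketch, assuming the cutoffs printed in R's "Signif. codes" legend (0.001, 0.01, 0.05, 0.1); the function name and the treatment of values lying exactly on a cutoff are assumptions.

```python
def significance_stars(p_value):
    """Map a p-value to the symbol shown next to it in the coefficient table."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Cutoffs as listed in the legend; boundary handling here is an assumption.
    for cutoff, symbol in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value <= cutoff:
            return symbol
    return " "

for p in (0.0004, 0.004, 0.03, 0.07, 0.5):
    print(f"p={p:<7} {significance_stars(p)!r}")
```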
