Correlation and Regression Slope

In a simple linear regression model, the regression slope (β) represents the estimated change in the dependent variable (Y) corresponding to a one-unit increase in the independent variable (X). It quantifies the linear relationship between X and Y, indicating both its direction and its magnitude.

The correlation coefficient (r) measures the strength and direction of the linear relationship between X and Y. It ranges from -1 to +1, with a value of 1 indicating a perfect positive linear relationship, -1 indicating a perfect negative linear relationship, and 0 indicating no linear relationship.

When the standard deviations of both X and Y are equal (SD(X) = SD(Y)), the regression slope (β) and the correlation coefficient (r) coincide.

The slope can be calculated as the correlation coefficient multiplied by the ratio of the standard deviations (β = r * SD(Y) / SD(X)). The correlation coefficient essentially represents the slope you would obtain from a regression of standardized variables (Y / SD(Y) on X / SD(X) or vice versa).
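Both claims can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up synthetic data. It shows that the least-squares slope equals r * SD(Y) / SD(X), and that regressing Y / SD(Y) on X / SD(X) (or the reverse) gives a slope equal to r. `ols_slope` is a hypothetical helper that wraps `np.polyfit`.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept), via np.polyfit."""
    slope, _intercept = np.polyfit(x, y, deg=1)  # coefficients, highest power first
    return slope

# Synthetic data (made up for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=500)
y = 2.5 * x + rng.normal(scale=4.0, size=500)

r = np.corrcoef(x, y)[0, 1]
beta = ols_slope(x, y)

# beta = r * SD(Y) / SD(X); any ddof works as long as both SDs use the same one.
beta_from_r = r * y.std() / x.std()

# Slope after dividing each variable by its SD equals r, in either direction.
slope_std = ols_slope(x / x.std(), y / y.std())
slope_std_reverse = ols_slope(y / y.std(), x / x.std())

print(f"r = {r:.4f}, beta = {beta:.4f}, r*SD(Y)/SD(X) = {beta_from_r:.4f}")
print(f"standardized slope = {slope_std:.4f} (reverse: {slope_std_reverse:.4f})")
```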

However, when the standard deviations of X and Y are not equal, the regression slope and the correlation coefficient provide distinct information:

  1. The correlation coefficient is a bounded measure that can be interpreted independently of the scale of the variables. It indicates the strength of the linear relationship between X and Y, with values closer to ±1 indicating a stronger linear relationship. The regression slope, on its own, does not provide this information.
  2. The regression slope represents the estimated change in the expected value of Y for a given unit increase in X. It provides information about the direction and magnitude of the relationship between X and Y in the original units of measurement. This information cannot be deduced from the correlation coefficient alone.
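To illustrate the difference, here is a sketch with hypothetical synthetic height/weight data (NumPy assumed). Expressing X in centimetres instead of metres divides the slope by 100, because the slope is tied to the units of measurement. The correlation coefficient does not change.

```python
import numpy as np

# Hypothetical synthetic data: height (metres) and weight (kg).
rng = np.random.default_rng(1)
height_m = rng.normal(loc=1.70, scale=0.10, size=300)
weight_kg = 60.0 * height_m - 40.0 + rng.normal(scale=5.0, size=300)
height_cm = 100.0 * height_m          # the same variable in a different unit

results = {}
for unit, x in [("m", height_m), ("cm", height_cm)]:
    slope, _intercept = np.polyfit(x, weight_kg, deg=1)
    r = np.corrcoef(x, weight_kg)[0, 1]
    results[unit] = (slope, r)
    print(f"X in {unit:>2}: slope = {slope:.3f} kg per {unit}, r = {r:.4f}")
```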

One more thing to add here is the relationship between the correlation coefficient and the covariance: r = Cov(X, Y) / [SD(X) * SD(Y)]. In other words, the correlation is the covariance normalised by the standard deviation of each variable, where SD = sqrt(Var). Similarly, the slope can be written as β = Cov(X, Y) / Var(X).
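These covariance formulas can be checked numerically. The following is a minimal sketch with NumPy and made-up synthetic data. Sample (ddof=1) estimates are used throughout. The ddof choice cancels in both ratios as long as it is applied consistently.

```python
import numpy as np

# Synthetic data (made up for illustration) with a negative linear relationship.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = -1.5 * x + rng.normal(scale=2.0, size=200)

cov_xy = np.cov(x, y)[0, 1]      # sample covariance (N - 1 denominator)
var_x = np.var(x, ddof=1)        # sample variance of X
sd_x = np.std(x, ddof=1)         # SD = sqrt(variance)
sd_y = np.std(y, ddof=1)

r = cov_xy / (sd_x * sd_y)       # correlation: covariance normalised by both SDs
slope = cov_xy / var_x           # regression slope: covariance divided by Var(X)

print(f"r = {r:.4f} (np.corrcoef: {np.corrcoef(x, y)[0, 1]:.4f})")
print(f"slope = {slope:.4f} (np.polyfit: {np.polyfit(x, y, deg=1)[0]:.4f})")
```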

References

[0] : https://stats.stackexchange.com/questions/32464/how-does-the-correlation-coefficient-differ-from-regression-slope

[1] : https://www.quora.com/Is-there-a-relationship-between-the-correlation-coefficient-and-the-slope-of-a-linear-regression-line

Interpretation of Multiple Regression Coefficients

  • In simple linear regression with one predictor, we interpret the slope coefficient as follows:
    • y = b0*x0 + c
    • b0 is the expected increase in y for a one-unit increase in x0
  • In multiple regression:
    • y = b0*x0 + b1*x1 + c
    • b0 is the expected increase in y for a one-unit increase in x0, holding x1 constant
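The following small sketch illustrates this interpretation with NumPy least squares (`np.linalg.lstsq`). It uses made-up data generated from y = 3*x0 - 2*x1 + 5 plus noise, and x1 is deliberately correlated with x0. `predict` is a hypothetical helper. Raising x0 by one unit while x1 stays fixed moves the fitted prediction by exactly b0.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 3*x0 - 2*x1 + 5 + noise.
rng = np.random.default_rng(3)
n = 1_000
x0 = rng.normal(size=n)
x1 = 0.6 * x0 + rng.normal(size=n)   # x1 is correlated with x0
y = 3.0 * x0 - 2.0 * x1 + 5.0 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept c.
X = np.column_stack([x0, x1, np.ones(n)])
(b0, b1, c), *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x0_val, x1_val):
    """Fitted value b0*x0 + b1*x1 + c."""
    return b0 * x0_val + b1 * x1_val + c

# Increase x0 by one unit while holding x1 fixed at an arbitrary value (0.5).
delta = predict(1.0, 0.5) - predict(0.0, 0.5)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, c = {c:.3f}, delta = {delta:.3f}")
```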

Implication of holding the other variables constant:

  • Consider
    • house_price = b0 * no_of_bedrooms + c       ….. (1)
    • house_price = b0 * no_of_bedrooms + b1*square_feet + c     ….. (2)
  • In (1), b0 is very likely to be positive, since houses with more bedrooms also tend to be larger
  • In (2), b0 can be negative
    • If you increase the number of bedrooms while keeping square footage constant, each room becomes smaller
    • This may decrease the house price
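The sign flip can be reproduced with hypothetical synthetic data (not real housing prices; NumPy assumed). In the generating process below, bigger houses have more bedrooms, and price rises with square feet but falls with the number of bedrooms at fixed square feet. Fitting models (1) and (2) by least squares then gives a positive b0 in (1) and a negative b0 in (2).

```python
import numpy as np

# Hypothetical synthetic data, chosen only to illustrate the sign flip.
rng = np.random.default_rng(4)
n = 500
no_of_bedrooms = rng.integers(1, 6, size=n).astype(float)      # 1 to 5 bedrooms
square_feet = 500 + 450 * no_of_bedrooms + rng.normal(scale=200, size=n)
house_price = (50_000 + 200 * square_feet - 15_000 * no_of_bedrooms
               + rng.normal(scale=20_000, size=n))
ones = np.ones(n)

# Model (1): house_price = b0 * no_of_bedrooms + c
X1 = np.column_stack([no_of_bedrooms, ones])
b0_model1 = np.linalg.lstsq(X1, house_price, rcond=None)[0][0]

# Model (2): house_price = b0 * no_of_bedrooms + b1 * square_feet + c
X2 = np.column_stack([no_of_bedrooms, square_feet, ones])
b0_model2, b1_model2, _c = np.linalg.lstsq(X2, house_price, rcond=None)[0]

print(f"(1) b0 = {b0_model1:,.0f}")                             # bedrooms also proxy for size
print(f"(2) b0 = {b0_model2:,.0f}, b1 = {b1_model2:,.0f}")      # size held constant
```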

Reference

Coursera course on regression by the University of Washington: https://www.coursera.org/specializations/machine-learning

Regression Trees

The figure below shows how a decision tree partitions the predictor space into rectangles:

As we have already described classification trees, the only differences here are how to produce a numeric output instead of a class label, and how to define a split metric other than entropy/Gini.

For the output, each leaf predicts the mean of the training responses that fall into it.

For the metric, the CART methodology uses the simple sum of squared errors (SSE):

  • SSE = ∑_{i ∈ S1} (y_i – ȳ1)² + ∑_{i ∈ S2} (y_i – ȳ2)²
    • where S1 and S2 are the two newly created groups and ȳ1, ȳ2 are their means
    • It is a plain sum of squares, not a standard deviation
    • Similarly, when deriving R² in linear regression we were concerned with sums of squares
    • The Ward linkage criterion in hierarchical clustering also uses the increase in the sum of squares
  • diff = SSE_before – SSE_after
    • We choose the predictor (and split point) that reduces SSE the most
    • i.e., the split for which diff is highest
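The following is a minimal sketch of this split search in Python with NumPy. It is not the hand-coded implementation linked below, and `sse` and `best_split` are hypothetical names. For every predictor it tries thresholds at the midpoints between consecutive distinct values and keeps the split with the largest diff. Each resulting group then becomes a leaf that predicts its mean, or is split further.

```python
import numpy as np

def sse(y):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(X, y):
    """Return (feature index, threshold, diff) for the split that reduces SSE the most.

    A split sends rows with X[:, j] <= threshold to one group and the rest to the other.
    Returns (None, None, 0.0) if no split reduces SSE.
    """
    sse_before = sse(y)
    best = (None, None, 0.0)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])               # sorted distinct values
        # Candidate thresholds: midpoints between consecutive distinct values.
        for threshold in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= threshold
            sse_after = sse(y[left]) + sse(y[~left])
            diff = sse_before - sse_after
            if diff > best[2]:
                best = (j, float(threshold), diff)
    return best

# Toy data (made up): y jumps between x0 = 4 and x0 = 6; x1 is unrelated noise.
X = np.array([[1, 7], [2, 3], [3, 9], [4, 1], [6, 8], [7, 2], [8, 6], [9, 4]], dtype=float)
y = np.array([1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1])

feature, threshold, diff = best_split(X, y)
left = X[:, feature] <= threshold
leaf_means = (y[left].mean(), y[~left].mean())   # each leaf predicts its group mean
print(feature, threshold, diff, leaf_means)
```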

My hand-coded regression tree is available at: https://github.com/arcarchit/datastories/tree/master/regression_tree

Reference

Applied Predictive Modeling by Max Kuhn and Kjell Johnson