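- Suppose you have just two parameters, a and b, and n observations (x_i, y_i)
- The model is y = a + b*x
- a and b can be found by solving a simple optimization problem: minimizing the sum of squared errors between the observed and predicted y [1]
- Setting the partial derivatives of that sum with respect to a and b to 0 gives the solution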
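
The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the derivation from [1] verbatim: it assumes the loss is the sum of squared errors, for which setting both partial derivatives to 0 gives b = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean)^2) and a = y_mean - b * x_mean. The function name `fit_simple_linear` and the sample data are made up for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b).

    Hypothetical helper for illustration, assuming a sum-of-squared-errors loss.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sums that appear when the partial derivatives are set to 0.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


# Illustrative data generated from y = 1 + 2*x with no noise.
a, b = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

With noise-free data the fit recovers the generating intercept and slope; with real observations it returns the line that minimizes the squared error.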



A simpler formula for R2 (R-squared) is available at https://datastoriesweb.wordpress.com/2017/01/15/interpreting-statistical-values/
Matrix Closed-Form Solution
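
For reference, the standard matrix form of the least-squares solution (the normal equation) can be written as below. This is a sketch under two assumptions that are not stated in the original: X is the n-by-2 design matrix with a column of ones for the intercept, and X^T X is invertible (that is, the x values are not all identical).

```latex
\[
X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},
\qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},
\qquad
\beta = \begin{bmatrix} a \\ b \end{bmatrix}
\]
\[
\hat{\beta} = \left( X^{\top} X \right)^{-1} X^{\top} y
\]
```

For the two-parameter model y = a + b*x this reduces to the same a and b obtained by setting the partial derivatives to 0.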

Gradient Descent
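- We assume some initial values for the parameters
- We calculate the gradient of the loss with respect to each parameter and move that parameter in the negative direction of its gradient, scaled by the step size
- We repeat this update until the parameters stop changing noticeably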
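
The update rule above can be sketched as follows for the model y = a + b*x. This is a minimal illustration and not necessarily the same code as the notebook in [2]: it assumes a mean-squared-error loss, initial values of 0 for both parameters, and a fixed step size. The function name `gradient_descent` and the defaults for `step_size` and `steps` are made up for this example.

```python
def gradient_descent(xs, ys, step_size=0.05, steps=5000):
    """Fit y = a + b*x by gradient descent on mean squared error.

    Hypothetical sketch: the initial values, step_size and steps are
    illustrative assumptions, not tuned settings.
    """
    n = len(xs)
    a, b = 0.0, 0.0  # assumed initial values for the parameters

    for _ in range(steps):
        # Prediction error for every observation at the current a and b.
        errors = [(a + b * x) - y for x, y in zip(xs, ys)]

        # Partial derivatives of the mean squared error.
        grad_a = (2.0 / n) * sum(errors)
        grad_b = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))

        # Move each parameter against its gradient, scaled by the step size.
        a -= step_size * grad_a
        b -= step_size * grad_b

    return a, b


# Illustrative data generated from y = 1 + 2*x with no noise.
a, b = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

A step size that is too large makes the updates diverge, and one that is too small makes convergence slow, so in practice it usually needs some trial and error for the data at hand.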

I have coded gradient descent in a notebook at [2].

References:
[1] http://seismo.berkeley.edu/~kirchner/eps_120/Toolkits/Toolkit_10.pdf
[2] https://github.com/arcarchit/datastories/blob/master/gradient_descent.ipynb
