Elimination with matrices

March 13, 2021August 3, 2021Archit Vora Leave a comment

We use it to find solution of linear equations
- Elimination method is also popular among software packages like Matlab.
We would perform row operations to make matrix upper triangular
We can do this with equations itself (without putting in matrix)
- We don’t want to keep writing variable names
Process
- Iteratively select diagonal element as pivot
- Make all entires below it 0
- Repeat
Allowed row operations are the ones which does not change systems
- Switching rows
- Multiplying row with constant
- Add rows

What would happen when we have 2 variable and 100 equations (observation) ?

Here are four terms
- Underdetermined System (no of equation > no of unknowns)
- Overdetermined system (no of equation < no of unknowns)
- Consistent System (one/more set of values satisfy equations)
- Inconsistent System (no set of values satisfy equations)
Overdetermined system is generally inconsistent except
- Same equation multiplied by constant
- One equation is linear combination of other
More specifically, according to the Rouché–Capelli theorem, any system of linear equations (underdetermined or otherwise) is inconsistent if the rank of the augmented matrix is greater than the rank of the coefficient matrix.

Snippets for Prof Gilbert’s course

Elimination matrices – we can create matrix for each elimination operation

Remember – Multiply on left is row operation, multiply on right is column operation

Permutation matrix (P) – Exchange the rows of identity matrix

It is easy to find inverse of this elimination matrices

Reference

These are notes from Prof. Gilbert Strang’s lecture on MIT open courseware. https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-1-the-geometry-of-linear-equations/

Geometry of Linear Equations – Column Picture

July 5, 2019May 17, 2023Archit Vora Leave a comment

Notes from Prof. Gilbert Strang’s Lecture on MIT OpenCourseWare: The Geometry of Linear Equations

In linear algebra, when faced with equations, we often try to visualize them using the row picture.

Row Picture: In 2-D, we can think of it as a line and aim to find its intersection.

Column Picture: We aim to find the weights of a linear combination of columns. In the image on the right, we see the addition of two vectors. We start from the origin and add them at the tail.

As we move to higher dimensions, the row picture becomes more challenging to visualize, while the column picture remains straightforward.

Furthermore, the column picture allows us to check if the combination of all columns fills the entire space. We can verify this through elimination.

Additionally, we can develop a habit of viewing matrix multiplication as a linear combination of columns.

Taylor Series Expansion

December 8, 2018December 9, 2018Archit Vora Leave a comment

General

general

Single Variable

single

Multivariable

vector

where x and a are vectors. We had discussed gradient vector (D) and hessian matrix(H) in this post.

Reference

[0] : http://www.math.ucdenver.edu/~esulliva/Calculus3/Taylor.pdf

[1] : https://mathinsight.org/taylors_theorem_multivariable_introduction

Matrices Definitions

December 6, 2018May 8, 2020Archit Vora 1 Comment

Jacobian Matrix

jscobian

Gradient Vector

gradient

Hessian Matrix

hessian

Hessian Matrix is Jacobian of a gradient.

Symmetric Matrix

Matrix and its transpose are same

Hermitian Matrix

Matrix is hermitian if A_transpose (A^T) = A_bar (Conjugate of complex no)

Positive Semi-definite Matrices

One definition:
- Matrix M ∈ L(V) is positive definite iff
  - M is symmetric
  - v^T * M * v >= 0 for all v ∈ V
Now following are equal [0]
- v^T * M * v >= 0 for all v ∈ V
- All eigenvalues are non negative
- There exists a matrix B such that B^T * B = M
Application
- A twice differentiable function of n variable is convex if and only if Hessian of it is positive semi-definite (PSD) [2] [3]

References

[0] : https://www.cse.iitk.ac.in/users/rmittal/prev_course/s14/notes/lec11.pdf

[1] : http://www.princeton.edu/~amirali/Public/Teaching/ORF363_COS323/F15/ORF363_COS323_F15_Lec2.pdf

[2] : https://wiki.math.ntnu.no/_media/tma4180/2016v/note2.pdf

[3] : https://math.stackexchange.com/questions/946156/proving-convexity-of-a-function-whose-hessian-is-positive-semidefinite-over-a-co

Principal Component Analysis (PCA)

December 5, 2018May 5, 2020Archit Vora Leave a comment

PCA find its application in dimensionality reduction. It in turn helps in data visualization and inference. It is also useful in image compression.

PCA arises naturally as maximum likelihood estimation of a particular form of a linear gaussian latent model. However in this post we are focusing on standard non probabilistic view of PCA.

Applications

Dimentionality Reduction
Data Compression
- We had an example of images of digit, where in we can re-generate images with minimum loss by using first 4 eigen vectors
Factor Analysis
- Personality test answers are driven by underlying skill IQ and EQ
Data normalization
- Principle component Regression
Data Visualization

Intuition of Continuous Latent Variables

Bishop’s book[8] had example of images of digits
Latent variable can correspond to stretching or rotation
- Rotation can be say 30 degree, 31 degree which is continuous
- Contrast this with latent variable of hidden Markov models, where latent variables like rainy or sunny atmosphere is discrete
Wikipedia article[7] mentions classes of analysis when latent/observed variables are continuous/discrete
- Gaussian mixture model is example where latent variables are discrete and observed variables are continuos
- Factor analysis is example when both of them are continuous

Two Definitions

Direction that maximizes variance
- Variance is given by eigen values
Direction that minimizes error
- Error is given by eigen value

PCA vs Fisher’s discriminant

We have a note about Fisher’s discriminant analysis
Both PCA and fisher’s discriminant analysis projects data
- But PCA is unsupervised

Derivation

https://github.com/arcarchit/datastories/blob/master/notes/pca.pdf

Calculation

There are two ways to calculate principle component, 1) via covariance matrix and 2) SVD. It has been derived at [0] that both of them are actually the same.

Principle components represent direction with maximum variance. All the principle components are orthogonal.

We are computing major axis of variation. We can see this as projecting data on this axis and variance will be maximized here.

1) Via covariance Matrix

Steps involved [1]

Get data (X_raw)
Normalize( X = data after subtracting mean and divide by standard deviation )
Calculate covariance matrix ( C = X*X^T )
Calculate eigenvalues and eigenvectors of covariance matrix ( C = V*L*V^T )
1. Cx = λx
2. ( C – Iλ )x = 0
3. C – Iλ = 0
4. Determinant of | C – Iλ | = 0 given eigenvalues lambda
5. For each λ we can find x using (2), they will be eigenvectors [5]
Choosing components and forming a feature vector (F = Put eigenvector in columns )
Deriving new data-set (X_new = X * F, you may need to transpose)
How to get old data back ( X = X_new * F^(-1))

2) Via SVD

X = U*S*V^T

where

X – is n * p matrix

U^T * U = I | U is n * n | U are left singular vectors

V^T * V = I and V is p * p | V are right singular vectors

S is diagonal and n * p

X_new = Ur * Sr where Ur and Sr are reduced matrices based on variance we want to keep (say 90 %).

SVD is represents expansion of original data in a coordinate system where covariance matrix is orthogonal. [4]. Values of S are square root of eigenvalues.

U and V can are calculated by finding eigen-vectors of A*A^T and A^T*A. [4][6]

On eigenvalues and eigenvectors

Consider input matrix as transformation matrix [3]
There are some v for given matrix, which when transformed by the matrix just changes their scale but directions remain same
Such a vectors are called eigenvectors of the transformation matrix
Scale by which they change is represented by eigenvalues.