Elimination with matrices

  • We use it to find solution of linear equations
    • Elimination method is also popular among software packages like Matlab.
  • We would perform row operations to make matrix upper triangular
  • We can do this with equations itself (without putting in matrix)
    • We don’t want to keep writing variable names
  • Process
    • Iteratively select diagonal element as pivot
    • Make all entires below it 0
    • Repeat
  • Allowed row operations are the ones which does not change systems
    • Switching rows
    • Multiplying row with constant
    • Add rows

What would happen when we have 2 variable and 100 equations (observation) ?

  • Here are four terms
    • Underdetermined System (no of equation > no of unknowns)
    • Overdetermined system (no of equation < no of unknowns)
    • Consistent System (one/more set of values satisfy equations)
    • Inconsistent System (no set of values satisfy equations)
  • Overdetermined system is generally inconsistent except
    • Same equation multiplied by constant
    • One equation is linear combination of other
  • More specifically, according to the Rouché–Capelli theorem, any system of linear equations (underdetermined or otherwise) is inconsistent if the rank of the augmented matrix is greater than the rank of the coefficient matrix.

Snippets for Prof Gilbert’s course

Elimination matrices – we can create matrix for each elimination operation

Remember – Multiply on left is row operation, multiply on right is column operation

Permutation matrix (P) – Exchange the rows of identity matrix

It is easy to find inverse of this elimination matrices

Reference

These are notes from Prof. Gilbert Strang’s lecture on MIT open courseware. https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-1-the-geometry-of-linear-equations/

Geometry of Linear Equations – Column Picture

Notes from Prof. Gilbert Strang’s Lecture on MIT OpenCourseWare: The Geometry of Linear Equations

In linear algebra, when faced with equations, we often try to visualize them using the row picture.

Row Picture: In 2-D, we can think of it as a line and aim to find its intersection.

Column Picture: We aim to find the weights of a linear combination of columns. In the image on the right, we see the addition of two vectors. We start from the origin and add them at the tail.

mit_row_2d
mit_col_2d

As we move to higher dimensions, the row picture becomes more challenging to visualize, while the column picture remains straightforward.

mit_3d

Furthermore, the column picture allows us to check if the combination of all columns fills the entire space. We can verify this through elimination.

mit_subspace

Additionally, we can develop a habit of viewing matrix multiplication as a linear combination of columns.

mit_mm

Matrices Definitions

Jacobian Matrix

jscobian

Gradient Vector

gradient

Hessian Matrix

hessian

Hessian Matrix is Jacobian of a gradient. 

Symmetric Matrix

  • Matrix and its transpose are same

 

Hermitian Matrix

  • Matrix is hermitian if A_transpose (A^T) = A_bar (Conjugate of complex no)

Positive Semi-definite Matrices

  • One definition:
    • Matrix M ∈ L(V) is positive definite iff
      • M is symmetric
      • v^T * M * v >= 0 for all v ∈ V
  • Now following are equal [0]
    • v^T * M * v >= 0 for all v ∈ V
    • All eigenvalues are non negative
    • There exists a matrix B such that B^T * B = M
  • Application
    • A twice differentiable function of n variable is convex if and only if Hessian of it is positive semi-definite (PSD) [2] [3]

 

References

[0] : https://www.cse.iitk.ac.in/users/rmittal/prev_course/s14/notes/lec11.pdf

[1] : http://www.princeton.edu/~amirali/Public/Teaching/ORF363_COS323/F15/ORF363_COS323_F15_Lec2.pdf

[2] : https://wiki.math.ntnu.no/_media/tma4180/2016v/note2.pdf

[3] : https://math.stackexchange.com/questions/946156/proving-convexity-of-a-function-whose-hessian-is-positive-semidefinite-over-a-co

 

 

Principal Component Analysis (PCA)

PCA find its application in dimensionality reduction. It in turn helps in data visualization and inference. It is also useful in image compression.

PCA arises naturally as maximum likelihood estimation of a particular form of a linear gaussian latent model. However in this post we are focusing on standard non probabilistic view of PCA.

Applications

  • Dimentionality Reduction
  • Data Compression
    • We had an example of images of digit, where in we can re-generate images with minimum loss by using first 4 eigen vectors
  • Factor Analysis
    • Personality test answers are driven by underlying skill IQ and EQ
  • Data normalization
    • Principle component Regression
  • Data Visualization

 

Intuition of Continuous Latent Variables

  • Bishop’s book[8] had example of images of digits
  • Latent variable can correspond to stretching or rotation
    • Rotation can be say 30 degree, 31 degree which is continuous
    • Contrast this with latent variable of hidden Markov models, where latent variables like rainy or sunny atmosphere is discrete
  • Wikipedia article[7] mentions classes of analysis when latent/observed variables are continuous/discrete
    • Gaussian mixture model is example where latent variables are discrete and observed variables are continuos
    • Factor analysis is example when both of them are continuous

Two Definitions

  1. Direction that maximizes variance
    • Variance is given by eigen values
  2. Direction that minimizes error
    • Error is given by eigen value

PCA vs Fisher’s discriminant

 

Derivation

https://github.com/arcarchit/datastories/blob/master/notes/pca.pdf

 

 

 

Calculation

There are two ways to calculate principle component, 1) via covariance matrix and 2) SVD. It has been derived at [0] that both of them are actually the same.

Principle components represent direction with maximum variance. All the principle components are orthogonal.

We are computing major axis of variation. We can see this as projecting data on this axis and variance will be maximized here.

1) Via covariance Matrix

Steps involved [1]

  1. Get data (X_raw)
  2. Normalize( X = data after subtracting mean and divide by standard deviation )
  3. Calculate covariance matrix ( C = X*X^T )
  4. Calculate eigenvalues and eigenvectors of covariance matrix ( C = V*L*V^T )
    1. Cx = λx
    2. ( C – Iλ )x = 0
    3. C – Iλ = 0
    4. Determinant of | C – Iλ | = 0 given eigenvalues lambda
    5. For each λ we can find x using (2), they will be eigenvectors [5]
  5. Choosing components and forming a feature vector (F = Put eigenvector in columns )
  6. Deriving new data-set (X_new = X * F, you may need to transpose)
  7. How to get old data back ( X = X_new * F^(-1))

 

2) Via SVD

X = U*S*V^T

where

X – is n * p matrix

U^T * U = I | U is n * n | U are left singular vectors

V^T * V = I and V is p * p | V are right singular vectors

S is diagonal and n * p

X_new = Ur * Sr where Ur and Sr are reduced matrices based on variance we want to keep (say 90 %).

SVD is represents expansion of original data in a coordinate system where covariance matrix is orthogonal. [4]. Values of S are square root of eigenvalues.

U and V can are calculated by finding eigen-vectors of A*A^T and A^T*A. [4][6]

 

On eigenvalues and eigenvectors

  • Consider input matrix as transformation matrix [3]
  • There are some v for given matrix, which when transformed by the matrix just changes their scale but directions remain same
  • Such a vectors are called eigenvectors of the transformation matrix
  • Scale by which they change is represented by eigenvalues.

 

References

[0] : https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca

[1] : http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

[2] : http://people.ciirc.cvut.cz/~hlavac/TeachPresEn/11ImageProc/15PCA.pdf

[3] : https://www.mathplanet.com/education/geometry/transformations/transformation-using-matrices

[4] : http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

[5] : https://www.scss.tcd.ie/~dahyotr/CS1BA1/SolutionEigen.pdf

[6] : http://mysite.science.uottawa.ca/phofstra/MAT2342/SVDproblems.pdf

[7] : https://en.wikipedia.org/wiki/Latent_variable_model

[8] : Book : Pattern Recognition and Machine Learning by Christopher Bishop https://www.springer.com/gp/book/9780387310732