Principal Component Analysis (PCA)
- is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly [[Correlation|correlated variables]] into a set of values of linearly uncorrelated variables called principal components (uncorrelated is not the same as independent: the components are only guaranteed to have zero linear correlation)
- helpful when most of the variation of the data is due to variations of a few principal components
- depends on the:
	- eigen-decomposition of [[Positive Semi-Definite Matrix|positive semi-definite matrices]]
	- [[Singular Value Decomposition/Factorization (SVD) - Reduced SVD|singular value decomposition]] of rectangular matrices
- learns a linear projection that aligns the directions of greatest variance with the axes of the new space
- can be viewed as an [[Unsupervised Learning|unsupervised learning algorithm]] that learns a new "representation" of data:
	- that has lower dimensionality than the original input data
	- whose elements have no linear correlation with each other
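
The two decompositions listed above lead to the same result. A minimal sketch, assuming NumPy and hypothetical random data (the array names are illustrative only): the eigenvalues of the covariance matrix equal the squared singular values of the centered data matrix divided by 𝑛-1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: p = 3 variables (rows), n = 50 observations (columns).
X = rng.normal(size=(3, 50))
Xc = X - X.mean(axis=1, keepdims=True)   # mean-deviation form
n = Xc.shape[1]

# Route 1: eigen-decomposition of the positive semi-definite covariance matrix.
S = Xc @ Xc.T / (n - 1)
eigvals = np.linalg.eigvalsh(S)[::-1]    # eigvalsh returns ascending order

# Route 2: SVD of the rectangular centered data matrix.
singular_values = np.linalg.svd(Xc, compute_uv=False)  # descending order
eigvals_from_svd = singular_values**2 / (n - 1)

print(np.allclose(eigvals, eigvals_from_svd))
```

The SVD route avoids forming 𝑋𝑋ᵀ explicitly, which is often preferred numerically, but either route is a reasonable choice for small problems.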

PCA is a linear technique for dimensionality reduction: it maps the data linearly to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (or correlation) matrix of the data variables is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues can then be used to reconstruct a large fraction of the variance of the original data.
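
As an illustration of the paragraph above, here is a minimal sketch, assuming NumPy and hypothetical two-variable data that varies mostly along one direction. It keeps only the eigenvector with the largest eigenvalue (the choice `k = 1` is an assumption for this example), projects the data, and maps it back to the original space.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 2 variables (rows), 200 observations (columns),
# with the second variable roughly half the first plus small noise.
t = rng.normal(size=200)
X = np.vstack([t, 0.5 * t + 0.05 * rng.normal(size=200)])
M = X.mean(axis=1, keepdims=True)
Xc = X - M
n = Xc.shape[1]

# Covariance matrix and its eigen-decomposition, largest eigenvalue first.
S = Xc @ Xc.T / (n - 1)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 1                                  # number of components kept (assumed)
P_k = eigvecs[:, :k]                   # p x k matrix of top-k eigenvectors
Y_k = P_k.T @ Xc                       # k x n low-dimensional representation
X_approx = P_k @ Y_k + M               # reconstruction in the original space

retained = eigvals[:k].sum() / eigvals.sum()   # fraction of variance kept
squared_error = np.sum((X - X_approx) ** 2)    # total reconstruction error
```

The squared reconstruction error equals (𝑛-1) times the sum of the discarded eigenvalues, so dropping directions with small eigenvalues loses little.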

PCA - Introduction

let 𝑋 be a 𝑝✕𝑛 matrix of 𝑛 observations:

  • 𝑋 = [𝑋1, …, 𝑋𝑛]

where each 𝑋𝑖 is a 𝑝✕1 vector

let the sample mean 𝑀 be the 𝑝✕1 vector defined as:

  • 𝑀 = (1/𝑛) (𝑋1 + … + 𝑋𝑛)

center the 𝑛 observations by subtracting the sample mean from each one:

  • 𝑋𝑖ˆ = 𝑋𝑖 - 𝑀

from here on, let 𝑋 denote the data in mean-deviation form (its columns have sample mean 0):

  • 𝑋 = [𝑋1ˆ, …, 𝑋𝑛ˆ]

let 𝑆 be the 𝑝✕𝑝 sample covariance matrix

  • 𝑆 = (1/(𝑛-1)) 𝑋𝑋ᵀ
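
The centering and covariance steps can be sketched as follows, assuming NumPy and a small hypothetical data matrix (the values are made up for illustration). The result is compared with `np.cov`, which by default also treats rows as variables and divides by 𝑛-1.

```python
import numpy as np

# Hypothetical data: p = 2 variables (rows), n = 4 observations (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0],
              [1.0, 3.0, 2.0, 6.0]])
p, n = X.shape

M = X.mean(axis=1, keepdims=True)   # p x 1 sample mean
X_hat = X - M                       # mean-deviation form
S = X_hat @ X_hat.T / (n - 1)       # p x p sample covariance matrix

print(np.allclose(S, np.cov(X)))
```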

find the eigenvalues and eigenvectors of 𝑆

  • eigenvalues {𝜆1, …, 𝜆𝑝}
  • eigenvectors {𝑣1, …, 𝑣𝑝}

normalize the eigenvectors (which can be chosen mutually orthogonal because 𝑆 is symmetric) to get the principal components:

  • 𝑢𝑖 = 𝑣𝑖 / ||𝑣𝑖||
  • principal components = {𝑢1, …, 𝑢𝑝}
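
A minimal sketch of the eigenvector step, assuming NumPy and a hypothetical 2✕2 covariance matrix. `np.linalg.eigh` is used because 𝑆 is symmetric; it returns eigenvalues in ascending order and eigenvectors that already have unit length, so the explicit normalization is shown only to mirror the step above.

```python
import numpy as np

# Hypothetical sample covariance matrix (symmetric, positive semi-definite).
S = np.array([[20.0, 14.0],
              [14.0, 14.0]]) / 3.0

eigvals, V = np.linalg.eigh(S)        # ascending eigenvalues, eigenvectors as columns
order = np.argsort(eigvals)[::-1]     # reorder so the largest eigenvalue comes first
eigvals = eigvals[order]
V = V[:, order]

# Normalize each column to unit length (a no-op here, kept for clarity).
U = V / np.linalg.norm(V, axis=0)

print(np.allclose(S @ U, U * eigvals))   # checks S u_i = lambda_i u_i column by column
```

Sorting by decreasing eigenvalue is a convention assumed here so that the first column is the direction of greatest variance.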

let 𝑃 be the change of variable/basis matrix that contains the principal components as columns

  • 𝑃 = [𝑢1, …, 𝑢𝑝]

𝑃 relates each vector 𝑋𝑖, expressed in the original observation axes, to its coordinate vector 𝑌𝑖 relative to the basis {𝑢1, …, 𝑢𝑝}; because 𝑃 is orthogonal, its inverse is 𝑃ᵀ:

  • 𝑋𝑖 = 𝑃𝑌𝑖

  • 𝑋 = 𝑃𝑌

  • 𝑌𝑖 = 𝑃ᵀ𝑋𝑖

  • 𝑌 = 𝑃ᵀ𝑋
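
The change of basis can be checked numerically. A minimal sketch, assuming NumPy and hypothetical data that is already in mean-deviation form:

```python
import numpy as np

# Hypothetical centered data: p = 2 variables (rows), n = 4 observations (columns).
X = np.array([[-3.0, -1.0, 1.0, 3.0],
              [-2.0, 0.0, -1.0, 3.0]])
S = X @ X.T / (X.shape[1] - 1)

# Columns of P are unit eigenvectors of S, largest eigenvalue first.
eigvals, P = np.linalg.eigh(S)
eigvals, P = eigvals[::-1], P[:, ::-1]

Y = P.T @ X        # coordinates relative to the basis of principal components
X_back = P @ Y     # change back to the original axes

print(np.allclose(P.T @ P, np.eye(2)))   # P is orthogonal
print(np.allclose(X_back, X))            # the round trip recovers X
```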

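for any orthogonal 𝑃, the covariance matrix of 𝑌 = [𝑌1, …, 𝑌𝑛] (which also has sample mean 0) follows from 𝑆:

  • 𝑆 = (1/(𝑛-1)) 𝑋𝑋ᵀ
  • 𝑆 = (1/(𝑛-1)) (𝑃𝑌)(𝑃𝑌)ᵀ
  • 𝑆 = (1/(𝑛-1)) 𝑃𝑌𝑌ᵀ𝑃ᵀ
  • 𝑃ᵀ𝑆𝑃 = (1/(𝑛-1)) 𝑌𝑌ᵀ

thus, the covariance matrix of 𝑌 is 𝑃ᵀ𝑆𝑃, which is diagonal (so the new variables are uncorrelated) when the columns of 𝑃 are the unit eigenvectors of 𝑆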
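
A minimal numerical check of this identity, assuming NumPy and hypothetical random data. The helper `covariance_after_change_of_basis` is an illustrative name, not part of any library. An arbitrary orthogonal matrix (obtained here from a QR factorization) satisfies the identity, while the eigenvector matrix additionally makes the result diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: p = 3 variables (rows), n = 100 observations (columns).
X = rng.normal(size=(3, 100))
X = X - X.mean(axis=1, keepdims=True)   # mean-deviation form
n = X.shape[1]
S = X @ X.T / (n - 1)

def covariance_after_change_of_basis(P, X):
    """Covariance matrix of Y = P^T X for centered X (hypothetical helper)."""
    Y = P.T @ X
    return Y @ Y.T / (X.shape[1] - 1)

# Any orthogonal matrix: the covariance of Y equals Q^T S Q.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
cov_Q = covariance_after_change_of_basis(Q, X)

# Eigenvector matrix of S: the covariance of Y is diagonal.
eigvals, P = np.linalg.eigh(S)
cov_P = covariance_after_change_of_basis(P, X)

print(np.allclose(cov_Q, Q.T @ S @ Q))
print(np.allclose(cov_P, np.diag(eigvals)))
```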

PCA - Reducing the Dimension of Multivariate Data

  • an orthogonal change of variable/basis does not change the total variance of the data (because left-multiplication by an orthogonal matrix changes neither the lengths of vectors nor the angles between them)
  • this means that if 𝑆 = 𝑃𝐷𝑃ᵀ, with 𝐷 the diagonal matrix of eigenvalues, then:
    • {total variance of the original variables 𝑥1, …, 𝑥𝑝} = {total variance of the new variables 𝑦1, …, 𝑦𝑝} = 𝑡𝑟𝑎𝑐𝑒(𝑆) = 𝑡𝑟𝑎𝑐𝑒(𝐷) = 𝜆1 + … + 𝜆𝑝
  • the variance of 𝑦𝑖 = 𝜆𝑖
  • the quotient 𝜆𝑖/𝑡𝑟𝑎𝑐𝑒(𝐷) measures the fraction of total variance explained or captured by 𝑦𝑖
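
The quotients above can be used to decide how many components to keep. A minimal sketch, assuming NumPy, hypothetical data with very different scales per variable, and an arbitrary 95% threshold chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: p = 4 variables (rows) with different spreads, n = 500 observations.
scales = np.array([[5.0], [2.0], [1.0], [0.1]])
X = rng.normal(size=(4, 500)) * scales
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T / (Xc.shape[1] - 1)

eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues, largest first
fractions = eigvals / eigvals.sum()     # lambda_i / trace(D)
cumulative = np.cumsum(fractions)

threshold = 0.95                        # assumed target, not a fixed rule
# Smallest k whose first k components reach the threshold.
k = int(np.searchsorted(cumulative, threshold) + 1)

print(fractions)
print(k)
```

Here `np.trace(S)` and `eigvals.sum()` agree up to rounding, which is the total-variance identity stated in the bullets.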

PCA - Example

PCA - Subpages

Resources