Usually, when we’re dealing with many variables, we don’t have a great understanding of how they relate to each other.
Purposes:
Bonus: Many of the other methods from the class can be applied after dimensionality reduction with little or no adjustment!
Many dimensionality reduction methods involve matrix factorization
Basic Idea: Find two (or more) matrices whose product best approximates the original matrix
Low rank approximation to original \(N\times M\) matrix:
\[ \mathbf{X} \approx \mathbf{W} \mathbf{H}^\top \]
where \(\mathbf{W}\) is \(N\times R\), \(\mathbf{H}\) is \(M\times R\), and \(R \ll \min(N, M)\).
Visualization of matrix factorization, showing a data matrix X approximated by the product of W and H transpose, with sample and feature axes and common names for the factor matrices in PCA and NMF.
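As a concrete toy illustration of the idea (not taken from the notes), the sketch below builds a rank-\(R\) approximation of a noisy low-rank matrix with NumPy. The names X, W, H, and R mirror the notation above; truncated SVD is used only as one convenient way to obtain the factors.

import numpy as np

rng = np.random.default_rng(0)
N, M, R = 100, 40, 5

# A data matrix that is approximately low rank, plus a little noise
X = rng.normal(size=(N, R)) @ rng.normal(size=(R, M)) + 0.01 * rng.normal(size=(N, M))

# One way to obtain the best rank-R factors: truncated SVD (Eckart-Young)
U, d, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :R] * d[:R]     # N x R
H = Vt[:R, :].T          # M x R

X_approx = W @ H.T       # N x M, best rank-R approximation in Frobenius norm
print(np.linalg.norm(X - X_approx) / np.linalg.norm(X))   # small relative error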
A generalization of many methods (e.g., SVD, QR, CUR, truncated SVD)
https://www.aaronschlegel.com/image-compression-principal-component-analysis/
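The linked example compresses an image by keeping only a few leading principal components. A rough sketch of the same idea, using a sample image that ships with scikit-learn (the choice of image and the number of components here are arbitrary):

import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.decomposition import PCA

# Grayscale image as a plain matrix (rows = pixel rows, columns = pixel columns)
img = load_sample_image("china.jpg").mean(axis=2)

# Keep only the leading 30 components (arbitrary choice for illustration)
pca = PCA(n_components=30)
scores = pca.fit_transform(img)              # rows projected onto the components
img_compressed = pca.inverse_transform(scores)

print(img.shape, scores.shape)               # e.g. (427, 640) -> (427, 30)
print(np.linalg.norm(img - img_compressed) / np.linalg.norm(img))   # relative error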
For a data table \(\mathbf{X}\) (columns centered) with singular value decomposition \(\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^\top\), the principal component directions are the columns of \(\mathbf{V}\), the PC scores are the columns of \(\mathbf{U}\mathbf{D} = \mathbf{X}\mathbf{V}\), and keeping only the first \(R\) columns of each factor gives the best rank-\(R\) approximation of \(\mathbf{X}\).
Visualization of PCA as a low-rank approximation: data points are approximated by projection onto the leading principal component directions.
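A minimal sketch of this correspondence, assuming only a centered data matrix: compute the SVD with NumPy and compare the leading scores with scikit-learn's PCA, which can differ only in the sign of each component.

import numpy as np
from sklearn.decomposition import PCA

# Random data standing in for a real data table
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
Xc = X - X.mean(axis=0)                  # PCA operates on column-centered data

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U[:, :2] * d[:2]            # first two columns of U D
scores_pca = PCA(n_components=2).fit_transform(X)

# Identical up to a sign flip per component
print(np.allclose(np.abs(scores_svd), np.abs(scores_pca)))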
sklearn.decomposition.PCA
- PCA.fit_transform(X) fits the model to X, then provides the data in principal component space
- PCA.components_ provides the “loadings matrix”, or directions of maximum variance
- PCA.explained_variance_ provides the amount of variance explained by each component
- PCA.explained_variance_ratio_ gives the fraction of total variance explained by each PC

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
pca = PCA(n_components=2)
X_r = pca.fit_transform(X)
# Print PC1 loadings
print(pca.components_[0, :])
# Print PC1 scores
print(X_r[:, 0])
# Percentage of variance explained for each component
print(pca.explained_variance_ratio_)
# [0.92461621 0.05301557]

A statistical summary of genetic data from 1,387 Europeans based on principal component axis one (PC1) and axis two (PC2). Novembre et al., Nature, 2008.
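matplotlib was imported in the iris snippet above but not yet used; one natural continuation (a sketch, not part of the original example) is to scatter the first two PC scores colored by species:

# Continues the iris example above: plot samples in PC space, one color per species
for label, name in enumerate(target_names):
    plt.scatter(X_r[y == label, 0], X_r[y == label, 1], label=name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()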
What if we have data wherein effects always accumulate?
Helleday et al., Nat. Rev. Genet., 2014
Alexandrov et al., Cell Rep., 2013
Find non-negative matrices \(\mathbf{W}\) and \(\mathbf{H}\) such that
\[ \mathbf{X} \approx \mathbf{W}\mathbf{H}^\top \qquad\text{with}\qquad \mathbf{W}_{ij} \ge 0,\ \mathbf{H}_{ij} \ge 0 \]
Typically, we minimize reconstruction error:
\[ \min_{\mathbf{W},\mathbf{H}} \left\|\mathbf{X} - \mathbf{W}\mathbf{H}^\top\right\|_F^2 \qquad \text{subject to } \mathbf{W}, \mathbf{H} \ge 0 \]
\[ [\mathbf{W}]_{ij} \leftarrow [\mathbf{W}]_{ij} \frac{[\color{darkred}\mathbf{X}\color{darkblue}\mathbf{H}\color{black}]_{ij}}{[\color{darkred}\mathbf{W}\color{darkblue}{\mathbf{H}^\top \mathbf{H}}\color{black}]_{ij}} \]
\[ [\mathbf{H}]_{ij} \leftarrow [\mathbf{H}]_{ij} \frac{[\color{darkred}\mathbf{X}^\top\color{darkblue}\mathbf{W}\color{black}]_{ij}}{[\color{darkred}\mathbf{H}\color{darkblue}{\mathbf{W}^\top \mathbf{W}}\color{black}]_{ij}} \]
In each update, color indicates the data term and the factor terms.
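A bare-bones NumPy sketch of these updates (random initialization, a fixed number of iterations, and a small constant in the denominators to avoid division by zero; real implementations add convergence checks):

import numpy as np

def nmf_multiplicative(X, R, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (N x M) as W (N x R) @ H (M x R).T."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    W = rng.uniform(size=(N, R))
    H = rng.uniform(size=(M, R))
    for _ in range(n_iter):
        W *= (X @ H) / (W @ H.T @ H + eps)      # multiplicative update for W
        H *= (X.T @ W) / (H @ W.T @ W + eps)    # multiplicative update for H
    return W, H

# Quick check on a synthetic non-negative matrix
rng = np.random.default_rng(1)
X = rng.uniform(size=(50, 4)) @ rng.uniform(size=(4, 30))
W, H = nmf_multiplicative(X, R=4)
print(np.linalg.norm(X - W @ H.T) / np.linalg.norm(X))   # small relative error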
sklearn.decomposition.NMF

- n_components: number of components
- init: how to initialize the search
- solver: ‘cd’ for coordinate descent, or ‘mu’ for multiplicative update
- l1_ratio, alpha_H, alpha_W: can regularize the fit
- NMF.components_: components x features matrix
- NMF.fit_transform(): fits the model and returns the data in component space (see the usage sketch below)

As always, selection of the appropriate method depends upon the question being asked.
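A short usage sketch with scikit-learn (the data and parameter choices below are placeholders, not recommendations):

import numpy as np
from sklearn.decomposition import NMF

# Non-negative data matrix (e.g., counts); random placeholder values
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(100, 20)).astype(float)

nmf = NMF(n_components=5, init="nndsvda", solver="mu", max_iter=500)
W = nmf.fit_transform(X)       # samples x components (W in the notation above)
H_T = nmf.components_          # components x features (H transpose in the notation above)

print(W.shape, H_T.shape)      # (100, 5) (5, 20)
print(nmf.reconstruction_err_)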