Purposes:
Bonus: Many of the other methods from the class can be applied after dimensionality reduction with little or no adjustment!
Many dimensionality reduction methods involve matrix factorization
Basic Idea: Find two (or more) matrices whose product best approximates the original matrix
Low-rank approximation to the original \(N\times M\) matrix:
\[ \mathbf{X} \approx \mathbf{W} \mathbf{H}^{T} \]
where \(\mathbf{W}\) is \(N\times R\), \(\mathbf{H}\) is \(M\times R\) (so \(\mathbf{H}^{T}\) is \(R\times M\)), and \(R \ll \min(N, M)\).
Generalization of many methods (e.g., SVD, QR, CUR, truncated SVD)
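As a quick illustration (not from the source), the truncated SVD provides one such factorization and, by the Eckart–Young theorem, the best rank-\(R\) approximation in the least-squares sense; the sizes and names below are illustrative:

import numpy as np

# Illustrative low-rank approximation X ~ W @ H.T via truncated SVD
rng = np.random.default_rng(0)
N, M, R = 100, 50, 5
X = rng.standard_normal((N, M))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :R] * s[:R]   # N x R
H = Vt[:R, :].T        # M x R, so H.T is R x M
X_approx = W @ H.T     # rank-R reconstruction of X

print(np.linalg.norm(X - X_approx))  # Frobenius-norm approximation error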
https://www.aaronschlegel.com/image-compression-principal-component-analysis/
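The linked post compresses images by keeping only the leading principal components; a minimal sketch of that idea, using a random array as a stand-in for a real grayscale image (an assumption, not the post's data):

import numpy as np
from sklearn.decomposition import PCA

img = np.random.rand(512, 512)                  # stand-in grayscale image
pca = PCA(n_components=50)                      # keep 50 components
scores = pca.fit_transform(img)                 # project pixel rows onto components
img_compressed = pca.inverse_transform(scores)  # reconstruct the image
print(pca.explained_variance_ratio_.sum())      # fraction of variance retained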
sklearn.decomposition.PCA
PCA.fit_transform(X): fits the model to X, then returns the data in principal component space
PCA.components_: provides the “loadings matrix”, or directions of maximum variance
PCA.explained_variance_: provides the amount of variance explained by each component
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)
# Print PC1 loadings (first row of components_)
print(pca.components_[0])
# Print PC1 scores
print(X_r[:, 0])
# Percentage of variance explained for each component
print(pca.explained_variance_ratio_)
# [ 0.92461621 0.05301557]
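The matplotlib import above suggests the example ends with a plot; a plausible continuation (an assumption, not the source's figure) scatters the first two principal component scores by species:

# Scatter PC1 vs. PC2 scores, colored by iris species
for i, name in enumerate(target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], label=name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()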
What if we have data in which effects only accumulate, i.e., contributions are non-negative and additive? Non-negative matrix factorization (NMF) handles this by constraining both factors to be non-negative.
Helleday et al., Nat Rev Genet, 2014
Alexandrov et al., Cell Rep, 2013
\[[W]_{ij} \leftarrow [W]_{ij} \frac{[\color{darkred} X \color{darkblue}{H^T} \color{black}]_{ij}}{[\color{darkred}{WH} \color{darkblue}{H^T} \color{black}]_{ij}}\]
Color indicates the data and its reconstruction (red) and the projection matrix (blue).
\[[H]_{ij} \leftarrow [H]_{ij} \frac{[\color{darkblue}{W^T} \color{darkred}X \color{black}]_{ij}}{[\color{darkblue}{W^T} \color{darkred}{WH} \color{black}]_{ij}}\]
Color indicates the data and its reconstruction (red) and the projection matrix (blue).
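A minimal NumPy sketch of these multiplicative updates (Lee and Seung's algorithm for the Frobenius objective; note the update rules use the \(X \approx WH\) convention, so here \(H\) is \(R\times M\); sizes and the iteration count are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 40))   # non-negative data
R = 5
W = rng.random((100, R))    # non-negative initial factors
H = rng.random((R, 40))

eps = 1e-10                 # guard against division by zero
for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H))  # reconstruction error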
sklearn.decomposition.NMF
n_components: number of components
init: how to initialize the search
solver: ‘cd’ for coordinate descent, or ‘mu’ for multiplicative update
l1_ratio, alpha_H, alpha_W: can regularize the fit
NMF.components_: components x features matrix
NMF.fit_transform(X): fits the model to X, then returns the data in component space
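A short usage sketch with these parameters (the data and settings are illustrative, not from the source):

import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.default_rng(0).standard_normal((100, 40)))  # non-negative data
nmf = NMF(n_components=5, init="nndsvda", solver="mu", max_iter=500)
W = nmf.fit_transform(X)    # samples x components scores
H = nmf.components_         # components x features matrix
print(np.linalg.norm(X - W @ H))  # reconstruction error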
As always, selection of the appropriate method depends upon the question being asked.