The timing and order of drug treatment affect cell death.
One solution: use the concepts from PCA to reduce dimensionality.
First step: Simply apply PCA!
Dimensionality goes from m to Ncomp.
Decompose the X matrix (scores T, loadings P, residuals E): X = TP^T + E
Regress Y against the scores (scores describe observations – by using them we link X and Y for each observation)
Y = TB + E
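A minimal sketch of this PCA-then-regress approach (principal components regression) in scikit-learn; the synthetic data and variable names are illustrative only, not from the original example:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Synthetic, illustrative data: 50 observations, m = 10 measured variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = X[:, :2] @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=50)

# Step 1: PCA reduces X from m columns to Ncomp score columns (X ~= T P^T + E)
# Step 2: ordinary regression of Y against the scores T (Y ~= T B + E)
pcr = Pipeline([
    ("pca", PCA(n_components=3)),
    ("regress", LinearRegression()),
])
pcr.fit(X, Y)
Y_hat = pcr.predict(X)
```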
The PCs for the X matrix do not necessarily capture X-variation that is important for Y.
We might miss later PCs that are important for prediction!
What if, instead of maximizing the variance explained in X, we maximize the covariance explained between X and Y?
We will find principal components for both X and Y:
X = TP^T + E
Y = UQ^T + F
For each latent variable a, the weights and scores are found by iterating the following updates until the scores converge (u_a is initialized with a column of Y_a):
w_a = X_a' u_a / (u_a' u_a)
w_a = w_a / sqrt(w_a' w_a)
t_a = X_a w_a / (w_a' w_a)
c_a = Y_a' t_a / (t_a' t_a)
u_a = Y_a c_a / (c_a' c_a)
Source: https://learnche.org/pid/latent-variable-modelling/projection-to-latent-structures/how-the-pls-model-is-calculated
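A minimal NumPy sketch of these updates for a single latent variable; the function name and data handling are illustrative, and X and Y are assumed to be centered (and usually scaled) beforehand:

```python
import numpy as np

def pls_component(X, Y, n_iter=500, tol=1e-10):
    """One NIPALS latent variable: returns weights w, scores t and u, and Y-loadings c.

    X (n x m) and Y (n x k) are assumed to be centered (and usually scaled)."""
    u = Y[:, [0]].copy()                 # initialize u_a with a column of Y_a
    for _ in range(n_iter):
        w = X.T @ u / (u.T @ u)          # w_a = X_a' u_a / (u_a' u_a)
        w = w / np.sqrt(w.T @ w)         # normalize w_a to unit length
        t = X @ w / (w.T @ w)            # t_a = X_a w_a / (w_a' w_a)
        c = Y.T @ t / (t.T @ t)          # c_a = Y_a' t_a / (t_a' t_a)
        u_new = Y @ c / (c.T @ c)        # u_a = Y_a c_a / (c_a' c_a)
        if np.linalg.norm(u_new - u) < tol:   # stop once u_a stops changing
            u = u_new
            break
        u = u_new
    return w, t, u, c
```

Subsequent latent variables would be found by deflating X (and, in regression mode, Y) with the component just extracted and repeating; the learnche.org page linked above walks through the full algorithm, including the X loadings p_a.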
Janes et al., Nat Rev Mol Cell Biol, 2006
R2X provides the variance explained in X:
R2X = 1 − ||X_PLSR − X|| / ||X||
R2Y shows the variance explained in Y:
R2Y = 1 − ||Y_PLSR − Y|| / ||Y||
If you are trying to predict something, you should look at the cross-validated R2Y (a.k.a. Q2Y).
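A sketch of how R2Y and Q2Y could be computed in scikit-learn, using the norm-ratio definition above; the synthetic data and the 5-fold split are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

# Synthetic, illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = X[:, :2] @ np.array([[1.0], [-0.5]]) + rng.normal(scale=0.1, size=(50, 1))

# R2Y: variance in Y explained by a model fit to all of the data
pls = PLSRegression(n_components=3).fit(X, Y)
R2Y = 1.0 - np.linalg.norm(pls.predict(X) - Y) / np.linalg.norm(Y)

# Q2Y: the same quantity, but using held-out (cross-validated) predictions
Y_cv = cross_val_predict(PLSRegression(n_components=3), X, Y, cv=5)
Q2Y = 1.0 - np.linalg.norm(Y_cv - Y) / np.linalg.norm(Y)
```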
PCR can be assembled from sklearn.decomposition.PCA and sklearn.linear_model.LinearRegression, chained together with sklearn.pipeline.Pipeline.
PLS is available directly as sklearn.cross_decomposition.PLSRegression.
M.fit(X, Y) to train
M.predict(X) to get new predictions
PLSRegression(n_components=3) to set the number of components at setup
M.n_components = 3 to change it after setup
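Putting those calls together in a short example (M, X, and Y are placeholder names, and the data are synthetic):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Synthetic, illustrative data: 40 observations, 8 X variables, 2 responses
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
Y = rng.normal(size=(40, 2))

M = PLSRegression(n_components=3)  # set the number of components at setup
M.fit(X, Y)                        # train
Y_pred = M.predict(X)              # get predictions

M.n_components = 2                 # change the number of components after setup...
M.fit(X, Y)                        # ...and refit for the change to take effect
```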