A user wants to use principal component analysis (PCA) to reduce some noise before applying linear regression. He has 1000 samples and 200 features, but fitting the regression on the PCA output raises an error:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
X = np.random.rand(1000, 200)  # 1000 samples, 200 features
y = np.random.rand(1000, 1)
model = LinearRegression()
model.fit(X, y)  # regression on the raw features works
pca = PCA(n_components=8)
pca.fit(X)
principal_components = pca.components_
model.fit(principal_components,y)
The error is:
ValueError: Found input variables with inconsistent numbers of samples: [8, 1000]
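To see where the [8, 1000] in the error comes from, the sketch below compares the shapes involved on random data of the same size (the seeded generator is an assumption added for reproducibility, not part of the original). It also checks how the two arrays relate: with the default whiten=False, transform(X) is the mean-centered X multiplied by components_.T.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Random data with the question's shapes: 1000 samples, 200 features
rng = np.random.default_rng(0)
X = rng.random((1000, 200))
y = rng.random((1000, 1))

pca = PCA(n_components=8).fit(X)
X_pca = pca.transform(X)

# components_: the principal axes in feature space, one row per component
print(pca.components_.shape)  # (8, 200)
# transform(X): each sample's coordinates on those axes, one row per sample
print(X_pca.shape)            # (1000, 8)

# With whiten=False (the default), transform() is the mean-centered data
# projected onto the components
X_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_pca, X_manual))  # True

# Using components_ as the feature matrix gives 8 "samples" for 1000 targets,
# which triggers the same ValueError as in the question
msg = ""
try:
    LinearRegression().fit(pca.components_, y)
except ValueError as err:
    msg = str(err)
print(msg)
```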
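pca.components_ has shape (8, 200): it holds the 8 principal axes expressed in the original feature space, one row per component, not the transformed samples. Passing it to the regression therefore gives LinearRegression 8 rows of features against 1000 targets, which is the [8, 1000] in the error. Instead, fit PCA to X and transform it in one step with fit_transform, which returns a (1000, 8) array, here named X_pca, holding each sample's coordinates on the 8 components. That array is what should be used instead of pca.components_: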
pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)  # shape (1000, 8): one row of component scores per sample
model = LinearRegression()
model.fit(X_pca, y)
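A common way to avoid mixing up the fitted PCA object and its output is to chain both steps with scikit-learn's make_pipeline. This is an alternative to the answer's two-step code rather than something from the original thread; a minimal sketch on random data (the seeded generator, the 1-D target and the X_new samples are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((1000, 200))
y = rng.random(1000)  # 1-D target; a (1000, 1) column also works

# fit() runs PCA.fit_transform on X, then fits LinearRegression on the
# resulting (1000, 8) scores
model = make_pipeline(PCA(n_components=8), LinearRegression())
model.fit(X, y)

# New samples still have the original 200 features; predict() projects them
# with the already-fitted PCA before applying the regression
X_new = rng.random((5, 200))
print(model.predict(X_new).shape)  # (5,)
```

Because the pipeline stores the fitted PCA, predict() projects new data with the components learned from the training set instead of refitting PCA on it, which is what a second fit_transform call on test data would do.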