A user wants to use principal component analysis to reduce some noise before applying linear regression. He has 1000 samples and 200 features, but he receives an error.

Asked by ranjan_6399 in Data Science on Jan 15, 2020

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

X = np.random.rand(1000, 200)   # 1000 samples, 200 features
y = np.random.rand(1000, 1)

model = LinearRegression()
model.fit(X, y)                 # works: X and y both have 1000 rows

pca = PCA(n_components=8)
pca.fit(X)

principal_components = pca.components_   # shape (8, 200)
model.fit(principal_components, y)       # raises the ValueError

Running this raises the following error:

ValueError: Found input variables with inconsistent numbers of samples: [8, 1000]
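To see where the 8 in the error comes from, the following sketch prints the shapes involved and projects the samples by hand onto the principal axes. It assumes a recent scikit-learn; the seeded generator `rng` and `random_state=0` are additions for reproducibility only and are not in the original code.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # seeded for reproducibility (not in the original)
X = rng.random((1000, 200))     # 1000 samples, 200 features, as in the question
y = rng.random((1000, 1))

pca = PCA(n_components=8, random_state=0)
pca.fit(X)

# components_ holds the 8 principal axes, each a direction in 200-dim feature space
print(pca.components_.shape)  # (8, 200)
print(y.shape)                # (1000, 1): 8 rows vs 1000 rows is the mismatch

# Centering the samples and projecting them onto the axes gives one row per sample
X_proj = (X - pca.mean_) @ pca.components_.T
print(X_proj.shape)           # (1000, 8)

# transform() performs the same centering and projection (whiten=False by default)
print(np.allclose(X_proj, pca.transform(X)))  # True
```

In other words, `components_` describes the new coordinate system, while the projected data (one row per sample) is what a regressor needs as input.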

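pca.components_ has shape (n_components, n_features), here (8, 200). Each row is a principal axis in feature space, not a transformed sample, so passing it to model.fit supplies 8 "samples" against the 1000 rows of y, which is exactly the mismatch the error reports. Instead, fit PCA to X and transform X in one step with fit_transform, which projects every sample onto the 8 axes and returns a (1000, 8) array, here named X_pca. Fit the regression on X_pca instead of pca.components_: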

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)   # shape (1000, 8): one row per sample
model.fit(X_pca, y)            # X_pca and y now both have 1000 rows
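A related option, not part of the original answer, is to chain both steps with scikit-learn's `make_pipeline`, so that the PCA fitted on the training data is reused automatically when predicting on new samples. This is a minimal sketch assuming the same random data as the question; `X_new` is hypothetical data for illustration, and the seeded generator is added only for reproducibility.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)  # seeded for reproducibility (not in the original)
X = rng.random((1000, 200))
y = rng.random((1000, 1))

# PCA learns its axes from X; LinearRegression is then fitted on the (1000, 8) projection
model = make_pipeline(PCA(n_components=8, random_state=0), LinearRegression())
model.fit(X, y)

# Hypothetical new samples: they are projected with the already-fitted PCA before prediction
X_new = rng.random((5, 200))
y_pred = model.predict(X_new)
print(y_pred.shape)  # (5, 1)

# The fitted steps can still be inspected individually (make_pipeline names them by class)
pca = model.named_steps["pca"]
print(pca.components_.shape)  # (8, 200)
```

Keeping both steps in one estimator avoids accidentally refitting PCA on test data, and the pipeline can be passed as a whole to tools such as `cross_val_score`.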
