How can I use the SimpleImputer class to replace missing values with mean values in Python?
This is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing Dataset
dataset = pd.read_csv('C:/Users/Rupali Singh/Desktop/ML A-Z/Machine Learning A-Z Template Folder/Part 1 - Data Preprocessing/Data.csv')
print(dataset)
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 3].values
#Missing Data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values= np.nan, strategy='mean')
X.fit[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)
My data set:
Country Age Salary Purchased
0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes
Error Message:
File "C:/Users/Rupali Singh/PycharmProjects/Machine_Learning/data_preprocessing_Template.py", line 15, in <module>
    X.fit[:, 1:3] = imputer.fit_transform(X[:, 1:3])
AttributeError: 'numpy.ndarray' object has no attribute 'fit'
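Your error comes from the left-hand side of the assignment: X is a NumPy array, and X.fit[:, 1:3] looks up an attribute called fit that arrays do not have. fit and fit_transform are methods of the imputer, not of the data. Here's how I used it on a DataFrame: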
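# The old Imputer class (with its axis argument) was removed from scikit-learn; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
imr = SimpleImputer(missing_values=np.nan, strategy='median')
imr = imr.fit(data[['age']])
data['age'] = imr.transform(data[['age']]).ravel()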
X.fit[:, 1:3] = imputer.fit_transform(X[:, 1:3]) is wrong. fit() is a method of the imputer, not of a NumPy array, so X.fit does not exist and you cannot assign into it. That is why you get the AttributeError.
Use X[:, 1:3] = imputer.fit_transform(X[:, 1:3]) instead.
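For reference, here is a minimal sketch of the corrected script. It assumes the data shown in the question and rebuilds it inline as a DataFrame so that no CSV path is needed; everything else follows the question's code, with only the assignment target changed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Inline copy of the question's data (assumption: stands in for Data.csv)
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, np.nan, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, np.nan,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

X = dataset.iloc[:, :-1].values  # Country, Age, Salary
Y = dataset.iloc[:, 3].values    # Purchased

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Assign back into a slice of X itself, not into X.fit
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)
```

The missing Age in row 6 and the missing Salary in row 4 are replaced by the mean of the nine known values in their respective columns.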