How to deal with categorical variables in Python?

Asked by NehaTambe in Data Science, asked on Nov 30, 2019
Answered by Neha Tambe

Categorical variables are discrete variables in a dataset that must be converted into numeric form before most machine learning algorithms can use them. The simplest approach is to assign each category an integer, but this kind of encoding can introduce a false ordering. For example, if a column contains country names such as India and Japan and they are encoded as 1 and 2, an algorithm may assume that Japan is twice India. In such cases one-hot encoding is required: it converts each category into a dummy variable that takes only the values 0 or 1. For example:

Actual    Dummy_India    Dummy_Japan
India     1              0
Japan     0              1


To implement this in Python, we can use pd.get_dummies() from pandas, which encodes categorical variables as dummy variables.
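As a minimal sketch, the India/Japan table above could be reproduced as follows. The DataFrame here is made up for illustration and only mirrors the example. The prefix and dtype arguments are my own choices: prefix="Dummy" yields the column names used in the table, and dtype=int asks for 0/1 output, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical data mirroring the India/Japan example above.
df = pd.DataFrame({"Actual": ["India", "Japan"]})

# One dummy column per category; prefix="Dummy" names them
# Dummy_India and Dummy_Japan, and dtype=int gives 0/1 values.
dummies = pd.get_dummies(df["Actual"], prefix="Dummy", dtype=int)

# Place the dummy columns next to the original column for comparison.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Each row has exactly one 1 across the dummy columns, so no ordering between the countries is implied.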


In this dataset, all the columns contain categorical variables. Let us convert gender into dummy variables.


After the conversion, a 1 in the male column denotes a male and a 0 denotes a female.
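The gender conversion might look like the sketch below. The original dataset is not reproduced here, so the DataFrame, its column names, and its values are assumptions made purely for illustration. Using drop_first=True is also an assumption about how a single male column was obtained: pandas orders the categories alphabetically, so the female column is dropped and only male remains.

```python
import pandas as pd

# Hypothetical stand-in for the dataset; column names and values are assumed.
data = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "smoker": ["yes", "no", "no", "yes"],
})

# With two categories, one dummy column carries all the information.
# drop_first=True drops the first category alphabetically ("female"),
# leaving a single "male" column: 1 = male, 0 = female.
gender_dummies = pd.get_dummies(data["gender"], drop_first=True, dtype=int)

# Swap the original text column for the encoded one.
encoded = pd.concat([data.drop(columns="gender"), gender_dummies], axis=1)
print(encoded)
```

Dropping one of the two columns avoids storing redundant information, since the female indicator is always 1 minus the male indicator.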

