How to deal with categorical variables in Python?
Categorical variables are discrete variables in a dataset that must be converted into numeric form before most algorithms can use them. The simplest conversion assigns each category an integer, but this can introduce a false ordering. For example, if a column contains country names such as India and Japan and they are encoded as 1 and 2, an algorithm may treat Japan as greater than India, or even as twice India. In such cases one-hot encoding is needed: it converts each category into a dummy variable that takes only the values 0 or 1. For example:
Actual    Dummy_India    Dummy_Japan
India     1              0
Japan     0              1
To implement this in Python, we can use the pandas function pd.get_dummies() to encode categorical variables as dummy variables.
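As a minimal sketch, the India/Japan example above could be reproduced as follows. The DataFrame is toy data built only for illustration, and the "Dummy" prefix is chosen to match the column names in the example; dtype=int is passed because recent pandas versions return booleans by default.

```python
import pandas as pd

# Toy data for illustration; "Actual" mirrors the column name in the example.
df = pd.DataFrame({"Actual": ["India", "Japan"]})

# prefix="Dummy" produces the columns Dummy_India and Dummy_Japan.
# dtype=int gives 0/1 values instead of True/False.
dummies = pd.get_dummies(df["Actual"], prefix="Dummy", dtype=int)

# Place the dummy columns next to the original column for comparison.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Each row has a 1 in exactly one dummy column, so no ordering between the countries is implied.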
In the dataset used here, every column contains a categorical variable. Let us convert the gender column into dummy variables.
After the conversion, a 1 in the male column means the person is male and a 0 means the person is female.
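The original dataset is not reproduced here, so the following is only a sketch on a hypothetical stand-in: the column name "gender" and its values are assumptions. It also assumes that a single male column is wanted, which drop_first=True is one way to obtain.

```python
import pandas as pd

# Hypothetical stand-in for the dataset; column name and values are assumed.
people = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

# Categories are sorted alphabetically, so drop_first=True drops "female"
# and keeps a single "male" column: 1 = male, 0 = female.
gender_dummies = pd.get_dummies(people["gender"], drop_first=True, dtype=int)
print(gender_dummies)
```

Keeping one column for a two-category variable avoids storing redundant information, since the female column would simply be the inverse of the male one.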