How can I identify and drop duplicate columns in pandas?

I want to find and remove duplicate columns in my DataFrames. I have a DataFrame with various data types such as integers, objects, floats, etc.


There are two conditions under which a column should be found and removed:

1. The column name is the same as another column's name.

2. The values in the column are the same as another column's values.


I have tried df.T.duplicated(), but it is too slow for big DataFrames. While browsing, I also learned that pivot, pivot_table, or corr can be used to list duplicate columns.


Please help me understand when each of these three should be used, and whether there is any other approach.

Answered by Brian Kennedy

1. Duplicate columns by name in a pandas DataFrame:

You can find duplicate columns by name with df.columns.duplicated(). This approach is fine as long as the data is not huge, and it is fast enough in that case.
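
As a minimal sketch of the name-based check, assuming a small made-up DataFrame (the column names and values here are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical example data: the name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Boolean mask: True for every column whose name already appeared earlier.
dup_mask = df.columns.duplicated()

# Names that occur more than once.
dup_names = df.columns[dup_mask].unique().tolist()

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~dup_mask]
```

Note that this only compares names. The values of the dropped columns are never inspected, so the second "a" is removed even though its values differ from the first.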

2. Duplicate columns by values in a pandas DataFrame:

Finding duplicate columns by value is time-consuming and can get complex. In general, I don't recommend pivot, pivot_table, or corr for large datasets. If you only need to detect numeric columns that are perfectly correlated, though, corr can flag those pairs, which makes it easy to find what you're looking for. Keep in mind that a correlation of 1 does not prove the values are identical (a column and twice that column also correlate perfectly), so treat such pairs as candidates to verify.
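
One possible alternative to transposing the whole frame, sketched here under a few assumptions: columns are only compared with earlier columns of the same dtype (so an int column and a float column holding the same numbers count as different), and equality is checked with Series.equals, which treats NaNs in the same positions as equal. The helper name value_duplicate_positions and the sample data are hypothetical.

```python
import pandas as pd

def value_duplicate_positions(df):
    """Return positions of columns whose values repeat an earlier column.

    Assumption: only columns with the same dtype are compared.
    """
    dup_positions = []
    kept_by_dtype = {}  # dtype name -> positions of distinct columns seen so far
    for pos in range(df.shape[1]):
        col = df.iloc[:, pos]  # positional access also works with repeated names
        kept = kept_by_dtype.setdefault(str(col.dtype), [])
        if any(col.equals(df.iloc[:, k]) for k in kept):
            dup_positions.append(pos)
        else:
            kept.append(pos)
    return dup_positions

# Hypothetical example data: "z" repeats "x", and "v" repeats "w".
df_values = pd.DataFrame({
    "x": [1, 2, 3],
    "y": [1.0, 2.0, 3.0],
    "z": [1, 2, 3],
    "w": ["a", "b", "c"],
    "v": ["a", "b", "c"],
})

dups = value_duplicate_positions(df_values)
keep = [i for i in range(df_values.shape[1]) if i not in dups]
deduped_values = df_values.iloc[:, keep]
```

This is still quadratic in the number of columns in the worst case, so it is a sketch to adapt rather than a guaranteed speed-up. Grouping by dtype first simply avoids comparing columns that cannot be equal under the stated assumption.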


