How to implement 'in' and 'not in' for Pandas dataframe

606    Asked by DeirdreCameron in SQL Server , Asked on May 7, 2021

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']
# pseudo-code:
df[df['countries'] not in countries]
My current way of doing this is as follows:
df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})
# IN
df.merge(countries,how='inner',on='countries')
# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]
But this seems like a horrible kludge. Can anyone improve on it?

Answered by Dipika Agarwal

You can use pd.Series.isin to solve pandas not in where it checks whether the values are contained in Series and returns a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

Example:
>>> df
  countries
0 India
1 France
2 Austria
3 China
>>> countries
['France', 'China']
>>> df.countries.isin(countries)
0 False
1 True
2 False
3 True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1 France
3 China
>>> df[~df.countries.isin(countries)]
  countries
0 India
2 Austria

Your Answer

Interviews

Parent Categories