How to implement 'in' and 'not in' for Pandas dataframe
How can I achieve the equivalents of SQL's IN and NOT IN?
I have a list with the required values. Here's the scenario:
df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']
# pseudo-code:
df[df['countries'] not in countries]
My current way of doing this is as follows:
df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})
# IN
df.merge(countries,how='inner',on='countries')
# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]
But this seems like a horrible kludge. Can anyone improve on it?
You can use pd.Series.isin to solve pandas not in where it checks whether the values are contained in Series and returns a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.
Example:
>>> df
countries
0 India
1 France
2 Austria
3 China
>>> countries
['France', 'China']
>>> df.countries.isin(countries)
0 False
1 True
2 False
3 True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
countries
1 France
3 China
>>> df[~df.countries.isin(countries)]
countries
0 India
2 Austria