How to perform a group by function in Python?

Asked by ranjan_6399 in Data Science, asked on Jan 15, 2020
Answered by Ranjana Admin

To use groupby, we first need a dataframe. Let us create one.

import pandas as pd

# Create dataframe
data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}

df = pd.DataFrame(data)

df

The dataframe has six rows and three columns (Company, Person and Sales), with two rows for each company.

Now we can use the .groupby() method to group rows together based on a column name. For instance, let's group by Company. This creates a DataFrameGroupBy object:

df.groupby('Company')

We can save this object as a new variable:

by_comp = df.groupby("Company")

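And then call aggregate methods on the object. The Person column holds text, so we select the numeric Sales column before taking the mean (recent pandas versions raise a TypeError when mean() is applied to a text column):

by_comp['Sales'].mean()

This gives 296.5 for FB, 160.0 for GOOG and 232.0 for MSFT.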
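The same grouped object can compute several statistics at once. The following is a minimal sketch using the dataframe from this answer; the variable name summary is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Select the numeric column, then request several aggregates in one call.
# The result has one row per company and one column per statistic.
summary = df.groupby("Company")["Sales"].agg(["sum", "mean", "max", "count"])
print(summary)
```

Selecting Sales first keeps the text column Person out of the numeric aggregations.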






Answer (1)

The groupby() method in Pandas groups rows that share a value in a given column so that aggregate operations can be applied to each group.


Steps to Perform groupby() in Pandas:

Create a DataFrame


import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}

df = pd.DataFrame(data)

print(df)

This creates a DataFrame with three columns: Company, Person, and Sales.


Grouping by a Column

To group the data by Company, use:


df.groupby('Company')

This returns a DataFrameGroupBy object, which can be used for further operations.
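To see what the DataFrameGroupBy object holds before aggregating, it can be inspected directly. This is a small sketch under the same sample data; the names goog and people are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

by_comp = df.groupby("Company")

# Number of distinct groups (one per company)
print(by_comp.ngroups)

# Pull out the rows belonging to a single group
goog = by_comp.get_group("GOOG")
print(goog)

# Iterating yields (group key, sub-DataFrame) pairs
people = {}
for company, rows in by_comp:
    people[company] = list(rows["Person"])
print(people)
```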


Applying Aggregate Functions

We can apply aggregation methods like mean(), sum(), count(), etc.


by_comp = df.groupby("Company")

# Select the numeric Sales column first; Person is text and cannot be averaged
print(by_comp["Sales"].mean())  # Calculates the average sales per company

This will return the mean Sales for each company.
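A grouped result can also point back to the original rows. As one possible sketch, idxmax() returns the row label of the largest Sales value in each group, which can then be used to look up the top seller per company; top_sellers is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Row label of the highest Sales value within each company
best_idx = df.groupby("Company")["Sales"].idxmax()

# Use those labels to fetch the full rows from the original DataFrame
top_sellers = df.loc[best_idx]
print(top_sellers)
```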
