Abu Bin Fahd

Nov 26, 2021

Pandas Technique: Summary Statistics

Summary statistics are the part of descriptive statistics that condenses sample data into a few numbers conveying its gist. Statisticians commonly describe a set of observations with a measure of location, or central tendency, such as the arithmetic mean.

import pandas as pd
import numpy as np

# read the dataset
df = pd.read_csv('Srt_dta.csv')
df

Summarizing numerical data

df['Height(cm)'].mean()

df['Date of Birth'].min()

'2011-12-11'

df['Date of Birth'].max()

'2018-02-27'
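As a rough, self-contained sketch of the calls above, the snippet below builds a small stand-in table instead of reading Srt_dta.csv. The heights and two of the dates are made up for illustration; only the column names and the two dates shown in the outputs above come from the post.

```python
import pandas as pd

# Hypothetical stand-in for Srt_dta.csv; the heights are invented.
sample = pd.DataFrame({
    "Height(cm)": [40.0, 45.0, 50.0, 55.0],
    "Date of Birth": ["2012-01-15", "2015-06-30", "2011-12-11", "2018-02-27"],
})

height_mean = sample["Height(cm)"].mean()      # arithmetic mean
height_median = sample["Height(cm)"].median()  # middle value of the sorted data

# ISO-formatted date strings sort chronologically, so min/max return the
# earliest and latest dates even without converting to datetime.
earliest = sample["Date of Birth"].min()
latest = sample["Date of Birth"].max()

print(height_mean, height_median, earliest, latest)
```

If the dates were stored in another format, converting the column with pd.to_datetime first would be the safer choice.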

The .agg() method

agg() applies a function, or a list of functions, to a Series or to each column of a DataFrame. When it is given a list of functions, agg() returns one result per function.

def pct30(column):
    # 30th percentile of the column
    return column.quantile(0.3)

df['Weight(kg)'].agg(pct30)

21.0
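To make the result easier to check by hand, here is a minimal sketch that does not need the CSV file. The weights are an assumption: they are inferred by differencing the running totals printed in the cumulative sum section of this post, not read from the original data.

```python
import pandas as pd

def pct30(column):
    # 30th percentile, using pandas' default linear interpolation
    return column.quantile(0.3)

# Assumed weights, reconstructed from the cumulative sums shown in the post.
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

via_agg = weights.agg(pct30)        # custom function passed to agg()
direct = weights.quantile(0.3)      # same computation, called directly
via_name = weights.agg("median")    # built-in reductions can be passed by name

print(via_agg, direct, via_name)
```

Passing a custom function to agg() mostly pays off once several columns or several functions are involved; for a single Series, calling the method directly gives the same number.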

Summaries on multiple columns

df[['Height(cm)', 'Weight(kg)']].agg(pct30)

Height(cm)    45.4
Weight(kg)    21.0
dtype: float64

Multiple summaries

def pct40(column):
    # 40th percentile of the column
    return column.quantile(0.4)

df['Height(cm)'].agg([pct30, pct40])

pct30    45.4
pct40    47.2
Name: Height(cm), dtype: float64
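The two ideas can also be combined: several columns and several functions in one call. The sketch below uses invented numbers (not the post's dataset) and assumes the same pct30 and pct40 helpers; the result is a DataFrame with one row per function and one column per input column.

```python
import pandas as pd

def pct30(column):
    return column.quantile(0.3)

def pct40(column):
    return column.quantile(0.4)

# Hypothetical data, chosen so the percentiles are easy to verify by hand.
sample = pd.DataFrame({
    "Height(cm)": [40, 42, 44, 46, 48, 50],
    "Weight(kg)": [10, 12, 14, 16, 18, 20],
})

# Rows are labelled with the function names, columns with the column names.
summary = sample[["Height(cm)", "Weight(kg)"]].agg([pct30, pct40])
print(summary)
```

A single value can then be read with summary.loc["pct30", "Height(cm)"].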

Cumulative sum

df['Weight(kg)'].cumsum()

# other cumulative methods:
# .cummax()
# .cumprod()
# .cummin()

0     25
1     48
2     70
3     87
4    116
5    118
6    192
Name: Weight(kg), dtype: int64
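The other cumulative methods work the same way. In the sketch below the individual weights are an assumption, obtained by differencing the running totals printed above; the original CSV is not used.

```python
import pandas as pd

# Assumed weights, inferred from the cumulative sums shown above.
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

running_total = weights.cumsum()     # 25, 48, 70, 87, 116, 118, 192
running_max = weights.cummax()       # 25, 25, 25, 25, 29, 29, 74
running_min = weights.cummin()       # 25, 23, 22, 17, 17, 2, 2
running_product = weights.cumprod()  # product of all values seen so far

print(pd.DataFrame({
    "cumsum": running_total,
    "cummax": running_max,
    "cummin": running_min,
}))
```

Each method returns a Series of the same length as its input, so the results can sit side by side as columns.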
