Pandas Technique: Summary Statistics

Summary statistics are part of descriptive statistics: they condense the information in a sample into a few representative numbers. Statisticians commonly describe a set of observations with a measure of location, or central tendency, such as the arithmetic mean.

import pandas as pd
import numpy as np
# read dataset
df = pd.read_csv('Srt_dta.csv')

Summarizing numerical data

df['Date of Birth'].max()
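
Other common summaries follow the same pattern as .max(). As a minimal sketch, the example below builds a small, made-up DataFrame (the values are illustrative and are not taken from Srt_dta.csv) and computes the mean, median, minimum, and maximum of one numeric column. The name `sample` is hypothetical; the column names mirror the ones used later in this post.

```python
import pandas as pd

# Hypothetical sample data; NOT the contents of Srt_dta.csv
sample = pd.DataFrame({
    'Height(cm)': [40, 45, 50, 55, 60],
    'Weight(kg)': [20, 22, 25, 24, 30],
})

height = sample['Height(cm)']

mean_height = height.mean()      # arithmetic mean
median_height = height.median()  # middle value of the sorted column
min_height = height.min()        # smallest value
max_height = height.max()        # largest value

print(mean_height, median_height, min_height, max_height)
```

Each of these methods reduces the whole column to a single number, which is what makes them summary statistics.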

The .agg() method

The .agg() method applies a function, or a list of functions, to a Series or to each column of a DataFrame. When a list of functions is passed, .agg() returns one result per function.

def pct30(column):
    return column.quantile(0.3)

Summaries on multiple columns

df[['Height(cm)', 'Weight(kg)']].agg(pct30)
Height(cm)    45.4
Weight(kg)    21.0
dtype: float64

Multiple summaries

def pct40(column):
    return column.quantile(0.4)

df['Height(cm)'].agg([pct30, pct40])
pct30    45.4
pct40    47.2
Name: Height(cm), dtype: float64
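
The two ideas above can be combined: passing a list of functions to .agg() on several columns should return a DataFrame with one row per function and one column per original column. The sketch below assumes a small made-up DataFrame named `sample` (hypothetical values, not the contents of Srt_dta.csv) and redefines pct30 and pct40 so that it runs on its own.

```python
import pandas as pd

# Hypothetical sample data; NOT the contents of Srt_dta.csv
sample = pd.DataFrame({
    'Height(cm)': [40, 45, 50, 55, 60],
    'Weight(kg)': [20, 22, 25, 24, 30],
})

def pct30(column):
    # 30th percentile of the column
    return column.quantile(0.3)

def pct40(column):
    # 40th percentile of the column
    return column.quantile(0.4)

# Rows are labelled by function name, columns by the original column names
summary = sample[['Height(cm)', 'Weight(kg)']].agg([pct30, pct40])
print(summary)
```

A single value can then be looked up by function name and column name, for example summary.loc['pct30', 'Height(cm)'].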

Cumulative sum

df['Weight(kg)'].cumsum()
# other cumulative methods:
# .cummax()
# .cumprod()
# .cummin()
0     25
1     48
2     70
3     87
4    116
5    118
6    192
Name: Weight(kg), dtype: int64
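
To see how the cumulative methods differ, the sketch below runs each of them on a short, made-up Series (the name `weights` and its values are hypothetical, not taken from Srt_dta.csv). Each method returns a Series of the same length as the input, where every entry summarizes all values up to and including that position.

```python
import pandas as pd

# Hypothetical weights; NOT the contents of Srt_dta.csv
weights = pd.Series([3, 1, 4, 2], name='Weight(kg)')

running_total = weights.cumsum()     # 3, 4, 8, 10
running_max = weights.cummax()       # 3, 3, 4, 4
running_min = weights.cummin()       # 3, 1, 1, 1
running_product = weights.cumprod()  # 3, 3, 12, 24

print(running_total)
```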

