Pandas Technique: Summary Statistics

Abu Bin Fahd
Nov 26, 2021

Summary statistics are the part of descriptive statistics that condenses sample data into a few representative numbers. A common starting point is a measure of location, or central tendency, such as the arithmetic mean.

import pandas as pd
import numpy as np

# read dataset
df = pd.read_csv('Srt_dta.csv')
df

Summarizing numerical data

# average height
df['Height(cm)'].mean()

# earliest date of birth
df['Date of Birth'].min()

'2011-12-11'

df['Date of Birth'].max()

'2018-02-27'
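The same calls can be tried without the CSV file. The sketch below is only an illustration: the frame `toy` and its height values are made up, and only the column names and the two dates are borrowed from the article. It also parses the dates with `pd.to_datetime`, on the assumption that they are stored as ISO-formatted strings, so that `min()` and `max()` compare real dates rather than text.

```python
import pandas as pd

# Hypothetical stand-in for the article's CSV; the heights are made up.
toy = pd.DataFrame({
    "Height(cm)": [40.0, 50.0, 60.0],
    "Date of Birth": ["2015-03-01", "2011-12-11", "2018-02-27"],
})

mean_height = toy["Height(cm)"].mean()      # arithmetic mean
median_height = toy["Height(cm)"].median()  # middle value

# Convert to datetimes so min/max do not depend on string ordering.
dob = pd.to_datetime(toy["Date of Birth"])
earliest, latest = dob.min(), dob.max()

print(mean_height, median_height, earliest.date(), latest.date())
```

ISO-formatted strings happen to sort chronologically, which is why the string-based `min()` and `max()` work, but converting to datetimes is the safer habit when the format is not guaranteed.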

The .agg() method

agg() applies a function, or a list of functions, to a Series or to each column of a DataFrame. When it is given a list of functions, it returns one result per function.
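As a minimal sketch of the difference between passing one function and passing a list, the example below uses pandas' built-in aggregation names. The weight values are an assumption: they were inferred by differencing the cumulative sum output printed at the end of this article, not read from the original file.

```python
import pandas as pd

# Assumed values, inferred from the cumulative sum output in this article.
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

# A single function name returns a single scalar.
single = weights.agg("mean")

# A list of function names returns a Series indexed by those names.
several = weights.agg(["min", "max", "mean"])

print(single)
print(several)
```

Built-in names such as "mean" or "max" avoid writing a helper function when pandas already provides the summary.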

# 30th percentile of a column
def pct30(column):
    return column.quantile(0.3)

df['Weight(kg)'].agg(pct30)

21.0

Summaries on multiple columns

df[['Height(cm)', 'Weight(kg)']].agg(pct30)

Height(cm)    45.4
Weight(kg)    21.0
dtype: float64
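When each column needs a different summary, `agg()` also accepts a dictionary that maps column names to functions. The sketch below assumes a made-up frame `toy` that reuses only the article's column names; the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the article.
toy = pd.DataFrame({
    "Height(cm)": [40.0, 45.0, 50.0, 55.0, 60.0],
    "Weight(kg)": [10, 20, 30, 40, 50],
})

# One summary per column: mean height and maximum weight.
per_column = toy.agg({"Height(cm)": "mean", "Weight(kg)": "max"})

print(per_column)
```

The result is a Series indexed by column name, the same shape as the output above, but with a different statistic in each row.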

Multiple summaries

# 40th percentile of a column
def pct40(column):
    return column.quantile(0.4)

df['Height(cm)'].agg([pct30, pct40])

pct30    45.4
pct40    47.2
Name: Height(cm), dtype: float64
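Writing one function per percentile gets repetitive. One possible alternative, sketched here, is a small factory that builds the functions on demand. `make_pct` is a hypothetical helper, not part of pandas, and the height values are made up. The factory sets `__name__` because `agg()` uses each function's name as its label in the result.

```python
import pandas as pd

def make_pct(q):
    """Return a function that computes the q-th quantile of a column."""
    def pct(column):
        return column.quantile(q)
    # agg() labels each result with the function's __name__, e.g. "pct30".
    pct.__name__ = f"pct{int(round(q * 100))}"
    return pct

# Hypothetical heights, for illustration only.
heights = pd.Series([40.0, 45.0, 50.0, 55.0, 60.0], name="Height(cm)")

summary = heights.agg([make_pct(0.3), make_pct(0.4), make_pct(0.5)])
print(summary)
```

Each generated function needs a distinct name, since `agg()` rejects a list that contains two functions with the same label.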

Cumulative sum

df['Weight(kg)'].cumsum()
# other cumulative methods:
# .cummax()
# .cumprod()
# .cummin()

0     25
1     48
2     70
3     87
4    116
5    118
6    192
Name: Weight(kg), dtype: int64
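The other cumulative methods follow the same pattern. The sketch below assumes the weight values implied by the output above (obtained by taking differences between consecutive running totals), so it should be read as a reconstruction rather than the original data.

```python
import pandas as pd

# Assumed values, reconstructed from the cumulative sum shown above.
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

running_total = weights.cumsum()  # sum of all values up to each row
running_max = weights.cummax()    # largest value seen so far
running_min = weights.cummin()    # smallest value seen so far

print(pd.DataFrame({
    "cumsum": running_total,
    "cummax": running_max,
    "cummin": running_min,
}))
```

Each method returns a Series of the same length as its input, so the results line up row by row with the original column.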

datainsightonline.com
