Pandas Technique: Summary Statistics
Summary statistics are the part of descriptive statistics that condenses sample data into a few representative numbers. Statisticians commonly describe a set of observations with a measure of location, or central tendency, such as the arithmetic mean.
import pandas as pd
import numpy as np
# read dataset
df = pd.read_csv('Srt_dta.csv')
df
Summarizing numerical data
df['Height(cm)'].mean()
df['Date of Birth'].min()
'2011-12-11'
df['Date of Birth'].max()
'2018-02-27'
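Since Srt_dta.csv is not included here, the calls above can be tried on a small stand-in table. The sketch below is only an illustration: the heights and most of the dates are invented, the weights are back-computed from the cumulative sums printed later in this article, and the name `sample` is hypothetical. It also parses the dates with pd.to_datetime, which is an addition to what the article does, so that min() and max() compare chronologically.

```python
import pandas as pd

# Hypothetical stand-in for Srt_dta.csv. Heights and most dates are invented;
# weights are back-computed from the cumulative sums printed in this article.
sample = pd.DataFrame({
    "Height(cm)": [40.0, 44.0, 46.0, 48.0, 50.0, 52.0, 56.0],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
    "Date of Birth": ["2011-12-11", "2013-05-02", "2014-08-19", "2015-01-30",
                      "2016-07-07", "2017-03-15", "2018-02-27"],
})

# Parse the date strings so comparisons are chronological, not alphabetical.
sample["Date of Birth"] = pd.to_datetime(sample["Date of Birth"])

mean_height = sample["Height(cm)"].mean()      # arithmetic mean
median_weight = sample["Weight(kg)"].median()  # middle value, less sensitive to outliers
oldest = sample["Date of Birth"].min()         # earliest date of birth
youngest = sample["Date of Birth"].max()       # latest date of birth

print(mean_height, median_weight, oldest.date(), youngest.date())
```

The median is shown next to the mean because a single extreme value, such as the 74 kg weight here, pulls the mean upward while leaving the median almost unchanged.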
The .agg() method
agg() applies a function, or a list of functions, to a Series or to each column of a DataFrame. When it is given a list of functions, agg() returns one result per function.
def pct30(column):
    return column.quantile(0.3)
df['Weight(kg)'].agg(pct30)
21.0
Summaries on multiple columns
df[['Height(cm)', 'Weight(kg)']].agg(pct30)
Height(cm) 45.4
Weight(kg) 21.0
dtype: float64
Multiple summaries
def pct40(column):
    return column.quantile(0.4)
df['Height(cm)'].agg([pct30, pct40])
pct30 45.4
pct40 47.2
Name: Height(cm), dtype: float64
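The list passed to agg() does not have to contain only custom functions. As a rough sketch, the example below mixes built-in reducer names such as "min" and "median" with pct30, and then uses a dict to request different summaries per column. The table `sample` is a hypothetical stand-in (invented heights, weights back-computed from the cumulative sums shown in this article), not the original dataset.

```python
import pandas as pd

# Hypothetical stand-in data: heights invented, weights back-computed
# from the cumulative sums printed in this article.
sample = pd.DataFrame({
    "Height(cm)": [40.0, 44.0, 46.0, 48.0, 50.0, 52.0, 56.0],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
})

def pct30(column):
    """Return the 30th percentile of a column."""
    return column.quantile(0.3)

# A list may mix built-in reducer names with custom functions.
# The result is a Series indexed by 'min', 'median' and 'pct30'.
weight_summary = sample["Weight(kg)"].agg(["min", "median", pct30])

# A dict applies a different set of summaries to each column.
# Cells for summaries not requested on a column are NaN.
per_column = sample.agg({
    "Height(cm)": ["mean", "max"],
    "Weight(kg)": ["min", pct30],
})

print(weight_summary)
print(per_column)
```

The dict form is handy when the columns call for different statistics, at the cost of a result table with NaN in the cells that were not requested.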
Cumulative sum
df['Weight(kg)'].cumsum()
# other cumulative methods work the same way:
# .cummax()
# .cumprod()
# .cummin()
0 25
1 48
2 70
3 87
4 116
5 118
6 192
Name: Weight(kg), dtype: int64
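The other cumulative methods mentioned in the comments can be compared side by side. This is a minimal sketch, assuming the seven weights implied by the running totals above (each weight is the difference between consecutive totals); the names `weights` and `running` are only illustrative.

```python
import pandas as pd

# Weights implied by the cumulative sums above: 25, 48 - 25, 70 - 48, ...
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

# Each method returns a Series of the same length as the input,
# where row i summarizes rows 0..i.
running = pd.DataFrame({
    "cumsum": weights.cumsum(),    # running total
    "cummax": weights.cummax(),    # largest value seen so far
    "cummin": weights.cummin(),    # smallest value seen so far
    "cumprod": weights.cumprod(),  # running product
})

print(running)
```

Unlike mean() or agg(), these methods do not reduce a column to a single number, so they are better suited to tracking how a statistic evolves row by row.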