Summary statistics are the part of descriptive statistics that condenses sample data into a few representative values. Statisticians commonly describe and characterize a set of observations by finding a measure of location, or central tendency, such as the arithmetic mean.
import pandas as pd
import numpy as np

# read dataset
df = pd.read_csv('Srt_dta.csv')
df
Summarizing numerical data
df['Date of Birth'].max()
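The same pattern extends to the other built-in reductions. The following is a minimal sketch that assumes a small hand-built Series rather than the CSV above. The seven weights are the values implied by the cumulative-sum output at the end of this section, and the name `weights` is only illustrative.

```python
import pandas as pd

# Illustrative stand-in for the 'Weight(kg)' column of the CSV
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name='Weight(kg)')

print(weights.mean())    # arithmetic mean
print(weights.median())  # middle value of the sorted data
print(weights.min())     # smallest observation
print(weights.max())     # largest observation
print(weights.std())     # sample standard deviation (ddof=1 by default)
```

With these values the mean (about 27.4) sits above the median (23), because the single 74 kg observation pulls the mean upward.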
The .agg() method
agg() accepts a function, or a list of functions, to apply to a Series as a whole or to each of its elements separately. When given a list of functions, agg() returns one result per function.
def pct30(column):
    return column.quantile(0.3)

df['Weight(kg)'].agg(pct30)
Summaries on multiple columns
Height(cm)    45.4
Weight(kg)    21.0
dtype: float64
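Output of this shape, one value per column, is what agg() returns when a single function is applied to a multi-column selection. The following sketch shows one way to produce it, assuming a hand-built DataFrame named `df_demo` in place of the CSV. The heights are made up for illustration, so only the weight result is expected to match the output above.

```python
import pandas as pd

# Hypothetical data: the heights are invented for this example
df_demo = pd.DataFrame({
    'Height(cm)': [50.0, 44.0, 47.5, 42.0, 55.0, 46.0, 60.0],
    'Weight(kg)': [25, 23, 22, 17, 29, 2, 74],
})

def pct30(column):
    # 30th percentile of a single column
    return column.quantile(0.3)

# Selecting with a list of column names yields a DataFrame;
# agg() then calls pct30 once per column and returns a Series
result = df_demo[['Height(cm)', 'Weight(kg)']].agg(pct30)
print(result)
```

Note the double brackets: the inner list selects several columns at once, so the result is indexed by column name.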
def pct40(column):
    return column.quantile(0.4)

df['Height(cm)'].agg([pct30, pct40])
pct30    45.4
pct40    47.2
Name: Height(cm), dtype: float64
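The list passed to agg() can also mix custom functions with the string names of built-in reductions. The following is a small sketch under the assumption of made-up height values. The result is labelled by each string name or by the custom function's name.

```python
import pandas as pd

# Made-up heights, for illustration only
heights = pd.Series([50.0, 44.0, 47.5, 42.0, 55.0, 46.0, 60.0],
                    name='Height(cm)')

def pct30(column):
    # 30th percentile of the column
    return column.quantile(0.3)

# Strings refer to built-in reductions; pct30 is the custom function
summary = heights.agg(['min', 'median', 'max', pct30])
print(summary)
```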
df['Weight(kg)'].cumsum()
# another method
# .cummax()
# .cumprod()
# .cummin()
0     25
1     48
2     70
3     87
4    116
5    118
6    192
Name: Weight(kg), dtype: int64
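Differencing the running totals above recovers the underlying weights, which makes it possible to sketch the other cumulative methods side by side. The `running` DataFrame is only an illustrative layout, not part of the original notebook.

```python
import pandas as pd

# Weights implied by the cumulative sums shown above
weights = pd.Series([25, 23, 22, 17, 29, 2, 74], name='Weight(kg)')

# Each method returns a Series of the same length as the input
running = pd.DataFrame({
    'cumsum': weights.cumsum(),  # running total
    'cummax': weights.cummax(),  # largest value seen so far
    'cummin': weights.cummin(),  # smallest value seen so far
})
print(running)
```

.cumprod() follows the same pattern but multiplies instead of adding, so its values grow very quickly on data like this.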