

USEFUL PANDAS TECHNIQUES IN PYTHON

Python has many libraries, but we will focus on one of the most powerful for data analysis: pandas. This article introduces some useful pandas techniques with examples.


Dataset description: the data records the length and width of two flower parts (sepal and petal) for different species of iris flowers.


First of all, import the pandas package:


# Import Pandas package
import pandas as pd 

1- Load DataFrame

We load our DataFrame with pd.read_csv() by giving it the path to the file, then print the first few rows. The code below shows the syntax.

 
# Read the CSV file into a DataFrame named iris
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# Print the first few rows of iris
print(iris.head())

Output 1:


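For practising without an internet connection, here is a minimal sketch of the same loading step. It assumes a small made-up CSV text (the name csv_text and the three rows are illustrative, not the full iris file) and reads it from memory instead of from a URL.

```python
import io
import pandas as pd

# Illustrative CSV text with the same column names as the iris file
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (5 when n is omitted)
print(sample.head(2))
```

The same call works unchanged with a local file path in place of the StringIO object.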

2- Overview on the data

It is important to have an overview of the content of our data before proceeding with any analysis. So, we will use the info() method and the shape attribute.


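# Overview of our DataFrame iris: column names, types and non-null counts
iris.info()
# shape is an attribute, not a function: (number of rows, number of columns)
print(iris.shape)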

Output 2:


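As a small complement, the sketch below shows two other ways to get an overview, dtypes and describe(), on a tiny made-up DataFrame (the name sample and its values are assumptions for illustration only).

```python
import pandas as pd

# Tiny illustrative DataFrame, not the real iris data
sample = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "species": ["setosa", "versicolor", "virginica"],
})

# shape can be unpacked directly into rows and columns
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# dtypes lists the data type of each column
print(sample.dtypes)

# describe() summarizes the numeric columns (count, mean, std, min, quartiles, max)
summary = sample.describe()
print(summary)
```

By default describe() leaves out non-numeric columns such as species.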

3- Sorting and setting DataFrame

a- sorting

Sorting a DataFrame is done with the sort_values() method. In this example, with the DataFrame named iris, we will sort by the column sepal_length in descending order (ascending=False). The ascending parameter controls the direction: True sorts the values in ascending order, False in descending order. Let's have a look:



# sort iris by descending sepal_length
iris_sort = iris.sort_values("sepal_length", ascending=False)
print(iris_sort.head(5))

Output 3a:


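sort_values() can also sort by several columns at once. The sketch below assumes a small made-up DataFrame (sample is an illustrative name) and sorts by species first, then by sepal_length within each species.

```python
import pandas as pd

# Tiny illustrative DataFrame, not the real iris data
sample = pd.DataFrame({
    "species": ["setosa", "virginica", "setosa", "virginica"],
    "sepal_length": [5.1, 6.3, 4.9, 7.1],
})

# Sort by species (A to Z), then by sepal_length (largest first) within each species
sorted_sample = sample.sort_values(
    ["species", "sepal_length"], ascending=[True, False]
)
print(sorted_sample)
```

Passing a list to ascending gives one sort direction per column, in the same order as the column list.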

b- setting index

Setting a column as the index is done with set_index(). In our case, we want to replace the default index with the species column.


# set the species column as the index
iris_ind = iris.set_index("species")
print(iris_ind)

Output 3b:

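Once a column is the index, rows can be selected by label. Here is a minimal sketch on a made-up DataFrame (sample and its values are assumptions for illustration) that sets the index, selects rows with .loc, and then undoes the change with reset_index().

```python
import pandas as pd

# Tiny illustrative DataFrame, not the real iris data
sample = pd.DataFrame({
    "species": ["setosa", "versicolor", "setosa"],
    "petal_length": [1.4, 4.7, 1.3],
})

# set_index() returns a new DataFrame; sample itself is left unchanged
sample_ind = sample.set_index("species")

# With species as the index, .loc selects all rows carrying that label
setosa_rows = sample_ind.loc["setosa"]
print(setosa_rows)

# reset_index() turns the index back into an ordinary column
restored = sample_ind.reset_index()
print(restored.columns.tolist())
```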

4- Missing Values

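To check whether our table contains any missing data, we use isna().any(), which reports for each column whether at least one value is missing. The example below illustrates this:

# Check each column of iris for missing values
print(iris.isna().any())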

Output 4:

The results show that there are no missing values.
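
To see what the check looks like when data is actually missing, here is a minimal sketch on a made-up DataFrame with one deliberately missing value (sample and its contents are assumptions for illustration). It also counts the missing values and drops the incomplete rows.

```python
import pandas as pd

# Tiny illustrative DataFrame with one missing sepal_length
sample = pd.DataFrame({
    "sepal_length": [5.1, None, 6.3],
    "species": ["setosa", "versicolor", "virginica"],
})

# True for every column that has at least one missing value
has_missing = sample.isna().any()
print(has_missing)

# Number of missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)

# dropna() keeps only the rows without any missing value
complete_rows = sample.dropna()
print(len(complete_rows))
```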


5- Aggregate

Aggregation with the agg() method lets us compute basic statistics (min, max, sum, etc.) quickly. It can be applied to one or more columns. The examples below show how to proceed.

Example 1:

# Aggregate over sepal_length column
sepal_length_agg = iris["sepal_length"].agg(['min','max','sum'])
print(sepal_length_agg)

Output 5a:

Example 2:

# Aggregate the sepal_width and petal_width columns with median, min and max
print(iris.agg({'sepal_width': ['median', 'min', 'max'], 'petal_width': ['median', 'min', 'max']}))

Output 5b:


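The result of agg() with a list of functions is itself a DataFrame whose rows are labelled by function name. The sketch below, on a made-up DataFrame (sample and width_range are illustrative names), uses that to derive the range of each column from its min and max.

```python
import pandas as pd

# Tiny illustrative DataFrame, not the real iris data
sample = pd.DataFrame({
    "sepal_width": [3.5, 3.2, 3.3, 2.8],
    "petal_width": [0.2, 1.4, 2.5, 1.5],
})

# One row per statistic, one column per original column
widths_agg = sample.agg(["min", "max", "mean"])
print(widths_agg)

# Rows are labelled by function name, so .loc can pick one statistic
width_range = widths_agg.loc["max"] - widths_agg.loc["min"]
print(width_range)
```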

6- Grouped summary statistics

We often need to group the data to suit our analysis. The groupby() method does this and works well on large data sets: it splits the DataFrame into groups based on a given criterion, so that a statistic can be computed for each group. In this example, we compute the mean.

Example 1:


# group by species, calculate mean sepal_length and petal_length
length_group = iris.groupby("species")[["sepal_length", "petal_length"]].mean()
print(length_group)

Output 6a:


Example 2:

# group by species, calculate mean sepal_width and petal_width
width_group = iris.groupby("species")[["sepal_width", "petal_width"]].mean()
print(width_group)

Output 6b:


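groupby() can be combined with agg() to compute several statistics per group in one step. Here is a minimal sketch on a made-up DataFrame (sample and its values are assumptions for illustration).

```python
import pandas as pd

# Tiny illustrative DataFrame, not the real iris data
sample = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.6, 6.0, 5.0],
})

# Split the rows by species, then compute several statistics per group
petal_stats = sample.groupby("species")["petal_length"].agg(["mean", "min", "max"])
print(petal_stats)

# size() counts the rows in each group
group_sizes = sample.groupby("species").size()
print(group_sizes)
```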

That's all. I hope you find these techniques useful. Now it's your turn to practice.

 




