

5 Pandas Techniques in Python

In this post, I walk through five useful pandas techniques in Python. Each technique is introduced with a brief definition and a short code example.


I start by importing pandas and NumPy and loading the data set into the Python environment as a pandas DataFrame:


import pandas as pd
import numpy as np
data = pd.read_csv("C:/Users/asus/Desktop/train.csv", index_col="Loan_ID")

I will start with the first technique: Boolean indexing in pandas.

  • Boolean Indexing in Pandas

Boolean indexing filters the rows of a pandas DataFrame based on conditions on one or more columns. For example, I want to list all females who are not graduates and who got a loan (Loan_Status is "Y").

Boolean indexing with .loc handles this, as shown in the following code:


data.loc[(data["Gender"]=="Female") & (data["Education"]=="Not Graduate") & (data["Loan_Status"]=="Y"), ["Gender","Education","Loan_Status"]]

This code gives the following result:
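
Because train.csv is not included with this post, here is a minimal self-contained sketch of the same filter. The toy DataFrame and its values are invented for illustration; only the column names come from the code above.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Female"],
        "Education": ["Not Graduate", "Not Graduate", "Graduate", "Not Graduate"],
        "Loan_Status": ["Y", "Y", "Y", "N"],
    },
    index=pd.Index(["A1", "A2", "A3", "A4"], name="Loan_ID"),
)

# Each comparison yields a boolean Series aligned on the index.
is_female = toy["Gender"] == "Female"
not_graduate = toy["Education"] == "Not Graduate"
got_loan = toy["Loan_Status"] == "Y"

# & is the element-wise AND. When the comparisons are written inline,
# each one needs its own parentheses because & binds tighter than ==.
mask = is_female & not_graduate & got_loan

# .loc takes the row mask first and the list of columns to keep second.
selected = toy.loc[mask, ["Gender", "Education", "Loan_Status"]]
print(selected)
```

Naming the individual masks is a readability choice; the single-line form used above is equivalent.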

The second technique is the apply function in pandas.

  • Apply Function in Pandas

The apply function passes each row or column of a DataFrame to a function and returns the results. The function can be built-in or user-defined, as shown in the code below:



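#Apply Function in Pandas
# Count the missing values in a Series (one column or one row)
def num_missing(x):
    return x.isnull().sum()

# axis=0 applies the function to each column
print("Missing values per column:")
print(data.apply(num_missing, axis=0))
# axis=1 applies the function to each row; head() shows the first five rows
print("\nMissing values per row:")
print(data.apply(num_missing, axis=1).head())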

This code gives the following result:
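
To make the difference between axis=0 and axis=1 concrete, here is a small sketch on an invented three-row DataFrame (the data is hypothetical, not taken from train.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
toy = pd.DataFrame(
    {
        "Gender": ["Female", None, "Male"],
        "LoanAmount": [120.0, np.nan, np.nan],
    }
)

def num_missing(x):
    # x is one column (axis=0) or one row (axis=1), passed in as a Series.
    return x.isnull().sum()

per_column = toy.apply(num_missing, axis=0)  # one count per column
per_row = toy.apply(num_missing, axis=1)     # one count per row

print(per_column)
print(per_row)

# For this particular task, the built-in vectorized form gives the same counts.
builtin_per_column = toy.isnull().sum()
```

apply is the general tool for arbitrary functions; when a vectorized method such as isnull().sum() exists, it is usually the faster choice.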


The third technique is imputing missing values using pandas.

  • Imputing missing values using Pandas

This technique replaces missing values with the overall mean, mode, or median of the column. In the code below, I fill the missing values of the Gender, Married, and Self_Employed columns with each column's mode:


#Imputing missing values using Pandas
# Series.mode() returns a Series (there can be ties), so take the first value
cols = ['Gender', 'Married', 'Self_Employed']
for col in cols:
    # Assign the result back instead of using inplace=True on a column selection
    data[col] = data[col].fillna(data[col].mode().iloc[0])
# Check the missing values again to confirm
print(data.apply(num_missing, axis=0))

This code gives the following result:
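
As a self-contained sketch of the same idea, the example below fills a categorical column with its mode and a numeric column with its median. The toy data is invented, and choosing the median for LoanAmount here is only an illustration of the mean/mode/median options mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", None, "Female"],
        "LoanAmount": [100.0, np.nan, 200.0, 300.0],
    }
)

# mode() returns a Series because several values can tie; take the first.
gender_mode = toy["Gender"].mode().iloc[0]
toy["Gender"] = toy["Gender"].fillna(gender_mode)

# median() skips missing values by default.
loan_median = toy["LoanAmount"].median()
toy["LoanAmount"] = toy["LoanAmount"].fillna(loan_median)

print(toy)
```

The mode suits categorical columns, while the mean or median only makes sense for numeric ones.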

The fourth technique is the pivot table in pandas.

  • Pivot Table in Pandas

Pandas can also create pivot tables. Here the key column is 'LoanAmount', which has missing values. One way to impute it is to use the mean amount of each 'Gender', 'Married', and 'Self_Employed' group, so the pivot table computes the mean 'LoanAmount' of each group in the DataFrame.

The code below builds that pivot table:


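#Pivot Table in Pandas
# Mean LoanAmount per (Gender, Married, Self_Employed) group;
# the string "mean" avoids the deprecation warning newer pandas raises for np.mean
impute_grps = data.pivot_table(values=["LoanAmount"], index=["Gender","Married","Self_Employed"], aggfunc="mean")
print(impute_grps)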

This code gives the following result:
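
A minimal sketch on invented data shows what the pivot table contains. It groups by two columns instead of three to keep the toy example short, and it compares the result with an equivalent groupby call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male", "Male"],
        "Married": ["No", "No", "Yes", "Yes", "No"],
        "LoanAmount": [100.0, 140.0, 200.0, np.nan, 90.0],
    }
)

# One row per (Gender, Married) group; missing amounts are ignored by the mean.
group_means = toy.pivot_table(
    values="LoanAmount", index=["Gender", "Married"], aggfunc="mean"
)
print(group_means)

# The same numbers via groupby, returned as a Series instead of a DataFrame.
by_groupby = toy.groupby(["Gender", "Married"])["LoanAmount"].mean()
```

Both results are indexed by the (Gender, Married) pairs, which is what makes the group lookup in the next technique possible.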

The last technique is multi-indexing in a pandas DataFrame.

  • Multi-Indexing in Pandas Dataframe

The index of the pivot table from the previous technique is a combination of three values (Gender, Married, and Self_Employed). This is called a multi-index, and it makes lookups by group fast.

So far the mean of each group has been computed, but the missing LoanAmount values have not yet been imputed.

I do that by combining the pandas techniques covered so far, as shown in the code below:

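#Multi-Indexing in Pandas Dataframe
# Iterate only through the rows with a missing LoanAmount
for i, row in data.loc[data['LoanAmount'].isnull(), :].iterrows():
    # Build the (Gender, Married, Self_Employed) key into the multi-index
    ind = (row['Gender'], row['Married'], row['Self_Employed'])
    data.loc[i, 'LoanAmount'] = impute_grps.loc[ind].values[0]
# Check the missing values again to confirm
print(data.apply(num_missing, axis=0))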

This code gives the following result:
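
The sketch below repeats the lookup on an invented two-level example: a tuple selects one group from the multi-index, and the loop fills each missing amount with its group mean. The groupby/transform line is an alternative I am adding for comparison, not part of the approach described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male", "Male"],
        "Married": ["No", "No", "Yes", "Yes", "No"],
        "LoanAmount": [100.0, 140.0, 200.0, np.nan, 90.0],
    }
)

# Group means, indexed by a two-level multi-index (Gender, Married).
group_means = toy.pivot_table(
    values=["LoanAmount"], index=["Gender", "Married"], aggfunc="mean"
)

# A tuple with one value per index level selects a single group.
male_yes_mean = group_means.loc[("Male", "Yes"), "LoanAmount"]

# Alternative (an assumption, not from the post): a vectorized fill,
# computed here before the loop modifies toy.
vectorized = toy["LoanAmount"].fillna(
    toy.groupby(["Gender", "Married"])["LoanAmount"].transform("mean")
)

# Loop version: visit only the rows whose LoanAmount is missing.
for i, row in toy.loc[toy["LoanAmount"].isnull()].iterrows():
    key = (row["Gender"], row["Married"])
    toy.loc[i, "LoanAmount"] = group_means.loc[key, "LoanAmount"]

print(toy)
```

The loop is easy to follow on small data; on large DataFrames the vectorized form is likely to be faster because it avoids iterrows.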



2 Comments


Data Insight
Oct 30, 2021

You must write an original article or you should never post them on Data Insight.


Data Insight
Oct 30, 2021

65% of your article was plagiarized from these 10 sources:

1.This can be done using the various techniques from pandas learned till now. #iterate only through rows with missing LoanAmount for i,row in ...

https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/


2.This article focuses on providing 12 ways for data manipulation in Python. Ive also shared some tips & tricks which will allow you to work faster. I would recommend that you look at the codes for data exploration before going ahead. To help you understand better, ...

https://www.scribd.com/doc/309584074/12-Useful-Pandas-Techniques-in-Python-for-Data-Manipulation


3.What do you do, if you want to filter values of a column based on conditions from another set of columns from a Pandas Dataframe? For instance, we want a list of all females who are not…

