Pandas Techniques for Data Manipulation in Python

pandas.DataFrame.apply

aya abdalsalam
Mar 6, 2022

Updated: Mar 24, 2022

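DataFrame.apply lets you apply a function along an axis of a DataFrame, so the function receives each column or each row in turn. Series.apply, used further down, calls the function on every value of a single column. When the logic fits in a function, this keeps the code simple and readable.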

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

func: the function to apply to each column or row.

axis (0 or 'index', 1 or 'columns'): with axis=0 or 'index', the function is applied to each column; with axis=1 or 'columns', it is applied to each row.

raw (default False): determines whether each row or column is passed as a Series or as an ndarray. If raw=False, each row or column is passed to the function as a Series; if raw=True, the function receives ndarray objects instead.

args: tuple of positional arguments to pass to func in addition to the array/Series.

**kwargs: additional keyword arguments to pass to func.
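To make the args, **kwargs and raw parameters concrete, here is a minimal sketch. The DataFrame and the helper scale_and_shift (with its factor and offset parameters) are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

def scale_and_shift(col, factor, offset=0):
    """Multiply every value in the column by factor, then add offset."""
    return col * factor + offset

# args supplies extra positional arguments; extra keywords are forwarded to func
scaled = df.apply(scale_and_shift, args=(10,), offset=1)

# With raw=True the function receives a NumPy array instead of a Series
spread = df.apply(lambda arr: arr.max() - arr.min(), raw=True)
got_ndarray = df.apply(lambda arr: isinstance(arr, np.ndarray), raw=True)

print(scaled)
print(spread)
print(got_ndarray)
```

Passing raw=True can be faster when the function only needs NumPy operations, since no Series has to be built for each column.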

import pandas as pd
import numpy as np
df3 = pd.DataFrame([[2,3,4,5]] * 3, columns=['A', 'B','C','D'])
df3

df4 = df3.apply(np.sqrt)

# sum each column separately, so you get 4 values
df3.apply(np.sum, axis=0)

A     6
B     9
C    12
D    15
dtype: int64

# sum across each row, so you get 3 values
df3.apply(np.sum, axis=1)

0    14
1    14
2    14
dtype: int64
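For plain sums, the built-in reductions should give the same totals without apply. The sketch below compares them and then uses apply for a row calculation written as a custom function; the row_range name is only illustrative.

```python
import pandas as pd

df3 = pd.DataFrame([[2, 3, 4, 5]] * 3, columns=["A", "B", "C", "D"])

# Built-in reductions: one value per column, then one value per row
col_totals = df3.sum(axis=0)
row_totals = df3.sum(axis=1)

# apply with axis=1 passes each row to the function as a Series
row_range = df3.apply(lambda row: row.max() - row.min(), axis=1)

print(col_totals)
print(row_totals)
print(row_range)
```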

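import dataframe_image as dfi  # third-party package; assumed to be what dfi refers to

df = pd.read_csv('traffic.csv')
df.head()
dfi.export(df.head(), 'df.png')  # save the preview as an image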

def use_apply(i):
    j = "NotFound"
    if i == 'M':
        j = "Male"
    elif i == "F":
        j = "Female"

    return j

result = df['driver_gender'].apply(use_apply)
result

0          Male
1          Male
2          Male
3          Male
4        Female
          ...  
91736    Female
91737    Female
91738      Male
91739    Female
91740      Male
Name: driver_gender, Length: 91741, dtype: object

As we see above, every value in the driver_gender column has been replaced according to the function.
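For a simple lookup like this one, Series.map with a dictionary is a common alternative to apply. The sketch below uses a small made-up Series in place of the traffic.csv column, so the values are assumptions.

```python
import pandas as pd

# Made-up sample standing in for df['driver_gender']
genders = pd.Series(["M", "M", "F", None, "X"], name="driver_gender")

labels = {"M": "Male", "F": "Female"}

# Values missing from the dict become NaN, which fillna then replaces
result = genders.map(labels).fillna("NotFound")
print(result)
```

This gives the same labels as use_apply, including "NotFound" for anything that is neither 'M' nor 'F'.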

pandas.DataFrame.agg

This function lets you run one or more aggregations in a single call, which shortens your code.

DataFrame.agg(func=None, axis=0, *args, **kwargs)

func: the function to apply; it can also be a function name as a string, a list of functions, or a dict mapping column names to functions.

axis: the default, 0, aggregates down each column; axis=1 aggregates across each row.

*args: positional arguments to pass to func.

**kwargs: keyword arguments to pass to func.


df3 = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan],
                    [3,4,5],
                    [8,9,6]],
                  columns=['A', 'B', 'C'])
df3.agg(['max','sum'])

df3.agg('sum')

A    23.0
B    28.0
C    29.0
dtype: float64

df3.agg('min')

A    1.0
B    2.0
C    3.0
dtype: float64
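agg also accepts a dict, so each column can get its own aggregations, and axis=1 aggregates across rows. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9],
     [np.nan, np.nan, np.nan], [3, 4, 5], [8, 9, 6]],
    columns=["A", "B", "C"],
)

# A dict maps each column to its own list of aggregations
per_column = df3.agg({"A": ["min", "max"], "B": ["sum"]})

# axis=1 aggregates across each row instead of down each column
row_means = df3.agg("mean", axis=1)

print(per_column)
print(row_means)
```

Combinations that were not requested, such as the sum of A here, show up as NaN in the result.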

Merge Dataframe

If you have two datasets and want to work with them together, combining them into one result, use merge.

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

left: the first DataFrame.

right: the second DataFrame.

how: the type of merge ('inner' is the default; the others are 'outer', 'left', 'right' and 'cross').

how='left': use only the keys from the left DataFrame.

how='right': use only the keys from the right DataFrame.

how='inner': use the intersection of the keys from both DataFrames.

how='outer': use the union of the keys from both DataFrames.

how='cross': build the Cartesian product of the rows of both DataFrames.

on: column name to join on; it must exist in both DataFrames.

left_on: column or index level names to join on in the left DataFrame.

right_on: column or index level names to join on in the right DataFrame.

left_index: default False; if True, use the index of the left DataFrame as the join key.

right_index: default False; if True, use the index of the right DataFrame as the join key.

suffixes: suffixes added to overlapping column names so you can tell which DataFrame each column came from.

df4 = pd.DataFrame({'dfk': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df5 = pd.DataFrame({'dfk': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})
pd.merge(df4,df5,on ='dfk') # intersection

pd.merge(df4,df5,how = 'outer' ,on = 'dfk')

df4.merge(df5,how = 'cross') # cross product
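The sketch below tries left_on, right_on, suffixes and indicator together on two small made-up frames (the names left, right, key and id are assumptions). It also counts the rows of the inner merge of df4 and df5 from above, because repeated keys multiply rows.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
left = pd.DataFrame({"key": ["foo", "bar", "baz"], "value": [1, 2, 3]})
right = pd.DataFrame({"id": ["foo", "bar", "qux"], "value": [5, 6, 7]})

merged = pd.merge(
    left, right,
    how="outer",
    left_on="key", right_on="id",      # join columns named differently
    suffixes=("_left", "_right"),      # renames the overlapping 'value' column
    indicator=True,                    # adds a '_merge' column with the row's origin
)
print(merged)

# Repeated keys: 'foo' appears twice on each side, giving 2 x 2 matching rows
df4 = pd.DataFrame({"dfk": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 5]})
df5 = pd.DataFrame({"dfk": ["foo", "bar", "baz", "foo"], "value": [5, 6, 7, 8]})
inner_rows = len(pd.merge(df4, df5, on="dfk"))
print(inner_rows)
```

The _merge column reads 'both', 'left_only' or 'right_only', which helps when checking why an outer merge produced unexpected rows.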

pandas.isnull

Detects missing values: returns True where a cell is NaN and False otherwise.

df3.isnull()
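The boolean mask is usually combined with a reduction. As a minimal sketch on the same data, the code below counts missing values per column, flags rows that are entirely missing, and drops them:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9],
     [np.nan, np.nan, np.nan], [3, 4, 5], [8, 9, 6]],
    columns=["A", "B", "C"],
)

# True counts as 1, so summing the mask counts the NaN cells per column
missing_per_column = df3.isnull().sum()

# Rows in which every cell is missing
rows_all_missing = df3.isnull().all(axis=1)

# Drop only the rows that are completely empty
cleaned = df3.dropna(how="all")

print(missing_per_column)
print(rows_all_missing)
print(cleaned)
```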

pandas.unique(values)

Returns the unique values in order of first appearance.

# notice that the output is not sorted
pd.unique(pd.Series([4,5,7,8,9,99,4,5,4 ,2,33]))

array([ 4,  5,  7,  8,  9, 99,  2, 33], dtype=int64)

pd.unique([("m", "n"), ("z", "x"), ("n", "v"), ("z", "x")]) 
# note that (a,b) != (b,a)

array([('m', 'n'), ('z', 'x'), ('n', 'v')], dtype=object)
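If you need the values sorted, or only need how many there are, a sketch like the following should work. It sorts the result with np.sort and counts with Series.nunique:

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 5, 7, 8, 9, 99, 4, 5, 4, 2, 33])

in_order = pd.unique(s)           # order of first appearance
sorted_vals = np.sort(in_order)   # ascending order
how_many = s.nunique()            # number of distinct values

print(in_order)
print(sorted_vals)
print(how_many)
```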

melt in pandas

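melt reshapes data from wide format to long format: the columns named in value_vars are stacked into rows, with one column holding the variable name and another holding the value.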

m = {"Name": ["Aya", "Lisa", "David"], "ID": [1, 2, 3], "Role": ["CEO", "Editor", "Author"]}

df = pd.DataFrame(m)

print(df)
print('\n________________________________________\n')

df_melted = pd.melt(df, id_vars=["ID"], value_vars=["Name", "Role"])

print(df_melted)

    Name  ID    Role
0    Aya   1     CEO
1   Lisa   2  Editor
2  David   3  Author

________________________________________

   ID variable   value
0   1     Name     Aya
1   2     Name    Lisa
2   3     Name   David
3   1     Role     CEO
4   2     Role  Editor
5   3     Role  Author

We can use pivot to unmelt the DataFrame:

m = {"Name": ["Aya", "Lisa", "David"], "ID": [1, 2, 3], "Role": ["CEO", "Editor", "Author"]}

df = pd.DataFrame(m)

print(df)
print('\n________________________________________\n')

melted = pd.melt(df, id_vars=["ID"], value_vars=["Name", "Role"], var_name="Attribute", value_name="Value")

print(melted)
print('\n________________________________________\n')

# unmelting using pivot()

unmelted = melted.pivot(index='ID', columns='Attribute')

print(unmelted)

    Name  ID    Role
0    Aya   1     CEO
1   Lisa   2  Editor
2  David   3  Author

________________________________________

   ID Attribute   Value
0   1      Name     Aya
1   2      Name    Lisa
2   3      Name   David
3   1      Role     CEO
4   2      Role  Editor
5   3      Role  Author

________________________________________
           Value        
Attribute   Name    Role
ID                      
1            Aya     CEO
2           Lisa  Editor
3          David  Author

unmelted = unmelted['Value'].reset_index()
unmelted.columns.name = None
print(unmelted)

   ID   Name    Role
0   1    Aya     CEO
1   2   Lisa  Editor
2   3  David  Author
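A slightly shorter variant, sketched here, passes values='Value' to pivot so the result has ordinary single-level columns and there is no need to select unmelted['Value'] afterwards:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Aya", "Lisa", "David"], "ID": [1, 2, 3],
     "Role": ["CEO", "Editor", "Author"]}
)

melted = pd.melt(
    df, id_vars=["ID"], value_vars=["Name", "Role"],
    var_name="Attribute", value_name="Value",
)

# values='Value' keeps the pivoted columns flat (no 'Value' level on top)
unmelted = melted.pivot(index="ID", columns="Attribute", values="Value").reset_index()
unmelted.columns.name = None  # remove the leftover 'Attribute' label

print(unmelted)
```

pivot raises an error when an index/columns pair occurs more than once, so this round trip relies on each ID having exactly one Name and one Role.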

Resources:

https://pandas.pydata.org/docs/reference/api/pandas.melt.html

https://predictivehacks.com/?all-tips=save-a-pandas-dataframe-as-an-image

https://github.com/AyaMohammedAli/Pandas-Techniques-for-Data-Manipulation-in-Python
