
Data Scientist Program

 


Sara Ahmed

pandas techniques

1) Apply function

DataFrame.apply lets you pass a function and apply it along an axis of a DataFrame, while Series.apply applies it to every value of a Series. This makes it a convenient way to transform or summarize data with custom logic, which is why it is widely used in data science and machine learning.


Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function.


First, import the needed libraries. This is always the first step when using pandas functions.

import pandas as pd
import numpy as np

Create a DataFrame:
df = pd.DataFrame([[1, 2, 3]] * 5, columns=['A', 'B', 'C'])
print(df)
output:
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3


Apply np.sum to each column (axis=0):
df.apply(np.sum, axis=0)
output:
A     5
B    10
C    15
dtype: int64


Apply np.sum to each row (axis=1):
df.apply(np.sum, axis=1)
output:
0    6
1    6
2    6
3    6
4    6
dtype: int64
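
The examples above pass a NumPy function, but apply also accepts your own functions. The following is a minimal sketch, assuming the same DataFrame as above; the lambda functions and the variable names (doubled, row_range, expanded) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]] * 5, columns=["A", "B", "C"])

# Series.apply calls the function once for every value in the column.
doubled = df["A"].apply(lambda x: x * 2)

# With axis=1, each row is passed to the function as a Series
# indexed by the column names.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

# result_type="expand" turns a list-like return value into columns.
expanded = df.apply(
    lambda row: [row["A"] + row["B"], row["C"] * 10],
    axis=1,
    result_type="expand",
)

print(doubled.tolist())    # [2, 2, 2, 2, 2]
print(row_range.tolist())  # [2, 2, 2, 2, 2]
print(expanded.shape)      # (5, 2)
```

For simple sums like these, built-in methods such as df.sum(axis=1) are usually faster; apply is most useful when no built-in method covers the logic you need.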

2) Boolean indexing

Boolean indexing is used to filter rows based on conditions on one or more columns. For instance, we may want a list of all students who are not scholars and who got a loan. Boolean indexing can express this directly.


In Python, False equals 0 and True equals 1:
0 == False
output:
True

Because of this, boolean results can be added up to count how many conditions hold:
c = 10
(c > 1) + (c < 20) + (c == 12)
output:
2


A boolean can also be used as an index into a list or tuple, since False selects position 0 and True selects position 1:
state = True
state = (True , False)[state]
state
output:
False
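
The student example from the start of this section could be sketched in pandas roughly as follows. The data and the column names (name, scholar, loan) are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical student data; names and columns are invented for this sketch.
students = pd.DataFrame({
    "name": ["Ali", "Mona", "Omar", "Nour"],
    "scholar": [True, False, False, True],
    "loan": [False, True, False, True],
})

# Each condition produces a boolean Series. Combine them with & (and),
# | (or) and ~ (not), wrapping each condition in parentheses.
mask = (~students["scholar"]) & (students["loan"])

# .loc keeps only the rows where the mask is True.
result = students.loc[mask, "name"].tolist()
print(result)  # ['Mona']
```

Note that pandas needs the element-wise operators & and | here; the Python keywords and/or do not work on a whole Series.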

3) Is null function

pd.isna detects missing values. It takes a scalar or array-like object and indicates whether each value is missing (for example NaN in numeric arrays, or None and NaN in object arrays).




import pandas as pd
import numpy as np
pd.isna('dog')
output:
False

Is pd.NA a null value?
pd.isna(pd.NA)
output:
True
Is np.nan a null value?
pd.isna(np.nan)
output:
True
Construct an array with null values and pass it to isna:
array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
pd.isna(array)
output:
array([[False,  True, False],
       [False, False,  True]])

Construct a DataFrame and pass it to isna:
df = pd.DataFrame([['ant', 'bee', 'cat'], ['dog', None, 'fly']])
pd.isna(df)
output:
       0      1      2
0  False  False  False
1  False   True  False

pd.isna(df[1])
output:
0    False
1     True
Name: 1, dtype: bool
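
In practice, the boolean result of isna is often summarized or used as a filter. The following is a small sketch under that assumption, reusing the DataFrame from above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])

# True counts as 1, so summing the boolean frame counts missing values per column.
missing_per_column = df.isna().sum()

# any(axis=1) flags rows with at least one missing value;
# the flag is then used as a boolean index.
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column.tolist())       # [0, 1, 0]
print(rows_with_missing.index.tolist())  # [1]
```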

4) Get dummies

pd.get_dummies converts a categorical variable into dummy (indicator) variables: one column per category, marking which category each row belongs to.


Convert a Series of letters into dummy variables:
import pandas as pd
import numpy as np
s = pd.Series(list('abca'))
pd.get_dummies(s)
output:


Convert a list that contains a missing value (NaN is ignored by default):
s1 = ['a', 'b', np.nan]
pd.get_dummies(s1)
output:

With dummy_na=True, missing values get their own indicator column:
pd.get_dummies(s1, dummy_na=True)
output:
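
What get_dummies prints depends on the pandas version: since pandas 2.0 the indicator columns are boolean (True/False) by default, while older versions use 1/0. The sketch below passes dtype=int so the result looks the same either way; the variable names dummies and with_na are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(list("abca"))

# dtype=int forces 1/0 indicators regardless of the pandas version.
dummies = pd.get_dummies(s, dtype=int)
print(dummies)
#    a  b  c
# 0  1  0  0
# 1  0  1  0
# 2  0  0  1
# 3  1  0  0

# dummy_na=True adds an extra column that marks the missing values.
with_na = pd.get_dummies(pd.Series(["a", "b", np.nan]), dummy_na=True, dtype=int)
print(with_na.shape)  # (3, 3)
```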


5) Cut function

Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut could convert ages to groups of age ranges. Supports binning into an equal number of bins, or a pre-specified array of bins.

The input array to be binned must be 1-dimensional.


Discretize into three equal-width bins:
import pandas as pd
import numpy as np
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
output:
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]]
Discretize into three equal-width bins and also return the bin edges with retbins=True:
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)

output:
([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
 Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]],
 array([0.994, 3.   , 5.   , 7.   ]))

Assign a label to each of the three bins:
pd.cut(np.array([1, 7, 5, 4, 6, 3]),
       3, labels=["bad", "medium", "good"])
output:
['bad', 'good', 'medium', 'medium', 'good', 'bad']
Categories (3, object): ['bad' < 'medium' < 'good']
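
The examples above let cut choose the bin edges. The age-group case mentioned at the start of this section uses a pre-specified array of bins instead, and could be sketched as follows. The ages, bin edges and labels are invented for illustration.

```python
import pandas as pd

# Hypothetical ages, bin edges and labels.
ages = pd.Series([5, 17, 18, 35, 64, 80])
bins = [0, 17, 64, 120]            # intervals (0, 17], (17, 64], (64, 120]
labels = ["child", "adult", "senior"]

# By default the right edge of each bin is inclusive, so 17 counts as "child".
groups = pd.cut(ages, bins=bins, labels=labels)

print(groups.tolist())
# ['child', 'child', 'adult', 'adult', 'adult', 'senior']
```

With explicit bins, the number of labels must be one less than the number of bin edges.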



