5 Ways to Manipulate Pandas Dataframes

Tahani Reesh
Nov 22, 2021
2 min read

Pandas is a great Python library that speeds up data analysis and exploration. It also provides a wide variety of functions and methods for data manipulation. In this post, I want to quickly show a few useful pandas methods and functions that can come in handy in your daily work.

Installation

To install pandas, run the following command in your terminal:

pip install pandas

Then import it:

import pandas as pd

With the library installed and imported, we can start working with pandas.

First, we create the DataFrame used in the examples below:

import pandas as pd
import numpy as np
data = {'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

print(df)

1. Sorting DataFrame

We can sort a DataFrame in ascending or descending order with the sort_values() method. It returns a new, sorted DataFrame and leaves the original unchanged.

Here is how to apply it:

Input:

df.sort_values(by=['price'], ascending=True)

Output:

   product_name  price
1       printer    150
4         chair    200
2        tablet    300
3          desk    450
0        laptop   1200
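
sort_values() also takes ascending=False for descending order. Here is a minimal sketch that rebuilds the sample frame so it runs on its own; the name by_price_desc is only illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild the sample frame so the snippet is self-contained
df = pd.DataFrame({
    'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
    'price': [1200, 150, 300, 450, 200],
})

# Descending order: most expensive product first
by_price_desc = df.sort_values(by='price', ascending=False)

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
by_price_desc = by_price_desc.reset_index(drop=True)

print(by_price_desc)
```

Resetting the index is optional. Skip it if you want to keep track of each row's original position.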

2. Apply Function

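The apply() method applies a function along an axis of a DataFrame. With axis=0 (the default) the function is called once per column, and with axis=1 it is called once per row. When called on a single column (a Series), as in the example below, the function is applied to each value.

Input: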

def double(a):
    # Return twice the input value
    return 2 * a

# Double every value in the price column
df['price'] = df['price'].apply(double)

# Display the DataFrame
df

Output:

   product_name  price
0        laptop   2400
1       printer    300
2        tablet    600
3          desk    900
4         chair    400
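
To see what the axis argument does on a whole DataFrame, here is a small sketch. The scores frame and the names col_max and row_sum are made up for this illustration and do not come from the article's data.

```python
import pandas as pd

# Hypothetical numeric frame used only for this illustration
scores = pd.DataFrame({'math': [80, 90], 'art': [70, 65]})

# axis=0 (default): the function receives each column as a Series
col_max = scores.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series
row_sum = scores.apply(lambda row: row['math'] + row['art'], axis=1)

print(col_max)
print(row_sum)
```

For simple arithmetic like this, vectorized expressions such as scores['math'] + scores['art'] are usually faster than apply(); the lambda form is used here only to make the axis behavior visible.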

3. Cut Function

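The cut() function bins values into discrete intervals. Use it when you need to segment and sort data values into bins. It accepts one-dimensional array-like input, such as a NumPy array or a pandas Series. Passing an integer, as below, splits the range of the data into that many equal-width bins.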

Input:

pd.cut(np.array([2, 7, 5, 4, 6, 8]), 3)

Output:

[(1.994, 4.0], (6.0, 8.0], (4.0, 6.0], (1.994, 4.0], (4.0, 6.0], (6.0, 8.0]]
Categories (3, interval[float64]): [(1.994, 4.0] < (4.0, 6.0] < (6.0, 8.0]]
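
cut() can also take explicit bin edges and labels instead of a bin count. Here is a minimal sketch using the price values from the sample DataFrame as a Series; the edges and the 'low', 'mid' and 'high' labels are arbitrary choices made for illustration.

```python
import pandas as pd

prices = pd.Series([1200, 150, 300, 450, 200], name='price')

# Explicit bin edges give the intervals (0, 250], (250, 500], (500, 1500]
# The edges and labels are illustrative choices, not from the original post
tiers = pd.cut(prices, bins=[0, 250, 500, 1500],
               labels=['low', 'mid', 'high'])

print(tiers.tolist())
```

By default the intervals are closed on the right, so a price of exactly 250 would fall in the 'low' bin.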

4. Explode in Dataframe

The explode() method transforms each element of a list-like value into its own row, replicating the index values.

Input:

df1 = pd.DataFrame(data={"id": [1, 2], 
                        "values": [[1, 2, 3], [4, 5, 6]]})
df1

Output:

   id     values
0   1  [1, 2, 3]
1   2  [4, 5, 6]

Now we apply explode() to our DataFrame:

Input:

df1.explode("values", ignore_index=True)

Output:

   id  values
0   1       1
1   1       2
2   1       3
3   2       4
4   2       5
5   2       6
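
The ignore_index argument controls whether the replicated index is kept. The sketch below compares both settings on the same df1; the names kept and renumbered are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "values": [[1, 2, 3], [4, 5, 6]]})

# Default behaviour: each new row keeps the index of its source row
kept = df1.explode("values")

# ignore_index=True renumbers the rows 0..n-1 instead
renumbered = df1.explode("values", ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Keeping the original index is handy when you later want to group the exploded rows back by their source row.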

5. Indexing and Slicing

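Here, .loc is label-based and .iloc is integer-position-based. Both are used for indexing and slicing data. Note that a .loc slice includes its end label, while an .iloc slice excludes its end position.

We will apply them to the same DataFrame that we used in the first example.

Input: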

# Print the product_name column for rows labeled 0 through 4
print(df.loc[0:4, 'product_name'])

# Print all rows of the price column
print(df.loc[:, 'price'])

# Print the first row, with product name and price
print(df.iloc[0, 0:2])

# Print the first 3 rows, with product name and price
print(df.iloc[0:3, 0:2])

# Print all rows, with product name and price
print(df.iloc[:, 0:2])

Output
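
The difference between the two indexers is easiest to see on slice endpoints. Here is a minimal sketch that rebuilds the sample frame with its original prices; the names by_label, by_position and cheap are illustrative, and the boolean-mask line is an extra not covered in the post.

```python
import pandas as pd

df = pd.DataFrame({
    'product_name': ['laptop', 'printer', 'tablet', 'desk', 'chair'],
    'price': [1200, 150, 300, 450, 200],
})

# .loc slices by label and includes the end label: rows 0, 1 and 2
by_label = df.loc[0:2, 'product_name']

# .iloc slices by position and excludes the end position: rows 0 and 1
by_position = df.iloc[0:2, 0]

# .loc also accepts a boolean mask, here products cheaper than 300
cheap = df.loc[df['price'] < 300, 'product_name']

print(len(by_label), len(by_position))
print(cheap.tolist())
```

With the default integer index, labels and positions coincide, so the endpoint rule is the main visible difference between the two.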

You can find the code used for this article on my GitHub. Thank you for reading. Please let me know if you have any feedback.

