Pandas Techniques for Data Manipulation in Python

Writer: Nehal Sherif

pandas is an open-source Python library that provides easy-to-use, high-performance data structures and data analysis tools. The name comes from the term ‘panel data’, which refers to multidimensional data sets found in statistics and econometrics.


To install pandas, run pip install pandas from a terminal (command line). Then, inside Python, we can import pandas as pd.



# In a terminal:
pip install pandas
# In Python:
import pandas as pd

pd.read_csv loads a CSV file into a DataFrame, which is essentially a table or spreadsheet. Once the data is loaded, we can take a quick look at the first rows by calling head() on the data frame.



df = pd.read_csv('gapminder_full.csv')
df.head()
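
For readers who do not have gapminder_full.csv at hand, here is a minimal sketch of the same load-and-inspect step. The CSV text and the name sample are made up for illustration (only the column names country, year and life_exp echo the data set used in this article), and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """country,year,life_exp
Egypt,2002,69.8
Egypt,2007,71.3
Norway,2002,79.1
Norway,2007,80.2
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows only
print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # type inferred for each column
```

Checking shape and dtypes right after head() is a quick way to confirm that the file was parsed the way we expect.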

1. Pivot Table

pandas can produce MS Excel-style pivot tables, which summarise one column grouped by the values of other columns. For example, when a key column has missing values, we can impute them using the mean of that column within each group.


# Reshape only, no aggregation: each (year, country) pair must be unique
df.pivot(index="year", columns="country")
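
The idea of summarising by group and imputing missing values with a group mean can be sketched as follows. The sales table and its columns region, product and amount are invented for illustration, and filling by the mean of each product is one possible reading of "mean of other groups", not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing amounts.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [10.0, np.nan, 30.0, 50.0, np.nan],
})

# Excel-style pivot table: mean amount per region (rows) and product (columns).
# Missing values are skipped when the mean is computed.
table = sales.pivot_table(values="amount", index="region",
                          columns="product", aggfunc="mean")
print(table)

# Impute each missing amount with the mean amount of its product group.
group_mean = sales.groupby("product")["amount"].transform("mean")
sales["amount_filled"] = sales["amount"].fillna(group_mean)
print(sales)
```

Unlike pivot(), pivot_table() aggregates, so it also works when several rows share the same index and column values.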

2. Boolean Indexing

Boolean indexing is used to filter rows based on conditions on one or more columns. For instance, suppose we want a list of all students who are not scholars and who got a loan. Boolean indexing handles this directly. It builds on the fact that Python Booleans behave like the integers 0 and 1:



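0 == False  # True: False equals 0 and True equals 1
c = 10
(c > 1) + (c < 20) + (c == 12)  # 2: Booleans add like integers (True + True + False)
# A Boolean can be used as an index for a list or tuple
state = True
state = (True, False)[state]  # True acts as index 1, which selects False
state  # False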
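
Applied to a data frame, the student example could look like the sketch below. The students table and its columns name, scholar and loan are hypothetical.

```python
import pandas as pd

# Hypothetical student records.
students = pd.DataFrame({
    "name": ["Ali", "Bea", "Caro", "Dan"],
    "scholar": [True, False, False, True],
    "loan": [False, True, False, True],
})

# One True/False value per row: not a scholar AND got a loan.
# Use ~ for "not" and & for "and"; Python's own "not"/"and" do not work on columns.
mask = ~students["scholar"] & students["loan"]

# Keep only the rows where the mask is True, and only the name column.
result = students.loc[mask, "name"]
print(result.tolist())
```

When a condition compares values (for example students["age"] > 18 on a hypothetical age column), wrap each comparison in parentheses before combining them with & or |.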

3. Crosstab

This function gives an initial view of the data by counting how often each combination of values from two columns occurs. It lets us check basic hypotheses, for instance that one column is expected to affect another.



pd.crosstab(df.year,df.life_exp)
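
Crosstab is easiest to read when both columns are categorical. Here is a small sketch on made-up data (the applicants table and its columns credit_history and loan_approved are assumptions for illustration) that checks whether one column seems to go along with the other.

```python
import pandas as pd

# Hypothetical loan applications.
applicants = pd.DataFrame({
    "credit_history": ["good", "good", "bad", "bad", "good", "bad"],
    "loan_approved": ["yes", "yes", "no", "yes", "no", "no"],
})

# Raw counts for every combination of the two columns.
counts = pd.crosstab(applicants["credit_history"], applicants["loan_approved"])
print(counts)

# Same table as row-wise shares: each row sums to 1.
shares = pd.crosstab(applicants["credit_history"], applicants["loan_approved"],
                     normalize="index")
print(shares)
```

The normalized version makes groups of different sizes comparable, which is usually what we want when validating a hypothesis.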

4. Merge DataFrames


Merging data frames is vital when data comes from several sources and needs to be combined.




mydataset1 = pd.DataFrame({'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]})
mydataset2 = pd.DataFrame({'cars': ["BMW", "Volvo", "Ford"], 'speed': [50, 70, 80]})
# With no "on" argument, merge joins on the columns both frames share (here 'cars')
data = pd.merge(mydataset1, mydataset2)
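
Real sources rarely contain exactly the same keys. The sketch below, using hypothetical frames left and right where only some cars match, shows how the how argument decides which rows survive the merge.

```python
import pandas as pd

# Hypothetical frames: "Ford" appears only on the left, "Audi" only on the right.
left = pd.DataFrame({"cars": ["BMW", "Volvo", "Ford"], "passings": [3, 7, 2]})
right = pd.DataFrame({"cars": ["BMW", "Volvo", "Audi"], "speed": [50, 70, 90]})

# Inner join (the default): keep only cars present in both frames.
inner = pd.merge(left, right, on="cars", how="inner")
print(inner)

# Outer join: keep every car; values missing on one side become NaN.
# indicator=True adds a _merge column saying where each row came from.
outer = pd.merge(left, right, on="cars", how="outer", indicator=True)
print(outer)
```

Naming the key with on= makes the join explicit, which helps when the frames happen to share more column names than intended.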

5. Sorting DataFrames



We often need to sort a pandas data frame, either by the values of one or more columns or by its row index (the row labels). A pandas data frame has two useful methods for this:

  1. sort_values(): sorts a pandas data frame by one or more columns

  2. sort_index(): sorts a pandas data frame by its row index




sort_by_life=df.sort_values('life_exp')
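
Both methods can be sketched on a small made-up frame. The scores table, its columns team and points, and the shuffled index are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with a deliberately shuffled row index.
scores = pd.DataFrame(
    {"team": ["b", "a", "b", "a"], "points": [3, 9, 7, 1]},
    index=[2, 0, 3, 1],
)

# Sort by several columns: team ascending, then points descending within a team.
by_team_points = scores.sort_values(["team", "points"], ascending=[True, False])
print(by_team_points)

# Sort by the row index instead of by column values.
by_index = scores.sort_index()
print(by_index)
```

Both methods return a new data frame and leave the original unchanged, so the result has to be assigned to a name to be kept.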
















 
 
