Investigating Netflix Movies and Guest Stars in The Office

First, pandas and matplotlib.pyplot were imported as pd and plt, respectively:

import pandas as pd
import matplotlib.pyplot as plt

Next, the Office episodes dataset was read into a DataFrame, with the release_date column parsed as dates:

office_df = pd.read_csv('datasets/office_episodes.csv', parse_dates=['release_date'])
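As a small illustration of what `parse_dates` does, the sketch below reads a tiny in-memory CSV rather than the real file. The two rows and the column subset are made-up stand-ins, not values from `office_episodes.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real file has more rows and columns.
sample_csv = io.StringIO(
    "episode_number,release_date,viewership_mil\n"
    "0,2020-01-01,1.0\n"
    "1,2020-01-08,2.0\n"
)

# parse_dates converts the listed column from strings to datetime64 values.
sample_df = pd.read_csv(sample_csv, parse_dates=["release_date"])

print(sample_df.dtypes)
```

With the column parsed this way, date operations such as `sample_df["release_date"].dt.year` become available.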

A color was then appended for each episode according to its scaled rating, using a for loop:

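for ind, row in office_df.iterrows():
    if row['scaled_ratings'] < 0.25:
        pass  # append the color for ratings below 0.25
    elif row['scaled_ratings'] < 0.50:
        pass  # append the color for ratings from 0.25 up to 0.50
    elif row['scaled_ratings'] < 0.75:
        pass  # append the color for ratings from 0.50 up to 0.75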
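
The loop above can be sketched end to end as follows. The list name `colors`, the four color names, and the final `else` branch for ratings of 0.75 and above are assumptions made for illustration, as are the sample ratings.

```python
import pandas as pd

# Hypothetical scaled ratings standing in for office_df['scaled_ratings'].
sample_df = pd.DataFrame({"scaled_ratings": [0.10, 0.30, 0.60, 0.90]})

colors = []  # assumed list name
for ind, row in sample_df.iterrows():
    if row["scaled_ratings"] < 0.25:
        colors.append("red")         # assumed color
    elif row["scaled_ratings"] < 0.50:
        colors.append("orange")      # assumed color
    elif row["scaled_ratings"] < 0.75:
        colors.append("lightgreen")  # assumed color
    else:
        colors.append("darkgreen")   # assumed color and assumed branch

print(colors)
```

Each row gets exactly one color, so the list ends up the same length as the DataFrame and can be passed straight to a scatter plot.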

A marker size was also appended for each row: 250 if the episode has a guest star and 25 if it does not, as in the loop below:

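sizes = []  # list name assumed
for ind, row in office_df.iterrows():
    # 25 for episodes without guests, 250 for episodes with guests
    sizes.append(25 if row['has_guests'] == False else 250)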
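
A vectorized alternative to the row-by-row loop is sketched below. It swaps in `numpy.where`, which is not what the post uses, and the sample DataFrame and the name `sizes` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical flags standing in for office_df['has_guests'].
sample_df = pd.DataFrame({"has_guests": [False, True, False]})

# 250 where the episode has guests, 25 otherwise, matching the sizes in the text.
sizes = np.where(sample_df["has_guests"], 250, 25).tolist()

print(sizes)
```

This avoids `iterrows()`, which tends to be slow on larger DataFrames, while producing the same list of sizes.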

The data were then split into episodes with and without guest stars:

non_guest_df = office_df[office_df['has_guests'] == False]
guest_df = office_df[office_df['has_guests'] == True]
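
The same split can be written with a boolean mask and its negation, as in the sketch below. The sample rows and the variable names ending in `_sample` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rows standing in for office_df.
sample_df = pd.DataFrame({
    "episode_number": [0, 1, 2, 3],
    "has_guests": [False, True, False, True],
})

mask = sample_df["has_guests"]       # boolean Series, True where guests appear
guest_sample = sample_df[mask]       # rows with guests
non_guest_sample = sample_df[~mask]  # ~ negates the mask: rows without guests

# Every row lands in exactly one of the two subsets.
print(len(guest_sample), len(non_guest_sample))
```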

Finally, the guest stars of the most-watched episode (viewership above 20 million) were printed, and one of them was recorded as the top star:

print(office_df[office_df['viewership_mil'] > 20]['guest_stars'])
top_star = 'Jessica Alba'
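
One way to avoid hard-coding the threshold of 20 is to look up the row with the highest viewership directly. The sketch below uses made-up data and assumes that `guest_stars` holds a comma-separated string of names.

```python
import pandas as pd

# Hypothetical rows; names and numbers are placeholders, not real data.
sample_df = pd.DataFrame({
    "viewership_mil": [8.1, 21.5, 9.3],
    "guest_stars": ["Guest A", "Guest B, Guest C", "Guest D"],
})

# idxmax returns the index label of the row with the largest viewership.
top_row = sample_df.loc[sample_df["viewership_mil"].idxmax()]

# Split the assumed comma-separated string into individual names.
top_guests = top_row["guest_stars"].split(", ")

print(top_guests)
```

Any name in `top_guests` would then be a valid choice for `top_star`.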

The resulting plot shows the popularity, quality and guest appearances in The Office.

Viewership was plotted on the vertical axis and episode number on the horizontal axis. Viewership is shown in millions. Most of the data fall between 7.5 and 10 million: across episode numbers 0 to 175, viewership mostly stays at or below 10 million.
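
A plot like the one described could be drawn roughly as follows. This is a sketch under assumptions: the sample data, the color names, the axis labels and the title are placeholders, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for office_df.
sample_df = pd.DataFrame({
    "episode_number": [0, 1, 2, 3],
    "viewership_mil": [8.0, 9.5, 7.6, 21.0],
    "has_guests": [False, False, True, True],
})

colors = ["red", "orange", "lightgreen", "darkgreen"]  # assumed colors, one per row
sizes = [250 if g else 25 for g in sample_df["has_guests"]]  # sizes from the text

fig, ax = plt.subplots()
# Episode number on the horizontal axis, viewership on the vertical axis.
ax.scatter(sample_df["episode_number"], sample_df["viewership_mil"], c=colors, s=sizes)
ax.set_xlabel("Episode Number")           # assumed label
ax.set_ylabel("Viewership (Millions)")    # assumed label
ax.set_title("Popularity, Quality and Guest Appearances in The Office")  # assumed title
```

In an interactive session, calling `plt.show()` afterwards would display the figure.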

