
Data Scientist Program


Free Online Data Science Training for Complete Beginners.

No prior coding knowledge required!

Investigating Netflix Series

In this project, we will take a look at a dataset of The Office episodes, and try to understand how the popularity and quality of the series varied over time.

To do so, we will use the following dataset: datasets/office_episodes.csv, which was downloaded from Here.

After downloading it, let us open and read it in our Jupyter Notebook.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
pd.options.mode.chained_assignment = None  # default='warn'

df = pd.read_csv('the_office_series.csv', index_col = [0])

Let us view its column details.
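The inspection step itself is not shown here, so the following is only a minimal sketch of one way to do it. It uses a small stand-in frame so that it runs on its own: the values are invented placeholders, not real episode data, and only the column names `Ratings`, `Viewership`, and `Date` are taken from this post. With the real data, the same calls would be made on `df`.

```python
import pandas as pd

# Stand-in for the real `df`: the values are invented placeholders,
# only the column names come from this post.
sample = pd.DataFrame({
    "Ratings": [7.5, 8.3],
    "Viewership": [11.2, 6.0],
    "Date": ["2005-03-24", "2005-03-29"],
})

sample.info()                         # prints column names, non-null counts and dtypes
column_names = list(sample.columns)   # plain list of the column names
column_dtypes = sample.dtypes         # Date is still text here, not a datetime
```

Checking the dtypes at this point shows why the `Date` column needs to be converted before plotting against time.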

Let's remove the unnecessary columns.

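# .copy() makes df1 an independent DataFrame, so assigning to its columns later does not touch df
df1 = df[['Ratings', 'Viewership', 'Date']].copy()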

Let's convert the Date column to the datetime type.

df1['Date'] = pd.to_datetime(df1['Date'])

Let's now analyze our data over time and plot the graphs.

fig, ax = plt.subplots()
ax.scatter(y= df1['Ratings'], x= df1['Date'], color = 'blue', alpha= 0.3)
ax.scatter(y= df1['Viewership'], x= df1['Date'], color = 'gold', alpha = 0.3)
ax.legend(['Ratings', 'Viewership'])

df1.plot(y= ['Ratings','Viewership'], x= 'Date', figsize = (13, 5))
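Ratings and Viewership may be measured on quite different scales, in which case sharing one y-axis can flatten one of the two series. One possible variation, which is not part of the original analysis, is to give each series its own y-axis with matplotlib's `twinx()`. The helper name `plot_dual_axis` is hypothetical, and the sample values are invented placeholders used only so that the sketch runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_dual_axis(frame):
    """Plot Ratings on the left y-axis and Viewership on the right one."""
    fig, ax_left = plt.subplots(figsize=(13, 5))
    ax_right = ax_left.twinx()  # second y-axis that shares the same x-axis

    ax_left.plot(frame["Date"], frame["Ratings"], color="blue")
    ax_right.plot(frame["Date"], frame["Viewership"], color="gold")

    ax_left.set_xlabel("Date")
    ax_left.set_ylabel("Ratings", color="blue")
    ax_right.set_ylabel("Viewership", color="gold")
    return fig, ax_left, ax_right

# Invented placeholder values, not real episode data.
sample = pd.DataFrame({
    "Ratings": [7.5, 8.3, 8.0],
    "Viewership": [11.2, 6.0, 5.8],
    "Date": pd.to_datetime(["2005-03-24", "2005-03-29", "2005-04-05"]),
})
fig, ax_left, ax_right = plot_dual_axis(sample)
```

With the real data, the call would be `plot_dual_axis(df1)` once the `Date` column has been converted.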

We can conclude that the `Quality` of the series, as measured by the viewers' `Ratings`, has not been affected by time. On the other hand, the `Popularity` of the series, as measured by the `Viewership` values, declined noticeably in the last two years.
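The conclusion above is read off the plots. As a rough numeric cross-check, one could average both columns per calendar year and compare the last years with the earlier ones. The sketch below assumes a frame with a datetime `Date` column, as `df1` has after the conversion. The helper name `yearly_means` is hypothetical, and the sample values are invented placeholders, so the numbers they produce say nothing about the real series.

```python
import pandas as pd

def yearly_means(frame):
    """Average Ratings and Viewership for each calendar year of Date."""
    years = frame["Date"].dt.year.rename("Year")
    return frame.groupby(years)[["Ratings", "Viewership"]].mean()

# Invented placeholder values, not real episode data.
sample = pd.DataFrame({
    "Ratings": [8.0, 8.2, 8.1, 7.9],
    "Viewership": [9.0, 8.0, 5.0, 4.0],
    "Date": pd.to_datetime(
        ["2010-01-07", "2010-02-04", "2012-01-12", "2012-02-02"]
    ),
})
summary = yearly_means(sample)  # one row per year, one column per measure
```

With the real data, `yearly_means(df1)` would give one row per broadcast year, which makes it easier to see whether the drop in `Viewership` is limited to the final years.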

