Time Series Analysis

Writer: Eman Mahmoud

Time-Series-Analysis-of-NAICS

The North American Industry Classification System (NAICS) is an industry classification system developed by the statistical agencies of Canada, Mexico, and the United States. NAICS is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. This post walks through the data step by step; the accompanying blog post is at https://www.datainsightonline.com/post/analysing-the-naics-time-series-data

The data consists of 15 CSV files whose names begin with RTRA. These files contain employment data by industry at different levels of aggregation: 2-digit NAICS, 3-digit NAICS, and 4-digit NAICS. The columns are as follows:

(i) SYEAR: Survey Year

(ii) SMTH: Survey Month

(iii) NAICS: Industry name and associated NAICS code in the bracket

(iv) _EMPLOYMENT_: Employment


LMO Detailed Industries by NAICS: An Excel file for mapping the RTRA data to the desired data. The first column of this file has a list of 59 industries that are frequently used. The second column has their NAICS definitions. Using these NAICS definitions and the RTRA data, the goal is to create a monthly employment data series from 1997 to 2018 for these 59 industries.
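As a rough sanity check on the target dataset, the sketch below builds the monthly index for 1997 to 2018 and computes how many rows a complete series would have. It assumes every one of the 59 industries has a value for every month, which the source does not confirm; the variable names are illustrative.

```python
import pandas as pd

# One timestamp per month, January 1997 through December 2018.
months = pd.date_range("1997-01-01", "2018-12-01", freq="MS")

n_industries = 59  # number of LMO industries stated in the task description

# Row count of a complete long-format table (assumption: no missing months).
expected_rows = len(months) * n_industries

print(len(months), expected_rows)
```

Comparing this number with the length of the final merged table is a quick way to spot industries or months that were lost along the way.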


I will merge the LMO Detailed Industries by NAICS file with each aggregation level in turn:

- the 2-digit NAICS data
- the 3-digit NAICS data
- the 4-digit NAICS data

and then combine the three merged results into one dataset.

First, read all the files and preprocess them so they are suitable for merging.

Load libraries



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import glob

# Collect every CSV file and group it by NAICS aggregation level,
# using the level tag that appears in each file name.
files = glob.glob(r"C:/Users/21AK22/Documents/Data Insight/A_NEWLY_HIRED_DATA_ANALYST/*.csv")
data_2digit = pd.DataFrame()
data_3digit = pd.DataFrame()
data_4digit = pd.DataFrame()
for file in files:
    if re.search('_2NAICS', file):
        df = pd.read_csv(file)
        data_2digit = pd.concat([data_2digit, df])
    elif re.search('_3NAICS', file):
        df = pd.read_csv(file)
        data_3digit = pd.concat([data_3digit, df])
    elif re.search('_4NAICS', file):
        df = pd.read_csv(file)
        data_4digit = pd.concat([data_4digit, df])

I will use two functions, separate_NAICS_code and Date_column, to preprocess the data.


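def separate_NAICS_code(df):
    # Split "Industry name [code]" into the name and the bracketed code.
    # Reset the index first: concatenating the files leaves repeated index
    # labels, which would misalign the column assignments below.
    df = df.reset_index(drop=True)
    df1 = pd.DataFrame(df.NAICS.astype('str').str.split('[').to_list(), columns=['NAICS', 'NAICS_CODE'])
    df1['NAICS_CODE'] = df1.NAICS_CODE.astype('str').str.strip(']').str.replace('-', ',')
    df['NAICS'] = df1['NAICS']
    df['NAICS_CODE'] = df1['NAICS_CODE']
    return df

def Date_column(df):
    # Build a date from the survey year and month, then sort chronologically.
    df['date'] = pd.to_datetime(df.SYEAR.astype('str') + df.SMTH.astype('str'), format='%Y%m')
    df = df.sort_values('date')
    return df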

Preprocessing data_2digit and data_3digit:

- Separate the industry names from their codes and put the codes in a new column, using the separate_NAICS_code function.

- Create a date column from SYEAR and SMTH, using the Date_column function.

Preprocessing data_4digit uses only the Date_column function.
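The calls for this step are not shown in the post, so here is a minimal sketch of how they might look. The rows are made up to mimic the shape of the RTRA files, and the two helpers are restated in a compact variant form (splitting with `expand=True` and building the date from its components) so that the block runs on its own; treat it as an illustration, not as the author's exact code.

```python
import pandas as pd

def separate_NAICS_code(df):
    # Variant of the helper: split "Name [code]" once, at the first "[".
    df = df.reset_index(drop=True)
    parts = df["NAICS"].astype(str).str.split("[", n=1, expand=True)
    df["NAICS"] = parts[0].str.strip()
    df["NAICS_CODE"] = parts[1].str.strip("]").str.replace("-", ",", regex=False)
    return df

def Date_column(df):
    # Variant of the helper: assemble the date from year, month, and day 1.
    df = df.copy()
    df["date"] = pd.to_datetime(dict(year=df["SYEAR"], month=df["SMTH"], day=1))
    return df.sort_values("date")

# Made-up rows, for illustration only.
data_2digit = pd.DataFrame({
    "SYEAR": [1997, 1997],
    "SMTH": [2, 1],
    "NAICS": ["Utilities [22]", "Manufacturing [31-33]"],
    "_EMPLOYMENT_": [100, 200],
})
data_4digit = pd.DataFrame({
    "SYEAR": [1997], "SMTH": [1], "NAICS": [2211], "_EMPLOYMENT_": [40],
})

# 2-digit (and 3-digit) data: split out the code, then add the date.
data_2digit = Date_column(separate_NAICS_code(data_2digit))
# 4-digit data: only the date column is added.
data_4digit = Date_column(data_4digit)

print(data_2digit[["date", "NAICS", "NAICS_CODE", "_EMPLOYMENT_"]])
```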


data_2digit.head(2)
data_3digit.head(2)
data_4digit.head(2)






Read and preprocess the LMO_Detailed_Industries_by_NAICS file:

- replace & in the NAICS column with ,

- cast the NAICS column to string


LMO_Detailed_Industries_by_NAICS = pd.read_excel(r"C:/Users/21AK22/Documents/Data Insight/A_NEWLY_HIRED_DATA_ANALYST/LMO_Detailed_Industries_by_NAICS.xlsx")
# Replace "&" with "," and store every NAICS definition as a string.
LMO_Detailed_Industries_by_NAICS['NAICS'] = LMO_Detailed_Industries_by_NAICS['NAICS'].replace(regex='&', value=',').astype('str')
print(LMO_Detailed_Industries_by_NAICS.head())

Next, split every value in the NAICS column that contains a comma.
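One way to do this split, sketched on a made-up two-row frame, is `str.split` followed by `DataFrame.explode`, which gives each NAICS code its own row. The column name `LMO_Detailed_Industry` and the sample values are assumptions for illustration, and the original notebook may split the values differently.

```python
import pandas as pd

# Made-up stand-in for the Excel mapping file.
lmo = pd.DataFrame({
    "LMO_Detailed_Industry": ["Industry A", "Industry B"],
    "NAICS": ["111 & 112", 22],
})

# Same cleaning as in the post: "&" becomes ",", everything becomes a string.
lmo["NAICS"] = lmo["NAICS"].astype(str).str.replace("&", ",", regex=False)

# Turn "111 , 112" into a list, then give every list element its own row.
lmo_long = lmo.assign(NAICS=lmo["NAICS"].str.split(",")).explode("NAICS")
lmo_long["NAICS"] = lmo_long["NAICS"].str.strip()
lmo_long = lmo_long.reset_index(drop=True)

print(lmo_long)
```

With one code per row, the mapping can be joined to the RTRA data on the code column directly.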


- left-merge data_2digit with lmo_detailed_industries

- left-merge data_3digit with lmo_detailed_industries

- merge data_4digit with lmo_detailed_industries

Then combine the three merged dataframes into one.
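The merge steps above can be sketched as follows. The frames are made-up stand-ins, the helper `merge_with_lmo` and the column name `LMO_Detailed_Industry` are hypothetical, and dropping unmatched rows and summing employment per industry and month is one plausible way to combine the levels, not necessarily what the original notebook does.

```python
import pandas as pd

# Made-up mapping with one NAICS code per row.
lmo = pd.DataFrame({
    "LMO_Detailed_Industry": ["Industry A", "Industry B"],
    "NAICS": ["22", "111"],
})

jan = pd.Timestamp("1997-01-01")
data_2digit = pd.DataFrame(
    {"date": [jan, jan], "NAICS_CODE": ["22", "72"], "_EMPLOYMENT_": [100, 300]}
)
data_3digit = pd.DataFrame(
    {"date": [jan], "NAICS_CODE": ["111"], "_EMPLOYMENT_": [50]}
)

def merge_with_lmo(df):
    # Left merge keeps every RTRA row; rows without an LMO industry are dropped.
    merged = df.merge(lmo, how="left", left_on="NAICS_CODE", right_on="NAICS")
    return merged.dropna(subset=["LMO_Detailed_Industry"])

# Stack the merged levels, then total employment per industry and month.
combined = pd.concat(
    [merge_with_lmo(data_2digit), merge_with_lmo(data_3digit)],
    ignore_index=True,
)
final = combined.groupby(
    ["LMO_Detailed_Industry", "date"], as_index=False
)["_EMPLOYMENT_"].sum()

print(final)
```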



Some visualizations of the final data:

Employment in the Utilities industry, 1997-2018

Employment across industries, 1997-2018
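Charts like these could be produced along the following lines. The numbers are made up, "Construction" is just a placeholder second industry, and the chart types (a line chart for one industry, a bar chart of average employment per industry) are guesses at what the original figures showed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up final table in long format: one row per industry and month.
final = pd.DataFrame({
    "LMO_Detailed_Industry": ["Utilities"] * 3 + ["Construction"] * 3,
    "date": list(pd.date_range("1997-01-01", periods=3, freq="MS")) * 2,
    "_EMPLOYMENT_": [10, 12, 11, 30, 28, 33],
})

# Line chart: employment over time for a single industry.
utilities = final[final["LMO_Detailed_Industry"] == "Utilities"]
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(utilities["date"], utilities["_EMPLOYMENT_"])
ax.set_title("Employment in the Utilities industry")
ax.set_xlabel("Date")
ax.set_ylabel("Employment")

# Bar chart: average monthly employment per industry, smallest first.
averages = final.groupby("LMO_Detailed_Industry")["_EMPLOYMENT_"].mean().sort_values()
fig2, ax2 = plt.subplots(figsize=(8, 4))
averages.plot.barh(ax=ax2)
ax2.set_xlabel("Average monthly employment")

fig.savefig("utilities_employment.png")
fig2.savefig("employment_by_industry.png")
```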




Source code: https://github.com/eman888991/Data-Insight/blob/main/Project_Time_Series_Analysis_of_NAICS.ipynb

 
 
