Correlation and Covariance

By abdelrahman.shaban7000

Whenever we talk about statistics, we should mention correlation and covariance. In this article, we will go through both concepts and illustrate each one with examples.


Covariance:

Covariance shows how two variables vary together, i.e. the direction of their linear relationship. Its value can range from −∞ to +∞: a positive value indicates a positive (direct) relationship, a negative value indicates a negative (inverse) relationship, and if the two variables are independent, their covariance is 0. So a positive covariance tells us the variables tend to move together, but it does not mean that one variable depends on the other.


Its formula:

Cov(x, y) = Σ (xi − x̄)(yi − ȳ) / N

where:

· xi = data value of x

· yi = data value of y

· x̄ = mean of x

· ȳ = mean of y

· N = number of data values.


It is important to mention that the covariance is affected by the scales (variances) of the variables, so its magnitude alone does not tell us how strong the relationship is.


import numpy as np

x = [2, 4, 6, 9, 14]
y = [32, 64, 72, 81, 172]
# the off-diagonal entry of the covariance matrix is Cov(x, y)
np.cov(x, y)[0][1]
235.5
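
To connect the number above to the formula, here is a minimal sketch (plain numpy, reusing the same data) that computes the covariance by hand. Note that np.cov divides by N − 1 (the sample covariance) by default, which is why it returns 235.5 rather than the population value:

import numpy as np

x = np.array([2, 4, 6, 9, 14])
y = np.array([32, 64, 72, 81, 172])

# deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# population covariance: divide by N
print((dx * dy).sum() / len(x))        # 188.4
# sample covariance: divide by N - 1 (this is what np.cov uses by default)
print((dx * dy).sum() / (len(x) - 1))  # 235.5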

Correlation:

It is used to study the strength of the linear relationship between two variables. Unlike covariance, correlation can be used to compare how strong or weak relationships are, because its magnitude does not depend on the scales of the variables.

There are different methods we can use to quantify correlation; here we will use the Pearson correlation coefficient.


Its formula:

r = Σ (xi − x̄)(yi − ȳ) / √( Σ (xi − x̄)² · Σ (yi − ȳ)² )

In other words, it is the covariance of x and y divided by the product of their standard deviations, and its value always lies between −1 and +1.
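
As a quick check that the formula matches what numpy computes, here is a minimal sketch (reusing the covariance example data) that builds r directly from its definition and compares it to np.corrcoef:

import numpy as np

x = np.array([2, 4, 6, 9, 14])
y = np.array([32, 64, 72, 81, 172])

dx = x - x.mean()
dy = y - y.mean()

# Pearson r: covariance divided by the product of the standard deviations
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r)
print(np.corrcoef(x, y)[0][1])  # prints the same value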

We say the variables are positively correlated when they move in the same direction and negatively correlated when they move in opposite directions.

Scatterplots of differently shaped relationships illustrate this: the correlation value reflects both the direction of the relationship and how closely the points follow a straight line.

It is important to remember the common phrase "correlation does not imply causation": two variables can be correlated because a third factor affects both of them.

Notice that you can always compute a correlation even if the relationship is not linear, so before computing it we should check the scatterplot to see whether the variables are linearly related. If they are not, the Pearson correlation coefficient is not an appropriate summary of the relationship, as the sketch below illustrates.
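
To see why this check matters, consider a minimal sketch (with made-up data) in which y depends perfectly on x through a quadratic relationship, yet the Pearson coefficient is close to zero:

import numpy as np

x = np.arange(-5, 6)   # symmetric around zero
y = x ** 2             # perfect, but non-linear, relationship

print(np.corrcoef(x, y)[0][1])  # approximately 0: Pearson misses the dependence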

Here are some more examples in code:

import numpy as np

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
print(x)
print(y)
[10 11 12 13 14 15 16 17 18 19]
[ 2  1  4  5  8 12 18 25 96 48]

# np.corrcoef returns the correlation matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(x, y)
print(r)
[[1.         0.75864029]
 [0.75864029 1.        ]]


Another example using pandas:

import pandas as pd

x_col = pd.Series(range(10, 20))
y_col = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
# Series.corr computes the Pearson correlation by default
x_col.corr(y_col)
0.7586402890911867
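
If the data lives in a DataFrame, pandas can also compute the whole correlation matrix at once; here is a small sketch with the same data:

import pandas as pd

df = pd.DataFrame({
    "x": range(10, 20),
    "y": [2, 1, 4, 5, 8, 12, 18, 25, 96, 48],
})

# DataFrame.corr returns the Pearson correlation matrix for all numeric columns
print(df.corr())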

There is a difference between correlation and association: strongly correlated variables are strongly associated, but strongly associated variables are not necessarily strongly correlated, since the association may not be linear.



That was part of Data Insight's Data Scientist program.
