

Logistic regression and Decision Tree

Updated: Jul 26, 2022

Logistic regression

Classification techniques are an important part of machine learning, and logistic regression is one of the most widely used. It is applied to many problems, such as spam detection and diabetes prediction.


We use it mainly for binary (two-class) classification, such as benign or malignant.

Logistic regression predicts a discrete class. It starts from the linear equation:


$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$ (1)


The sigmoid function then maps the linear output y to a probability p between 0 and 1:

$p = \frac{1}{1 + e^{-y}}$ (2)


Substituting (1) into (2) gives:

$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$


In the image below we have two classes: one for predicted probabilities greater than 0.5 and one for probabilities from 0 to 0.5. Logistic regression therefore turns a probability between 0 and 1 into a binary class.
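A minimal sketch of equations (1) and (2) together with the 0.5 threshold, in plain Python. The coefficient values and the helper names `predict_probability` and `predict_class` are made up for illustration; they do not come from a fitted model.

```python
import math

def predict_probability(betas, x):
    """Apply the linear equation (1), then the sigmoid (2).

    betas[0] is the intercept (beta_0); betas[1:] pair up with the features in x.
    """
    y = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-y))

def predict_class(betas, x, threshold=0.5):
    """Return 1 when the probability is above the threshold, otherwise 0."""
    return 1 if predict_probability(betas, x) > threshold else 0

# Hypothetical coefficients for a model with two features.
betas = [-1.0, 2.0, 0.5]
print(predict_probability(betas, [1.0, 0.0]))  # y = 1.0, so p is about 0.73
print(predict_class(betas, [1.0, 0.0]))        # 1
print(predict_class(betas, [0.0, 0.0]))        # y = -1.0, p is about 0.27, so 0
```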


Sigmoid function:

Also called the logistic function, it maps any real value into the range between 0 and 1.

Its form is:

$p = \frac{1}{1 + e^{-x}}$ (3)
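To get a feel for equation (3), the short sketch below evaluates the function at a few hand-picked inputs. The function name `sigmoid` and the sample inputs are illustrative choices.

```python
import math

def sigmoid(x):
    """Logistic function from equation (3): maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle at 0.5.
for x in (-6, -1, 0, 1, 6):
    print(x, round(sigmoid(x), 4))
```

This direct form is fine for small inputs; for very large negative inputs `math.exp` can overflow, so library code usually relies on a numerically stable variant.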


Loss function:

The loss function measures the error between the predictions and the true values; training the model means minimizing it.

In the image below there are 2 classification errors: one for a red rectangle and one for a blue circle.
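The text does not name a specific loss. A common choice for logistic regression is log loss (binary cross-entropy), so the sketch below assumes that one. The labels and probabilities are invented to show that misclassified points raise the loss.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy between true labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 so log() is defined
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # every point is on the correct side of 0.5
bad = [0.4, 0.6, 0.8, 0.2]   # the first two points are on the wrong side of 0.5
print(log_loss(y_true, good))
print(log_loss(y_true, bad))  # larger, because two points are misclassified
```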


Regularization: we use it to avoid overfitting the data,

which means the model passes through every point in the training set, often because the data has too many features.

The solution is to constrain the model, for example by reducing its number of parameters. A simple way to regularize a polynomial model is to reduce the polynomial degree.

Regularization algorithms: Ridge, Lasso, and Elastic Net.
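These three methods differ in the penalty they add to the loss so that large coefficients become costly. The sketch below shows only the penalty terms, in a simplified form; the function names, the `alpha` strength, and the `l1_ratio` mix are illustrative, and the exact scaling differs between libraries.

```python
def l2_penalty(betas, alpha):
    """Ridge penalty: alpha times the sum of squared coefficients (intercept excluded)."""
    return alpha * sum(b * b for b in betas[1:])

def l1_penalty(betas, alpha):
    """Lasso penalty: alpha times the sum of absolute coefficients (intercept excluded)."""
    return alpha * sum(abs(b) for b in betas[1:])

def elastic_net_penalty(betas, alpha, l1_ratio=0.5):
    """Elastic Net: a weighted mix of the Lasso and Ridge penalties."""
    return l1_ratio * l1_penalty(betas, alpha) + (1 - l1_ratio) * l2_penalty(betas, alpha)

betas = [0.5, 3.0, -4.0]  # hypothetical coefficients; betas[0] is the intercept
print(l2_penalty(betas, alpha=0.5))           # 0.5 * (9 + 16) = 12.5
print(l1_penalty(betas, alpha=0.5))           # 0.5 * (3 + 4) = 3.5
print(elastic_net_penalty(betas, alpha=0.5))  # 0.5 * 3.5 + 0.5 * 12.5 = 8.0
```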


SVM: Support Vector Machine

We want to separate these two groups. Which hyperplane is the best?

A hyperplane is a decision boundary (a line in two dimensions) that divides the data points into two different classes.

SVM is a discriminative algorithm that tries to find the best, or optimal, hyperplane: the one with the largest margin between the two classes.


Support vectors are the data points closest to the hyperplane; they are the points that determine where the decision boundary lies.
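The sketch below does not train an SVM. It only illustrates, for a hand-picked hyperplane and four made-up points, how a point's side of the hyperplane and its distance to it are computed, and which point lies closest. The names `decision_value` and `distance_to_hyperplane` are illustrative.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane x1 + x2 - 4 = 0 and four 2-D points.
w, b = [1.0, 1.0], -4.0
points = [(1.0, 1.0), (1.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

for p in points:
    side = 1 if decision_value(w, b, p) > 0 else -1
    print(p, side, round(distance_to_hyperplane(w, b, p), 3))

# The point closest to the hyperplane plays the role of a support vector.
closest = min(points, key=lambda p: distance_to_hyperplane(w, b, p))
print(closest)  # (1.0, 2.0)
```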


Decision Tree:

A decision tree is like a flowchart in which each internal node represents a feature, each branch represents a decision or condition, and each leaf represents an output.
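The flowchart idea can be sketched with nested dictionaries. The tree below is written by hand, not learned from data, and the feature names and thresholds are invented purely to show the structure.

```python
# A hand-written tree: internal nodes test a feature, leaves hold the output.
tree = {
    "feature": "glucose",
    "threshold": 120,
    "left": {"label": "no diabetes"},   # branch taken when glucose <= 120
    "right": {                          # branch taken when glucose > 120
        "feature": "bmi",
        "threshold": 30,
        "left": {"label": "no diabetes"},
        "right": {"label": "diabetes"},
    },
}

def predict(node, record):
    """Follow branches from the root until a leaf is reached, then return its label."""
    while "label" not in node:
        branch = "left" if record[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(predict(tree, {"glucose": 150, "bmi": 35}))  # diabetes
print(predict(tree, {"glucose": 100, "bmi": 35}))  # no diabetes
```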


We select the best attribute to split the records using Attribute Selection Measures (ASM).

We split on that attribute into smaller nodes and repeat the process until one of these conditions is met:

1- All the tuples belong to the same attribute value.

2- There are no more remaining attributes.

3- There are no more instances.
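The three stopping conditions can be checked with a small helper like the one below. The name `should_stop` is hypothetical, and condition 1 is interpreted here as "all remaining records share the same label", which is an assumption about the wording above.

```python
def should_stop(labels, remaining_attributes):
    """Return True when any of the three stopping conditions holds."""
    if len(labels) == 0:                # 3- there are no more instances
        return True
    if len(set(labels)) == 1:           # 1- every remaining tuple has the same value
        return True
    if len(remaining_attributes) == 0:  # 2- there are no attributes left to split on
        return True
    return False

print(should_stop(["yes", "yes"], ["age"]))  # True: the node is already pure
print(should_stop(["yes", "no"], []))        # True: nothing left to split on
print(should_stop(["yes", "no"], ["age"]))   # False: keep splitting
```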


As we see, we split our data into two groups, train and test, and work with the training set by selecting the best attribute using information gain or the Gini index.
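A minimal sketch of the Gini index as an attribute selection measure, taking it to be the second measure meant above. The function name `gini` and the sample labels are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))  # 0.0, a pure group
print(gini(["a", "a", "b", "b"]))  # 0.5, an even two-class mix
```

A split is preferred when the groups it produces have a lower Gini value than the group it started from.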


Information gain: measures how much a split reduces the impurity of the input set. If a group of data is not uniform, meaning it contains different types of items (apples, oranges, pens), the impurity of that group is high.


We measure impurity with entropy. If the entropy is small, the group is nearly pure; if entropy = 0, the group contains only one class.


$E = -\sum_i p_i \log_2 p_i$, where $p_i$ is the probability of class i.
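Entropy and information gain can be computed directly from class counts. The sketch below assumes the standard base-2 entropy; the function names and the fruit labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: the sum of -p_i * log2(p_i) over the classes present."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

mixed = ["apple", "apple", "orange", "orange"]
print(entropy(["apple"] * 4))  # 0.0, only one class
print(entropy(mixed))          # 1.0, an even mix of two classes
print(information_gain(mixed, [["apple", "apple"], ["orange", "orange"]]))  # 1.0
```

The last call shows a perfect split: both child groups are pure, so the gain equals the full entropy of the parent.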





Resources:

1: https://www.datacamp.com/tutorial/understanding-logistic-regression-python













 
 
 
