Data Scientist Program


Free Online Data Science Training for Complete Beginners.

No prior coding knowledge required!

Logistic regression and Decision Tree

Logistic regression

Classification techniques are an important part of machine learning, and logistic regression is one of the most widely used. It is applied to many problems, such as spam detection and diabetes prediction.

We use it mainly for binary classification, where there are two classes (for example, benign or malignant).

In logistic regression the output is a discrete class. We start from a linear equation:

$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$ (1)

and pass its result through the sigmoid function:

$p = \frac{1}{1 + e^{-y}}$ (2)

If we substitute (1) into (2):

$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$

In the image below we have two classes: one for predicted values greater than 0.5 and one for values from 0 to 0.5. Logistic regression outputs a value between 0 and 1, and this threshold turns it into a binary class.
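
To make the link between the linear equation, the sigmoid, and the 0.5 threshold concrete, here is a minimal sketch in plain Python. The coefficient values and the sample are made up for illustration, and the function names `predict_proba` and `predict_class` are hypothetical, not taken from any library.

```python
import math

def sigmoid(y):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))

def predict_proba(betas, x):
    """Linear equation followed by the sigmoid.

    betas[0] is the intercept; betas[1:] pair up with the features in x.
    """
    y = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return sigmoid(y)

def predict_class(betas, x, threshold=0.5):
    """Return 1 if the probability is greater than the threshold, else 0."""
    return 1 if predict_proba(betas, x) > threshold else 0

# Hypothetical coefficients and one sample with two features.
betas = [-1.0, 2.0, 0.5]
sample = [1.0, -2.0]

# Here y = -1 + 2*1 + 0.5*(-2) = 0, so the probability is exactly 0.5.
print(predict_proba(betas, sample))
print(predict_class(betas, sample))
```

A probability of exactly 0.5 falls in the "0 to 0.5" class here, matching the split described above. Note that `math.exp` can overflow for very large negative inputs, which this sketch does not guard against.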

Sigmoid function:

Also called the logistic function, it maps any real value to a number between 0 and 1.

Its formula is

$p = \frac{1}{1 + e^{-x}}$ (3)

Loss function:

We use it to measure the error between the predictions and the true values. Training adjusts the model to reduce this error.

In the image below there are 2 classification errors: one for a red rectangle and one for a blue circle.
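
The text does not name a specific loss function. A common choice for logistic regression is log loss (binary cross-entropy), so the sketch below assumes that one. The labels and probabilities are made-up values, and `eps` is a hypothetical clipping constant used only to avoid taking the log of zero.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and two sets of predicted probabilities.
y_true = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # every sample on the correct side of 0.5
bad = [0.4, 0.6, 0.3, 0.7]   # every sample on the wrong side of 0.5

print(log_loss(y_true, good))
print(log_loss(y_true, bad))
```

The misclassified predictions give the larger loss, which is what training tries to push down.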

Regularization: we use it to avoid overfitting the data.

Overfitting means that our model goes through every point in the training set, often because the data has too many features.

The solution is to constrain the model, for instance by reducing its number of parameters. For example, a simple way to regularize a polynomial model is to reduce its degree.

Regularization algorithms: Ridge, Lasso, and Elastic Net.
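
As a rough sketch of how these three differ: Ridge adds a penalty on the squared coefficients (L2), Lasso adds a penalty on their absolute values (L1), and Elastic Net mixes the two. The code below is a simplified illustration, not any library's exact formula. The parameter names `alpha` and `l1_ratio` are assumptions chosen for this example, and leaving the intercept unpenalized is a common convention rather than something the text states.

```python
def l1_penalty(betas):
    """Lasso-style penalty: sum of absolute coefficients (intercept betas[0] skipped)."""
    return sum(abs(b) for b in betas[1:])

def l2_penalty(betas):
    """Ridge-style penalty: sum of squared coefficients (intercept betas[0] skipped)."""
    return sum(b * b for b in betas[1:])

def elastic_net_penalty(betas, alpha=1.0, l1_ratio=0.5):
    """Weighted mix of the L1 and L2 penalties, scaled by alpha."""
    mix = l1_ratio * l1_penalty(betas) + (1 - l1_ratio) * l2_penalty(betas)
    return alpha * mix

# Hypothetical coefficients: intercept 0.5, then two feature weights.
betas = [0.5, 3.0, -4.0]

print(l1_penalty(betas))           # 3 + 4
print(l2_penalty(betas))           # 9 + 16
print(elastic_net_penalty(betas))  # 0.5 * 7 + 0.5 * 25
```

The penalty is added to the loss during training, so large coefficients make the total larger and the model is pushed toward simpler solutions.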

SVM: Support Vector Machine

We want to separate these two groups. Which hyperplane is the best?

A hyperplane is a decision boundary (a line in two dimensions) that divides the data points into two different classes.

So SVM is a discriminative algorithm that tries to find the optimal hyperplane, the one that leaves the widest margin between the two classes.

Support vectors are the data points closest to the hyperplane. They are the points that determine where the decision boundary lies.
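
The sketch below illustrates the vocabulary only: it does not train an SVM. It assumes a hyperplane `w . x + b = 0` that is given by hand (the values of `w`, `b`, and the points are made up), classifies points by which side they fall on, and picks out the points closest to the hyperplane, which play the role of support vectors.

```python
import math

def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane x1 + x2 - 3 = 0 and four made-up points.
w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [4.0, 3.0]]

distances = [distance_to_hyperplane(w, b, p) for p in points]
closest = min(distances)
# The points at the smallest distance are the ones that act as support vectors.
support = [p for p, d in zip(points, distances) if math.isclose(d, closest)]

print([classify(w, b, p) for p in points])
print(support)
```

A real SVM searches over `w` and `b` to make that smallest distance as large as possible.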

Decision Tree:

A decision tree is like a flowchart in which each internal node represents a feature, each branch represents a decision or condition, and each leaf represents an output.

We select the best attribute to split the records using Attribute Selection Measures (ASM).

We split on that attribute into smaller nodes and repeat the process until one of these conditions is met:

1- All the tuples belong to the same attribute value.

2- There are no more remaining attributes.

3- There are no more instances.
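
The three stopping conditions above can be sketched as a single check. This is an illustrative helper, not part of any library. The name `should_stop` is hypothetical, and the first condition is interpreted here as "all remaining records share the same label", which is an assumption about what the text means.

```python
def should_stop(labels, remaining_attributes):
    """Return True when any of the three stopping conditions holds."""
    if len(labels) == 0:                # no more instances
        return True
    if len(set(labels)) == 1:           # all records share the same value
        return True
    if len(remaining_attributes) == 0:  # no more attributes to split on
        return True
    return False

# Made-up examples: labels at a node and the attributes still available.
print(should_stop(["yes", "yes", "yes"], ["color", "size"]))
print(should_stop(["yes", "no"], []))
print(should_stop(["yes", "no"], ["color"]))
```

A tree-building routine would call a check like this at every node before trying another split.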

We split our data into two groups, train and test, and build the tree on the training set by selecting the best attribute using information gain or the Gini index.

Information gain: measures how much a split reduces the impurity of the input set. If a group of data is mixed, meaning it contains different types of items (apples, oranges, pens), the impurity of the group is high.

We measure impurity with entropy. If the entropy is small, the group is close to pure, and if entropy = 0 the group contains only one class.

$H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the probability of class $i$.
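
Here is a minimal sketch of entropy and information gain, assuming the usual base-2 entropy over class probabilities. The fruit labels are made up, and the function names are chosen for this example only.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 entropy of a list of class labels; 0 means the group is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels).values()
    return sum(-(c / n) * math.log2(c / n) for c in counts)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up data: a mixed group, and a split that separates it perfectly.
parent = ["apple", "apple", "orange", "orange"]
children = [["apple", "apple"], ["orange", "orange"]]

print(entropy(parent))
print(entropy(children[0]))
print(information_gain(parent, children))
```

The mixed parent has high entropy, each pure child has entropy 0, and the split that produces pure children gets the highest information gain.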



