Intro to Logistic Regression

Nehajindal · 4 min read · Oct 17, 2020

Logistic Regression

Logistic Regression is a Machine Learning algorithm used for classification problems. It is a predictive analysis algorithm based on the concept of probability.

  • It is used for classification problems, where the target variable is categorical
  • Unlike Linear Regression, Logistic Regression produces output in discrete values, such as binary 0 and 1
  • It estimates the relationship between a dependent variable (target) and one or more independent variables (predictors), where the dependent variable is categorical/nominal.
Sigmoid function (Logistic Function)

Logistic Regression passes a linear combination of the inputs through the sigmoid function, which limits its range of probabilities between 0 and 1:

sigmoid(z) = 1 / (1 + e^(-z))

Loss function (Logistic Regression)

The loss function used to score the predicted probabilities is the log loss:

Loss = -[Y * log(P) + (1 - Y) * log(1 - P)]

Where,

P represents the predicted probability of the output class

Y represents the actual output (0 or 1)
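Below is a minimal sketch in Python of these two formulas, using NumPy with made-up example values; the function names sigmoid and log_loss are just illustrative.

```python
import numpy as np

def sigmoid(z):
    """Map any real value to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy: the loss minimized by logistic regression."""
    p_pred = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

z = np.array([-2.0, 0.0, 2.0])       # example linear scores
print(sigmoid(z))                    # approx [0.12, 0.5, 0.88]

y_true = np.array([0, 1, 1])         # example actual labels
print(log_loss(y_true, sigmoid(z)))  # smaller is better
```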

What are the types of logistic regression?

  1. Binary Logistic Regression (e.g. Tumor Malignant or Benign)
  2. Multinomial Logistic Regression (e.g. “disease A” vs “disease B” vs “disease C”)
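Both types can be fit with scikit-learn's LogisticRegression. Here is a minimal sketch on synthetic data generated with make_classification, so the features and classes are assumptions rather than a real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Binary problem: two classes (e.g. malignant vs. benign)
X_bin, y_bin = make_classification(n_samples=200, n_features=5,
                                   n_informative=3, n_classes=2,
                                   random_state=42)
binary_model = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)

# Multinomial problem: three classes (e.g. disease A vs. B vs. C)
X_multi, y_multi = make_classification(n_samples=300, n_features=5,
                                       n_informative=3, n_classes=3,
                                       random_state=42)
multi_model = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)

print(binary_model.predict_proba(X_bin[:1]))   # two probabilities per row
print(multi_model.predict_proba(X_multi[:1]))  # three probabilities per row
```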

What is the Confusion Matrix?

The confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known.

Basically, it is a performance measurement for machine learning classification.

It is a table with 4 different combinations of predicted and actual values.

Confusion Matrix

It is extremely useful for measuring Recall, Precision, Specificity, Accuracy and, most importantly, the AUC-ROC Curve.
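As a quick illustration, scikit-learn's confusion_matrix builds this table directly from the actual and predicted labels; the toy labels below are made up.

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_actual, y_predicted))
```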

Let's understand the TP, TN, FP, and FN terms.

True Positive (TP)

  • The predicted value matches the actual value
  • The actual value was positive and the model predicted a positive value

True Negative (TN)

  • The predicted value matches the actual value
  • The actual value was negative and the model predicted a negative value

False Positive (FP) — Type 1 error

  • The predicted value does not match the actual value
  • The actual value was negative but the model predicted a positive value
  • It is also known as ‘Type 1 error’

False Negative (FN) — Type 2 error

  • The predicted value does not match the actual value
  • The actual value was positive but the model predicted a negative value
  • It is also known as ‘Type 2 error’.
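For a binary problem, these four counts can be pulled out of scikit-learn's confusion matrix in one line. A minimal sketch, using the same made-up labels as before:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# .ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```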

Precision vs. Recall

Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Here’s how to calculate Precision:

Precision = TP / (TP + FP)

This tells us how reliable the model’s positive predictions are.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

And here’s how we can calculate Recall:

Recall = TP / (TP + FN)

F1-Score

When we try to increase the precision of our model, the recall goes down, and vice versa. The F1-score captures both trends in a single value:

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

The F1-score is the harmonic mean of Precision and Recall, so it gives a combined idea of these two metrics. It is maximum when Precision is equal to Recall.
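A minimal sketch computing all three metrics with scikit-learn, again on made-up toy labels:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

precision = precision_score(y_actual, y_predicted)  # TP / (TP + FP)
recall    = recall_score(y_actual, y_predicted)     # TP / (TP + FN)
f1        = f1_score(y_actual, y_predicted)         # harmonic mean of the two

print(precision, recall, f1)
```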

Let's take an example for better understanding:

Suppose we have a dataset of 165 rows, and the task is to predict whether a person has corona or not. Running the classifier on this data gives TP = 100, TN = 50, FP = 10, FN = 5, so 105 people actually have corona and 60 do not.

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

Accuracy: Overall, how often is the classifier correct?

  • (TP+TN)/total = (100+50)/165 = 0.91
  • Our model predicts whether a person has corona or not with 91% accuracy.

Misclassification Rate: Overall, how often is it wrong?

  • (FP+FN)/total = (10+5)/165 = 0.09
  • equivalent to 1 minus Accuracy
  • also known as “Error Rate”
  • Our model misclassifies 9% of the cases.

True Positive Rate: When it’s actually yes, how often does it predict yes?

  • TP/actual yes = 100/105 = 0.95
  • also known as “Sensitivity” or “Recall”
  • TP is 100: 100 people who actually have corona were correctly predicted as positive.

False Positive Rate: When it’s actually no, how often does it predict yes?

  • FP/actual no = 10/60 = 0.17
  • FP means our model predicted that 10 people have corona when in reality they don't.

True Negative Rate: When it’s actually no, how often does it predict no?

  • TN/actual no = 50/60 = 0.83
  • equivalent to 1 minus False Positive Rate
  • also known as “Specificity”
  • TN means 50 of the 165 people were correctly predicted as not having corona.

False Negative Rate: When it’s actually Yes, how often does it predict no?

  • FN/actual yes = 5/105 = 0.05
  • FN means 5 people who actually have corona were wrongly predicted as not having it.

Precision: When it predicts yes, how often is it correct?

  • TP/predicted yes = 100/110 = 0.91
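The whole walkthrough above can be reproduced with a few lines of plain Python arithmetic, starting from the four counts:

```python
# Counts from the example: 165 people in total
TP, TN, FP, FN = 100, 50, 10, 5

total      = TP + TN + FP + FN   # 165
actual_yes = TP + FN             # 105
actual_no  = TN + FP             # 60
pred_yes   = TP + FP             # 110

print("Accuracy:              ", (TP + TN) / total)  # 0.91
print("Misclassification Rate:", (FP + FN) / total)  # 0.09
print("True Positive Rate:    ", TP / actual_yes)    # 0.95
print("False Positive Rate:   ", FP / actual_no)     # 0.17
print("True Negative Rate:    ", TN / actual_no)     # 0.83
print("False Negative Rate:   ", FN / actual_yes)    # 0.05
print("Precision:             ", TP / pred_yes)      # 0.91
```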

Thank You All :)
