Introduction to Logistic Regression
Logistic Regression is a Machine Learning algorithm used for classification problems. It is a predictive-analysis algorithm based on the concept of probability.
- It is used for classification problems, where the target variable is categorical.
- Unlike Linear Regression, Logistic Regression produces output in discrete values, such as binary 0 and 1.
- It estimates the relationship between a dependent variable (target) and one or more independent variables (predictors), where the dependent variable is categorical/nominal.
The sigmoid function is used to convert the predicted output into a probability:

P = 1 / (1 + e^(-Y))

Where,
P represents the probability of the output class
Y represents the predicted output

The sigmoid function limits the range of probabilities to between 0 and 1.
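As a quick illustration, the sigmoid function above can be written in a few lines of plain Python (a minimal sketch, not tied to any particular library):

```python
import math

def sigmoid(z):
    """Squash any real-valued input into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5: the decision boundary
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```

No matter how large or small the input, the output never leaves the open interval (0, 1), which is what makes it usable as a probability.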
What are the types of logistic regression?
- Binary Logistic Regression (e.g. tumor malignant or benign)
- Multinomial Logistic Regression (e.g. "disease A" vs "disease B" vs "disease C")
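As a sketch of both types, assuming scikit-learn is available: the same `LogisticRegression` estimator handles binary and multinomial targets, and the iris dataset here is just an illustrative 3-class problem:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has 3 classes, so this exercises the multinomial case;
# with a 2-class target the same estimator does binary logistic regression.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X[:1])   # one probability per class
print(proba.shape)                 # (1, 3)
print(proba.sum())                 # class probabilities sum to 1
```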
What is the Confusion Matrix?
The confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known.
Basically, it is a performance measurement for machine learning classification.
It is a table with 4 different combinations of predicted and actual values.
It is extremely useful for measuring Recall, Precision, Specificity, Accuracy and most importantly AUC-ROC Curve.
Let's understand the TP, TN, FP, FN terms.
True Positive (TP)
- The predicted value matches the actual value
- The actual value was positive and the model predicted a positive value
True Negative (TN)
- The predicted value matches the actual value
- The actual value was negative and the model predicted a negative value
False Positive (FP) — Type 1 error
- The predicted value was falsely predicted
- The actual value was negative but the model predicted a positive value
- It is also known as ‘Type 1 error’
False Negative (FN) — Type 2 error
- The predicted value was falsely predicted
- The actual value was positive but the model predicted a negative value
- It is also known as ‘Type 2 error’.
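The four counts above can be tallied directly from paired actual/predicted labels. Here is a minimal sketch using made-up binary labels (1 = positive, 0 = negative):

```python
def confusion_counts(actual, predicted):
    """Tally TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # Type 1 error
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # Type 2 error
    return tp, tn, fp, fn

# Made-up labels purely for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 3, 1, 1)
```

The four counts always sum to the total number of test cases, which is a handy sanity check.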
Precision vs. Recall
Precision tells us how many of the cases predicted as positive actually turned out to be positive.
Here's how to calculate Precision:

Precision = TP / (TP + FP)

This determines whether our model is reliable or not.
Recall tells us how many of the actual positive cases we were able to predict correctly with our model.
And here's how we can calculate Recall:

Recall = TP / (TP + FN)
F1-Score
When we try to increase the precision of our model, the recall goes down, and vice versa. The F1-score captures both trends in a single value:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The F1-score is the harmonic mean of Precision and Recall, so it gives a combined idea of these two metrics. It is at its maximum when Precision is equal to Recall.
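These three formulas can be sketched in a few lines of Python. The counts below (TP = 8, FP = 2, FN = 2) are made up purely to illustrate the point that F1 matches Precision and Recall when the two are equal:

```python
def precision(tp, fp):
    # Of everything predicted positive, how much was actually positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything actually positive, how much did we catch?
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Made-up counts chosen so precision == recall == 0.8;
# the F1-score then also comes out to 0.8.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
print(p, r, f1_score(p, r))
```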
Let's take an example for better understanding:
Suppose we have test data of 165 rows and the task is to predict whether a person has corona or not. The model's predictions give the following counts: TP = 100, TN = 50, FP = 10, FN = 5.
This is a list of rates that are often computed from a confusion matrix for a binary classifier:
Accuracy: Overall, how often is the classifier correct?
- (TP+TN)/total = (100+50)/165 = 0.91
- It says our model predicts whether a person has corona or not with 91% accuracy.
Misclassification Rate: Overall, how often is it wrong?
- (FP+FN)/total = (10+5)/165 = 0.09
- equivalent to 1 minus Accuracy
- also known as “Error Rate”
- Our model misclassifies 9% of the cases.
True Positive Rate: When it’s actually yes, how often does it predict yes?
- TP/actual yes = 100/105 = 0.95
- also known as “Sensitivity” or “Recall”
- TP is 100, meaning 100 people who actually have corona were correctly predicted as having it.
False Positive Rate: When it’s actually no, how often does it predict yes?
- FP/actual no = 10/60 = 0.17
- FP means our model predicted that 10 people have corona when in reality they don't.
True Negative Rate: When it’s actually no, how often does it predict no?
- TN/actual no = 50/60 = 0.83
- equivalent to 1 minus False Positive Rate
- also known as “Specificity”
- TN means 50 of the 165 people were correctly predicted as not having corona.
False Negative Rate: When it’s actually Yes, how often does it predict no?
- FN/actual yes = 5/105 ≈ 0.05
- FN means 5 people actually had corona but our model wrongly predicted that they don't.
Precision: When it predicts yes, how often is it correct?
- TP/predicted yes = 100/110 = 0.91
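All of the rates above can be reproduced from the four counts in the worked example; a minimal sketch:

```python
# Counts from the worked example (165 test cases).
TP, TN, FP, FN = 100, 50, 10, 5

total      = TP + TN + FP + FN   # 165
actual_yes = TP + FN             # 105 people actually have corona
actual_no  = TN + FP             # 60 people actually don't

accuracy = (TP + TN) / total     # 150/165
error    = (FP + FN) / total     # 15/165, i.e. 1 - accuracy
tpr      = TP / actual_yes       # sensitivity / recall
fpr      = FP / actual_no
tnr      = TN / actual_no        # specificity, i.e. 1 - FPR
fnr      = FN / actual_yes
prec     = TP / (TP + FP)

print(f"Accuracy={accuracy:.2f}  Recall={tpr:.2f}  Precision={prec:.2f}")
```

Keeping the denominators straight is the main pitfall: the "rates" (TPR, FNR, FPR, TNR) are normalized by the actual class totals, while precision is normalized by the predicted-positive total.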
Thank You All :)