Explain the concept of logistic regression.
Logistic regression is a widely used classification technique in the credit and risk industry for estimating the probability of default. The main challenge credit and risk departments face today is the black-box nature of machine learning models, which is the prime reason advanced models have been adopted slowly in this space.
Logistic regression is based on maximum likelihood estimation, which estimates the parameters of a model by finding the parameter values that maximize the likelihood of the given observations.
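To make the maximum likelihood idea concrete, here is a minimal sketch for a single feature. It is only an illustration under stated assumptions: the toy data, the function names (`log_likelihood`, `fit`), the learning rate, and the use of plain gradient ascent are all choices made for this example, and real libraries typically use more sophisticated optimizers.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(w, b, xs, ys):
    # Sum of log P(y_i | x_i) under the model p_i = sigmoid(w * x_i + b).
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, lr=0.1, steps=2000):
    # Gradient ascent on the average log-likelihood (assumed optimizer,
    # chosen for simplicity). The gradient of the log-likelihood with
    # respect to w is sum((y_i - p_i) * x_i), and with respect to b it
    # is sum(y_i - p_i).
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b

# Made-up toy data: the classes overlap, so the maximum is finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit(xs, ys)
print("fitted w, b:", w, b)
print("log-likelihood at start:", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood after fit:", log_likelihood(w, b, xs, ys))
```

The fitted parameters give a higher log-likelihood than the starting point of all zeros, which is what "maximizing the likelihood of the observations" means in practice.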
It is used for classification problems that are binary in nature. Some examples include:
- Spam vs ham emails
- Loan default (yes/no)
- Disease diagnosis (benign/malignant)
It predicts discrete categories, represented by classes such as 0 and 1. It is based on the sigmoid function, which takes any real value and maps it to a value between 0 and 1. The representation below shows how a sigmoid function works.
We can set a cutoff point at 0.5: anything below it results in class 0, and anything above it results in class 1.
The red line shown is the threshold value used to assign a class.
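The sigmoid and the 0.5 cutoff can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `predict_class` is hypothetical, and assigning a score of exactly 0.5 to class 1 is an assumption, since the description above only covers values below and above the cutoff.

```python
import math

def sigmoid(z):
    # Maps any real number to a value strictly between 0 and 1
    # (up to floating-point rounding for very large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(z, threshold=0.5):
    # Assumption: a probability exactly equal to the threshold goes to class 1.
    return 1 if sigmoid(z) >= threshold else 0

for z in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(z, round(sigmoid(z), 4), predict_class(z))
```

Negative inputs give probabilities below 0.5 and fall into class 0, while positive inputs give probabilities above 0.5 and fall into class 1. Moving `threshold` changes where that split happens.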