What are logits in machine learning?
"One common mistake that I would make is adding a non-linearity to my logits output." What does the term "logit" mean here, or what does it represent?
Asked 4 years, 8 months ago · Modified 5 months ago · Viewed 33k times
Tags: machine-learning, deep-learning
asked Apr 30, 2018 at 14:55 by Rajat
3 Answers
Logits are interpreted as the unnormalised (or not-yet-normalised) predictions (outputs) of a model. They can give results, but we don't normally stop at the logits, because interpreting their raw values is not easy.
Have a look at their definition to help understand how logits are produced.
Let me explain with an example:
We want to train a model that learns to classify cats and dogs, using photos that each contain either one cat or one dog. You build a model and give it some of your data, so that it approximates a mapping between images and predictions. You then give the model some unseen photos in order to test its predictive accuracy on new data. Since this is a classification problem (we are trying to put each photo into one of two classes), the model gives us two scores for each input image: a score for how likely it believes the image contains a cat, and a score for its belief that the image contains a dog.
Perhaps for the first new image, you get logit values of 16.917 for a cat and 0.772 for a dog. Higher means more likely, so you'd say the image contains a cat. The correct answer is a cat, so the model worked!
For the second image, the model may say the logit values are 1.004 for a cat and 0.709 for a dog. So once again, our model says the image contains a cat. The correct answer is once again a cat, so the model worked again!
Now we want to compare the two results. One way to do this is to normalize the scores. That is, we normalize the logits! Doing this gives us some insight into the confidence of our model.
Let's use the softmax function, which makes all results sum to 1 and so allows us to think of them as probabilities:
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \dots, K.$$
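A minimal sketch of this formula in NumPy (the max-subtraction trick is an addition for numerical stability; it does not change the result):

```python
import numpy as np

def softmax(z):
    """Normalise a vector of logits into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    # subtract the max before exponentiating to avoid overflow;
    # this leaves the softmax output unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

print(softmax([16.917, 0.772]))  # -> approximately [0.9999, 0.0001]
print(softmax([1.004, 0.709]))   # -> approximately [0.5732, 0.4268]
```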
For the first test image, we get
$$\text{prob(cat)} = \frac{\exp(16.917)}{\exp(16.917) + \exp(0.772)} \approx 0.9999$$
$$\text{prob(dog)} = \frac{\exp(0.772)}{\exp(16.917) + \exp(0.772)} \approx 0.0001$$
If we do the same for the second image, we get the results:
$$\text{prob(cat)} = \frac{\exp(1.004)}{\exp(1.004) + \exp(0.709)} \approx 0.5732$$
$$\text{prob(dog)} = \frac{\exp(0.709)}{\exp(1.004) + \exp(0.709)} \approx 0.4268$$
The model was not really sure about the second image, as it was very close to 50-50 - a guess!
The last part of the quote in your question likely refers to a neural network as the model. The layers of a neural network commonly take input data, multiply it by some parameters (weights) that we want to learn, and then apply a non-linear function, which gives the model the power to learn non-linear relationships. Without this non-linearity, a neural network would simply be a sequence of linear operations performed on the input data, meaning it could only learn linear relationships; such a stack of linear layers can always be reduced to a single linear model, which would be a massive constraint.

That being said, it is not considered helpful to apply a non-linearity to the logit outputs of a model, as you are generally going to be cutting out some information right before the final prediction is made.
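As a sketch of why losses are usually computed from raw logits (assuming NumPy and the two-class example above; `cross_entropy_from_logits` is a made-up name for illustration, not a library function — frameworks such as PyTorch's `CrossEntropyLoss` do something similar internally):

```python
import numpy as np

def cross_entropy_from_logits(logits, target):
    """Numerically stable cross-entropy computed directly from raw logits."""
    logits = np.asarray(logits, dtype=float)
    # log-softmax via the log-sum-exp trick; squashing the logits first
    # (e.g. with an extra non-linearity) would discard this information
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

# cat/dog logits from the two test images above; class 0 = cat
print(cross_entropy_from_logits([16.917, 0.772], 0))  # near 0: confident and correct
print(cross_entropy_from_logits([1.004, 0.709], 0))   # ~0.56: correct but unsure
```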