Why is F1-Score better than precision and recall in evaluating NER?
Precision is the measure to watch when false positives are costly, and recall is the measure to watch when false negatives are costly. In sensitive domains such as medicine, education, or law, both kinds of error carry risk, so evaluating entity predictions with only one of these measures can hide serious mistakes. In such cases the F1 score is the better choice because it combines precision and recall into a single, balanced number.
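The F1 score is the harmonic mean of precision and recall:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
Because it is a harmonic mean, F1 is high only when both precision and recall are high; a low value in either one pulls the score down.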
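As a rough illustration, the sketch below computes precision, recall, and F1 for NER at the entity level. It assumes each entity is represented as a (start, end, label) tuple and that a prediction counts as correct only on an exact match; the function name `precision_recall_f1` and the sample spans are hypothetical, made up for this example rather than taken from any particular library or dataset.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores from collections of (start, end, label) spans.

    Assumption: a predicted entity is a true positive only if its span
    and label both match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # predicted and correct
    fp = len(predicted - gold)   # predicted but wrong
    fn = len(gold - predicted)   # gold entities that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical example: 4 gold entities, 2 predictions, 1 of them correct.
gold = {(0, 2, "PER"), (5, 7, "ORG"), (10, 11, "LOC"), (14, 16, "PER")}
predicted = {(0, 2, "PER"), (5, 7, "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this example precision is 0.50 but recall is only 0.25, and F1 comes out at about 0.33, closer to the weaker of the two. That is the balancing behaviour described above: a model cannot score well on F1 by doing well on only one of the two measures.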
With deep learning models, these measures can be reported per iteration or per epoch of training. In sensitive business domains, relying on precision or recall alone can create business or evaluation risk, which is why the F1 score is usually preferred: it balances the two.