Objective
You will learn to interpret model quality in context. A single metric is rarely enough to judge a model.
Classification metrics
For spam filtering:
- True Positive: spam correctly blocked.
- False Positive: real message incorrectly blocked.
- False Negative: spam missed.
- True Negative: real message correctly allowed.
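From these four counts (the confusion matrix):
- Precision = TP / (TP + FP): of blocked messages, how many are actually spam.
- Recall = TP / (TP + FN): of actual spam messages, how many were blocked.
- F1 = harmonic mean of precision and recall; it is high only when both are high.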
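Accuracy can be misleading when spam is rare: a filter that blocks nothing is still right on every real message, so it scores high accuracy while catching no spam.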
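As a minimal sketch of these definitions, the Python below computes accuracy, precision, recall, and F1 from raw confusion-matrix counts. The function name `classification_metrics`, the choice to return 0.0 when a denominator is zero, and all counts are assumptions made for illustration: 1,000 messages, 10 of them spam.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of blocked messages, the share that is actually spam.
    # Assumption: report 0.0 when nothing was blocked (denominator is zero).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of actual spam messages, the share that was blocked.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 1,000 messages, 10 of them spam.
blocks_nothing = classification_metrics(tp=0, fp=0, fn=10, tn=990)
real_filter = classification_metrics(tp=8, fp=4, fn=2, tn=986)

for name, m in [("blocks nothing", blocks_nothing), ("real filter", real_filter)]:
    print(f"{name}: accuracy={m['accuracy']:.3f} recall={m['recall']:.3f} "
          f"precision={m['precision']:.3f} f1={m['f1']:.3f}")
```

With these made-up counts, the filter that blocks nothing reaches 0.990 accuracy with 0.000 recall, while the real filter is only slightly more accurate (0.994) but catches 80% of the spam.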
Regression metrics
For delivery time:
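- MAE (mean absolute error): average absolute error, in the same units as the target (easy to explain).
- RMSE (root mean squared error): squares errors before averaging, so it punishes large misses more strongly.
If large misses, such as very late deliveries, are very costly, RMSE may matter more.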
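To see the difference concretely, here is a small sketch with two hypothetical sets of predictions that have the same MAE but different RMSE. The helper names `mae` and `rmse` and the delivery times (in minutes) are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squaring weights large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical delivery times in minutes.
actual = [30, 30, 30, 30]
steady = [35, 25, 35, 25]  # off by 5 minutes every time
spiky = [30, 30, 30, 50]   # exact three times, then off by 20 minutes

print(mae(actual, steady), rmse(actual, steady))  # 5.0 5.0
print(mae(actual, spiky), rmse(actual, spiky))    # 5.0 10.0
```

Both sets of predictions look identical under MAE, but RMSE doubles for the one with a single large miss.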
Worked example (online store)
Suppose model A has higher accuracy than model B but lower recall for fraud. If missed fraud is expensive, model B might be the better choice despite its lower accuracy.
Metrics are business decisions wearing math clothing.
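One way to make the comparison concrete is to attach a cost to each kind of mistake. The sketch below does that for two hypothetical models; every count and cost in it is invented for illustration (10,000 transactions, 100 of them fraudulent, 5 per false alarm, 500 per missed fraud), and the helper names are assumptions, not part of any library.

```python
def accuracy(m):
    return (m["tp"] + m["tn"]) / (m["tp"] + m["fp"] + m["fn"] + m["tn"])


def recall(m):
    return m["tp"] / (m["tp"] + m["fn"])


def mistake_cost(m, cost_fp, cost_fn):
    """Total cost of errors: false alarms plus missed fraud."""
    return m["fp"] * cost_fp + m["fn"] * cost_fn


# Hypothetical confusion-matrix counts for 10,000 transactions, 100 fraudulent.
model_a = {"tp": 40, "fp": 20, "fn": 60, "tn": 9880}
model_b = {"tp": 90, "fp": 200, "fn": 10, "tn": 9700}

# Hypothetical costs: reviewing a false alarm vs. absorbing a missed fraud.
COST_FALSE_ALARM = 5
COST_MISSED_FRAUD = 500

for name, m in [("A", model_a), ("B", model_b)]:
    cost = mistake_cost(m, COST_FALSE_ALARM, COST_MISSED_FRAUD)
    print(f"model {name}: accuracy={accuracy(m):.3f} "
          f"recall={recall(m):.2f} cost={cost}")
```

Under these assumed numbers, model A is more accurate (0.992 vs 0.979) but its mistakes cost 30,100 against 6,000 for model B. Different costs could reverse the ranking, which is the point of choosing the metric deliberately.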
Three takeaways
- Always align metric choice with the cost of mistakes.
- Use a confusion matrix for classification sanity checks.
- Pair MAE/RMSE with a baseline and percentile views of the error.
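A rough sketch of the last takeaway: compare the model's MAE with a naive baseline, then look at percentiles of the absolute error. The delivery times, the constant baseline of 32 minutes (standing in for a historical average), and the helper names are all hypothetical. The `percentile` helper uses the nearest-rank definition, which is one of several common conventions.

```python
import math


def abs_errors(actual, predicted):
    return [abs(a - p) for a, p in zip(actual, predicted)]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical delivery times in minutes.
actual = [28, 31, 30, 35, 29, 33, 30, 32, 60, 30]
model = [30, 30, 31, 33, 30, 32, 29, 31, 40, 31]
baseline = [32] * len(actual)  # assumed historical average, always predicted

model_err = abs_errors(actual, model)
baseline_err = abs_errors(actual, baseline)

print("model MAE:", sum(model_err) / len(model_err))
print("baseline MAE:", sum(baseline_err) / len(baseline_err))
for p in (50, 90, 100):
    print(f"p{p} model error:", percentile(model_err, p))
```

In this made-up data the model beats the baseline on MAE (3.1 vs 4.6 minutes), and the percentile view shows that most errors are 1 to 2 minutes while a single delivery is off by 20, a detail the average alone hides.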