EP05 · 8 min

Metrics: confusion matrix, precision/recall/F1, MAE/RMSE

Choose metrics that reflect real product risk instead of relying on one summary score.

Simple definition
Metrics are lenses that reveal different model failure modes.
Precise definition
Evaluation metrics are objective functions over predictions and ground truth designed to capture task-specific error costs.

Objective

You will learn to read model quality with context. One metric is rarely enough.

Classification metrics

For spam filtering:

  • True Positive: spam correctly blocked.
  • False Positive: real message incorrectly blocked.
  • False Negative: spam missed.
  • True Negative: real message correctly allowed.

From this matrix:

  • Precision = of blocked messages, how many are actually spam.
  • Recall = of actual spam messages, how many were blocked.
  • F1 balances both.

Accuracy can be misleading when spam is rare.
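These definitions are easy to compute directly from the four counts. The sketch below (with made-up message counts) also shows why accuracy flatters a model when spam is rare:

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of blocked, how many were spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # of spam, how many were blocked
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Imbalanced case: 990 real messages, 10 spam, and a model that blocks nothing.
# Accuracy is 99% while spam recall is 0 -- the headline number hides total failure.
precision, recall, f1, accuracy = classification_metrics(tp=0, fp=0, fn=10, tn=990)
```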

Regression metrics

For delivery time:

  • MAE: average absolute error (easy to explain).
  • RMSE: punishes large misses more strongly.

If late deliveries are very costly, RMSE may matter more.
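A minimal sketch of both metrics, using hypothetical delivery times in minutes. Note how one 30-minute miss dominates RMSE while barely moving MAE:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in the original units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large misses more strongly."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical delivery times: three small errors plus one 30-minute miss.
actual = [30, 35, 40, 45]
predicted = [32, 34, 41, 75]

print(mae(actual, predicted))   # 8.5 -- modest on average
print(rmse(actual, predicted))  # ~15.0 -- the big miss dominates
```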

Worked example (online store)

Suppose model A has higher overall accuracy but lower recall for fraud than model B. If missed fraud is expensive, model B may be the better choice despite its lower accuracy.

Metrics are business decisions wearing math clothing.
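One way to make that concrete is to price the errors. The fraud counts and dollar costs below are invented for illustration, not taken from real data:

```python
def expected_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of errors, given per-error business costs (hypothetical)."""
    return fn * cost_fn + fp * cost_fp

# Hypothetical: model A misses 20 fraud cases and flags 5 good orders;
# model B misses 8 fraud cases but flags 30 good orders.
# Assume missed fraud costs $500 and a wrongly flagged order costs $20.
cost_a = expected_cost(fn=20, fp=5, cost_fn=500, cost_fp=20)   # $10,100
cost_b = expected_cost(fn=8, fp=30, cost_fn=500, cost_fp=20)   # $4,600

# B is far cheaper despite more false positives -- the cost model, not the
# accuracy number, decides the winner.
```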

Three takeaways

  • Always align metric choice with cost of mistakes.
  • Use confusion matrix for classification sanity checks.
  • Pair MAE/RMSE with baseline and percentile error views.

Interactive confusion matrix lab

Move the threshold to see precision/recall and the confusion matrix update in real time. At a 55% threshold, the example data gives:

  • True Positive: 3
  • False Positive: 2
  • False Negative: 1
  • True Negative: 2

Precision = 3 / (3 + 2) = 60%, Recall = 3 / (3 + 1) = 75%, F1 ≈ 67%.
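The lab's behavior can be sketched in code. The scores and labels below are hypothetical, chosen so that a 0.55 threshold reproduces the counts above (TP=3, FP=2, FN=1, TN=2); raising the threshold trades recall for precision:

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/FN/TN when blocking every message scored >= threshold.
    labels: 1 = spam, 0 = legitimate."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn

# Eight hypothetical messages: model score vs. true label (1 = spam).
scores = [0.95, 0.80, 0.70, 0.60, 0.58, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

for threshold in (0.30, 0.55, 0.80):
    print(threshold, confusion_at_threshold(scores, labels, threshold))
# 0.30 -> (4, 3, 0, 1): every spam caught, but precision is only 4/7
# 0.55 -> (3, 2, 1, 2): the lab's numbers above
# 0.80 -> (2, 0, 2, 4): perfect precision, but half the spam slips through
```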

Common traps
  • Using accuracy on imbalanced classification.
  • Reporting RMSE without baseline context.
  • Optimizing one metric while harming user outcomes.
Three takeaways
  • Precision and recall capture different failure costs.
  • Confusion matrices make tradeoffs visible.
  • Regression errors need scale-aware interpretation.