EP05 · 8 min

Metrics: confusion matrix, precision/recall/F1, MAE/RMSE

Choose metrics that reflect real product risk instead of relying on one summary score.

Simple definition
Metrics are lenses that reveal different model failure modes.
Precise definition
Evaluation metrics are objective functions over predictions and ground truth designed to capture task-specific error costs.

Objective

You will learn to read model quality with context. One metric is rarely enough.

Classification metrics

For spam filtering:

  • True Positive: spam correctly blocked.
  • False Positive: real message incorrectly blocked.
  • False Negative: spam missed.
  • True Negative: real message correctly allowed.

From this matrix:

  • Precision = of blocked messages, how many are actually spam.
  • Recall = of actual spam messages, how many were blocked.
  • F1 balances both.

Accuracy can be misleading when spam is rare.
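These definitions are easy to compute directly from the four counts. The sketch below (with made-up message counts) also shows why accuracy flatters a model when spam is rare:

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of blocked, how many were spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # of spam, how many were blocked
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Imbalanced case: 990 real messages, 10 spam, and a model that blocks nothing.
# Accuracy is 99% while spam recall is 0 -- the headline number hides total failure.
precision, recall, f1, accuracy = classification_metrics(tp=0, fp=0, fn=10, tn=990)
```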

Regression metrics

For delivery time:

  • MAE: average absolute error (easy to explain).
  • RMSE: punishes large misses more strongly.

If late deliveries are very costly, RMSE may matter more.
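A minimal sketch of both metrics, using hypothetical delivery times in minutes. Note how one 30-minute miss dominates RMSE while barely moving MAE:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in the original units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large misses more strongly."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical delivery times: three small errors plus one 30-minute miss.
actual = [30, 35, 40, 45]
predicted = [32, 34, 41, 75]

print(mae(actual, predicted))   # 8.5 -- modest on average
print(rmse(actual, predicted))  # ~15.0 -- the big miss dominates
```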

Worked example (online store)

Suppose model A has higher overall accuracy but lower recall for fraud than model B. If missed fraud is expensive, model B may be the better choice despite its lower accuracy.

Metrics are business decisions wearing math clothing.
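One way to make that concrete is to price the errors. The fraud counts and dollar costs below are invented for illustration, not taken from real data:

```python
def expected_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of errors, given per-error business costs (hypothetical)."""
    return fn * cost_fn + fp * cost_fp

# Hypothetical: model A misses 20 fraud cases and flags 5 good orders;
# model B misses 8 fraud cases but flags 30 good orders.
# Assume missed fraud costs $500 and a wrongly flagged order costs $20.
cost_a = expected_cost(fn=20, fp=5, cost_fn=500, cost_fp=20)   # $10,100
cost_b = expected_cost(fn=8, fp=30, cost_fn=500, cost_fp=20)   # $4,600

# B is far cheaper despite more false positives -- the cost model, not the
# accuracy number, decides the winner.
```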

Three takeaways

  • Always align metric choice with cost of mistakes.
  • Use confusion matrix for classification sanity checks.
  • Pair MAE/RMSE with baseline and percentile error views.

Interactive confusion matrix lab

Move the threshold to see precision/recall and the confusion matrix update in real time. At a 55% threshold, the example data gives:

  • True Positive: 3
  • False Positive: 2
  • False Negative: 1
  • True Negative: 2

Precision = 3 / (3 + 2) = 60%, Recall = 3 / (3 + 1) = 75%, F1 ≈ 67%.
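The lab's behavior can be sketched in code. The scores and labels below are hypothetical, chosen so that a 0.55 threshold reproduces the counts above (TP=3, FP=2, FN=1, TN=2); raising the threshold trades recall for precision:

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/FN/TN when blocking every message scored >= threshold.
    labels: 1 = spam, 0 = legitimate."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn

# Eight hypothetical messages: model score vs. true label (1 = spam).
scores = [0.95, 0.80, 0.70, 0.60, 0.58, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

for threshold in (0.30, 0.55, 0.80):
    print(threshold, confusion_at_threshold(scores, labels, threshold))
# 0.30 -> (4, 3, 0, 1): every spam caught, but precision is only 4/7
# 0.55 -> (3, 2, 1, 2): the lab's numbers above
# 0.80 -> (2, 0, 2, 4): perfect precision, but half the spam slips through
```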

Common traps
  • Using accuracy on imbalanced classification.
  • Reporting RMSE without baseline context.
  • Optimizing one metric while harming user outcomes.
Three takeaways
  • Precision and recall capture different failure costs.
  • Confusion matrices make tradeoffs visible.
  • Regression errors need scale-aware interpretation.