Objective
You will trace one complete loop from raw data to model decision, so that when performance drops you know exactly where to start debugging.
Workflow overview
- Collect data.
- Clean and structure data.
- Split into features (X) and target (y).
- Train a model on historical examples.
- Run inference on new, unseen examples.
- Evaluate with metrics aligned to business risk.
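To make the loop concrete, here is a minimal sketch in Python with scikit-learn. The data is synthetic and the feature count, split ratio, and metrics are placeholder choices for illustration, not a prescribed setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Collect + clean: here, a toy table of 200 examples with 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                   # features (X)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # target (y)

# Split: hold out examples the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train: learn weights from historical labeled examples.
model = LogisticRegression().fit(X_train, y_train)

# Inference: score new, unseen examples.
y_pred = model.predict(X_test)

# Evaluate: pick metrics that reflect the actual business risk.
print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
```

Each comment above maps to one bullet in the workflow; when something breaks in production, that mapping tells you which stage to inspect first.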
Worked example (online store)
For spam detection:
- Data: historical support messages.
- X: message text length, sender domain, presence of suspicious phrases.
- y: spam/not spam label.
- Training: learn weights from historical labeled messages.
- Inference: classify a new incoming message in real time.
- Evaluation: precision/recall tradeoff because both false positives and false negatives hurt.
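A toy version of that spam setup might look like the sketch below. The suspicious-phrase list, the free-mail-domain check, and the four labeled messages are invented for illustration; a real system would use far more data and richer features.

```python
from sklearn.linear_model import LogisticRegression

SUSPICIOUS = ("free money", "click here", "verify your account")
FREE_MAIL = ("gmail.com", "mail.ru")

def featurize(text: str, sender_domain: str) -> list[float]:
    """Turn one message into an X row: length, suspicious phrase, free-mail domain."""
    text_l = text.lower()
    return [
        float(len(text)),
        float(any(phrase in text_l for phrase in SUSPICIOUS)),
        float(sender_domain in FREE_MAIL),
    ]

# Training mode: historical support messages with known labels (1 = spam).
history = [
    ("Click here for free money now", "gmail.com", 1),
    ("Verify your account immediately", "mail.ru", 1),
    ("Your order #1042 has shipped", "shop.example.com", 0),
    ("Re: invoice for March", "partner.example.com", 0),
]
X = [featurize(text, domain) for text, domain, _ in history]
y = [label for _, _, label in history]
model = LogisticRegression().fit(X, y)

# Inference mode: a new incoming message, scored in real time.
new_row = featurize("Free money if you click here", "mail.ru")
print("spam probability:", model.predict_proba([new_row])[0][1])
```

The precision/recall tradeoff from the evaluation bullet shows up here as a threshold choice on that probability: raise it to block fewer legitimate messages, lower it to catch more spam.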
For delivery time:
- X: distance, courier load, weather conditions.
- y: actual delivery hours.
Different target, same workflow.
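The same skeleton with a numeric target looks like this; the delivery records and held-out rows below are invented numbers, sketched only to show that the steps do not change.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# X: distance (km), courier load (open orders), weather (0 = clear, 1 = bad).
X_train = [[2.0, 1, 0], [5.5, 3, 0], [8.0, 4, 1], [3.2, 2, 1], [12.0, 5, 1]]
y_train = [0.5, 1.2, 2.5, 1.1, 3.4]   # y: actual delivery hours

model = LinearRegression().fit(X_train, y_train)

# Inference on a new order.
predicted = model.predict([[6.0, 2, 0]])[0]
print(f"predicted hours: {predicted:.2f}")

# Evaluation on held-out deliveries, in the unit the business cares about.
X_test, y_test = [[4.0, 1, 0], [9.0, 4, 1]], [0.9, 2.8]
print("MAE hours:", mean_absolute_error(y_test, model.predict(X_test)))
```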
Why teams fail here
Many teams jump from "we have data" to "let's deploy". They skip labeling quality checks, split strategy, and metric choice. That creates dashboards with nice numbers and poor user outcomes.
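One of those skipped steps made concrete: split strategy. With time-stamped data, a purely random split lets the future leak into training and inflates the dashboard numbers. A small sketch, with invented records, of cutting on time instead:

```python
# Sort by timestamp and cut once: train on the past, evaluate on the future.
# The records below are placeholders for illustration.
records = [
    {"ts": "2024-01-03", "features": [1.0, 0.2], "label": 0},
    {"ts": "2024-01-15", "features": [0.4, 0.9], "label": 1},
    {"ts": "2024-02-02", "features": [0.7, 0.1], "label": 0},
    {"ts": "2024-02-20", "features": [0.9, 0.8], "label": 1},
]

records.sort(key=lambda r: r["ts"])          # oldest first
cut = int(len(records) * 0.75)               # train on earlier rows...
train, test = records[:cut], records[cut:]   # ...evaluate on later ones
print(len(train), "train rows,", len(test), "test rows")
```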
Quick check guidance
In the quiz, ask: "Am I in training mode or inference mode?" If you cannot answer that quickly, your architecture is probably tangled.
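One simple way to keep the two modes from tangling is to train offline, persist the fitted model, and have the serving path only ever load it. A sketch assuming scikit-learn and joblib; the file name and toy data are placeholders.

```python
import joblib  # installed alongside scikit-learn
from sklearn.linear_model import LogisticRegression

# --- Training mode: runs offline, on historical labeled data. ---
X_hist = [[120, 1, 0], [45, 0, 1], [300, 1, 1], [20, 0, 0]]
y_hist = [1, 0, 1, 0]
joblib.dump(LogisticRegression().fit(X_hist, y_hist), "spam_model.joblib")

# --- Inference mode: runs online, loads the artifact, never calls .fit(). ---
model = joblib.load("spam_model.joblib")
print("spam?", bool(model.predict([[250, 1, 1]])[0]))
```

If the serving code can retrain or touch labels, the answer to "which mode am I in?" stops being obvious, and that is usually the first tangle to remove.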
Three takeaways
- Treat data flow as a product surface, not hidden plumbing.
- Inference should mirror real-world conditions, not training shortcuts.
- Evaluation must connect to user impact and business constraints.