EP07 · 7 min

Vectors, embeddings, cosine similarity

Understand how meaning can be represented numerically and compared efficiently.

Simple definition
Embeddings convert text or items into vectors that capture semantic relationships.
Precise definition
Embeddings are learned dense representations in vector space where geometric proximity approximates task-relevant similarity.

Objective

You will build intuition for why vector search works and where it can fail.

Core concept

Each text snippet becomes a coordinate in high-dimensional space. Similar meaning lands closer together.

Cosine similarity asks: "Do these vectors point in a similar direction?"
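Cosine similarity has a direct formula: the dot product of two vectors divided by the product of their lengths. A minimal sketch in pure Python, using tiny made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, and a real system would get them from an embedding model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three review snippets.
late    = [0.9, 0.1, 0.0]   # "Package arrived late"
delayed = [0.8, 0.2, 0.1]   # "shipping was delayed"
fit     = [0.0, 0.2, 0.9]   # "Great fit and color"

print(cosine_similarity(late, delayed))  # close to 1.0 (similar direction)
print(cosine_similarity(late, fit))      # close to 0.0 (different direction)
```

The result is 1.0 for identical directions, 0 for unrelated (orthogonal) ones, regardless of how long either vector is.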

Worked example (online store)

For product reviews:

  • "Package arrived late" and "shipping was delayed" should be close.
  • "Great fit and color" should be far from delivery complaint vectors.

This supports semantic search and issue grouping.

For support routing, embeddings can cluster messages by intent before agents respond.
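One way to sketch that routing idea: greedily attach each message to the first cluster whose seed vector is similar enough, starting a new cluster otherwise. The vectors below are hypothetical stand-ins for real embedding-model output, and the 0.8 threshold is an illustrative choice, not a recommendation:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed embeddings (a real system would call an embedding model).
messages = {
    "Where is my order?":       [0.9, 0.1, 0.0],
    "Package still not here":   [0.8, 0.2, 0.1],
    "How do I return an item?": [0.1, 0.9, 0.1],
}

def cluster_by_intent(embedded, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is similar enough, else start a new one."""
    clusters = []  # each cluster: (seed_vector, [messages])
    for text, vec in embedded.items():
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]

print(cluster_by_intent(messages))
# → [['Where is my order?', 'Package still not here'], ['How do I return an item?']]
```

The two delivery questions land in one cluster and the return question in another, so each group can be routed to the right queue before an agent reads a word.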

Practical caveats

  • Bad chunking can bury relevant facts.
  • Generic embedding models may miss domain terms.
  • Similarity thresholds require tuning on real examples.
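The threshold-tuning caveat can be made concrete: score candidate pairs, label them by hand, then measure precision at a few cutoffs. The scores and labels below are invented for illustration:

```python
# Hypothetical labeled pairs: (cosine score, did a human judge them a true match?)
scored_pairs = [
    (0.92, True), (0.85, True), (0.81, False),
    (0.74, True), (0.66, False), (0.41, False),
]

def precision_at_threshold(pairs, threshold):
    """Of the pairs kept at this cutoff, what fraction are true matches?"""
    kept = [match for score, match in pairs if score >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

for t in (0.6, 0.7, 0.8):
    print(t, precision_at_threshold(scored_pairs, t))
# 0.6 → 0.6, 0.7 → 0.75, 0.8 → ~0.667
```

Note that precision here is not monotonic in the threshold: the 0.81 false match sits above some true matches, which is exactly why cutoffs need tuning on real labeled examples rather than being set by feel.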

Visual intuition

Imagine a map: nearby neighborhoods represent similar meanings. Vector search finds nearby neighbors, not guaranteed truth.

Three takeaways

  • Embeddings are representation tools, not final decision engines.
  • Retrieval quality is measurable and improvable.
  • Cosine similarity is a practical default for semantic comparison.

Visual walkthrough: meaning space

Messages about delays and shipping issues land close in embedding space.

Common traps
  • Assuming nearest vectors always mean correct answers.
  • Ignoring domain mismatch in embedding models.
  • Using raw dot products without normalization, letting vector magnitude bias results.
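The normalization trap is easy to see with toy vectors: a raw dot product rewards length, while cosine similarity compares only direction.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query        = [1.0, 1.0]
short_match  = [1.0, 1.0]    # same direction as the query, small magnitude
long_offtopic = [10.0, 0.0]  # different direction, large magnitude

# Raw dot product rewards magnitude: the long off-topic vector "wins".
print(dot(query, short_match))    # 2.0
print(dot(query, long_offtopic))  # 10.0

# Cosine normalizes away length: the same-direction vector wins.
print(cosine(query, short_match))    # 1.0
print(cosine(query, long_offtopic))  # ~0.707
```

If your embeddings are already normalized to unit length, dot product and cosine similarity give the same ranking; otherwise, normalize first or use cosine directly.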
Three takeaways
  • Embeddings power retrieval, recommendation, and clustering.
  • Cosine similarity compares direction rather than vector length.
  • Retrieval quality depends on both embeddings and chunk strategy.