EP07 · 7 min

Vectors, embeddings, cosine similarity

Understand how meaning can be represented numerically and compared efficiently.

Simple definition
Embeddings convert text or items into vectors that capture semantic relationships.
Precise definition
Embeddings are learned dense representations in vector space where geometric proximity approximates task-relevant similarity.

Objective

You will build intuition for why vector search works and where it can fail.

Core concept

Each text snippet becomes a coordinate in high-dimensional space. Similar meaning lands closer together.

Cosine similarity asks: "Do these vectors point in a similar direction?"
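Cosine similarity has a direct formula: the dot product of two vectors divided by the product of their lengths. A minimal sketch in pure Python, using tiny made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, and a real system would get them from an embedding model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three review snippets.
late    = [0.9, 0.1, 0.0]   # "Package arrived late"
delayed = [0.8, 0.2, 0.1]   # "shipping was delayed"
fit     = [0.0, 0.2, 0.9]   # "Great fit and color"

print(cosine_similarity(late, delayed))  # close to 1.0 (similar direction)
print(cosine_similarity(late, fit))      # close to 0.0 (different direction)
```

The result is 1.0 for identical directions, 0 for unrelated (orthogonal) ones, regardless of how long either vector is.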

Worked example (online store)

For product reviews:

  • "Package arrived late" and "shipping was delayed" should be close.
  • "Great fit and color" should be far from delivery complaint vectors.

This supports semantic search and issue grouping.

For support routing, embeddings can cluster messages by intent before agents respond.
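One way to sketch that routing idea: greedily attach each message to the first cluster whose seed vector is similar enough, starting a new cluster otherwise. The vectors below are hypothetical stand-ins for real embedding-model output, and the 0.8 threshold is an illustrative choice, not a recommendation:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed embeddings (a real system would call an embedding model).
messages = {
    "Where is my order?":       [0.9, 0.1, 0.0],
    "Package still not here":   [0.8, 0.2, 0.1],
    "How do I return an item?": [0.1, 0.9, 0.1],
}

def cluster_by_intent(embedded, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is similar enough, else start a new one."""
    clusters = []  # each cluster: (seed_vector, [messages])
    for text, vec in embedded.items():
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]

print(cluster_by_intent(messages))
# → [['Where is my order?', 'Package still not here'], ['How do I return an item?']]
```

The two delivery questions land in one cluster and the return question in another, so each group can be routed to the right queue before an agent reads a word.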

Practical caveats

  • Bad chunking can bury relevant facts.
  • Generic embedding models may miss domain terms.
  • Similarity thresholds require tuning on real examples.
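The threshold-tuning caveat can be made concrete: score candidate pairs, label them by hand, then measure precision at a few cutoffs. The scores and labels below are invented for illustration:

```python
# Hypothetical labeled pairs: (cosine score, did a human judge them a true match?)
scored_pairs = [
    (0.92, True), (0.85, True), (0.81, False),
    (0.74, True), (0.66, False), (0.41, False),
]

def precision_at_threshold(pairs, threshold):
    """Of the pairs kept at this cutoff, what fraction are true matches?"""
    kept = [match for score, match in pairs if score >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

for t in (0.6, 0.7, 0.8):
    print(t, precision_at_threshold(scored_pairs, t))
# 0.6 → 0.6, 0.7 → 0.75, 0.8 → ~0.667
```

Note that precision here is not monotonic in the threshold: the 0.81 false match sits above some true matches, which is exactly why cutoffs need tuning on real labeled examples rather than being set by feel.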

Visual intuition

Imagine a map: nearby neighborhoods represent similar meanings. Vector search finds nearby neighbors, not guaranteed truth.

Three takeaways

  • Embeddings are representation tools, not final decision engines.
  • Retrieval quality is measurable and improvable.
  • Cosine similarity is a practical default for semantic comparison.

Visual walkthrough: meaning space

Messages about delays and shipping issues land close in embedding space.

Common traps
  • Assuming nearest vectors always mean correct answers.
  • Ignoring domain mismatch in embedding models.
  • Using raw dot products without normalization, letting vector magnitude bias results.
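The normalization trap is easy to see with toy vectors: a raw dot product rewards length, while cosine similarity compares only direction.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query        = [1.0, 1.0]
short_match  = [1.0, 1.0]    # same direction as the query, small magnitude
long_offtopic = [10.0, 0.0]  # different direction, large magnitude

# Raw dot product rewards magnitude: the long off-topic vector "wins".
print(dot(query, short_match))    # 2.0
print(dot(query, long_offtopic))  # 10.0

# Cosine normalizes away length: the same-direction vector wins.
print(cosine(query, short_match))    # 1.0
print(cosine(query, long_offtopic))  # ~0.707
```

If your embeddings are already normalized to unit length, dot product and cosine similarity give the same ranking; otherwise, normalize first or use cosine directly.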
Three takeaways
  • Embeddings power retrieval, recommendation, and clustering.
  • Cosine similarity compares direction rather than vector length.
  • Retrieval quality depends on both embeddings and chunk strategy.