Computers operate on numbers, so language must be converted into numerical form before a machine can analyze it.
A vector is an ordered list of numbers in which each position holds the value of one feature.
Example (illustrative values):
King → [0.2, 0.8, 0.5]
Queen → [0.21, 0.79, 0.52]
Vectors can be visualized as points in a multi-dimensional space. The distance between two points indicates how similar they are: the closer the points, the more similar the words they represent.
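As a minimal sketch, the two common ways of measuring closeness, Euclidean distance and cosine similarity, can be computed for the King and Queen vectors from the example. The function names are illustrative, and the vectors are the toy values shown above, not output from a real model.

```python
import math

# Toy 3-dimensional vectors from the example (illustrative values only).
king = [0.2, 0.8, 0.5]
queen = [0.21, 0.79, 0.52]

def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(euclidean_distance(king, queen), 4))
print(round(cosine_similarity(king, queen), 4))
```

The distance comes out at roughly 0.024 and the cosine similarity is very close to 1, both indicating that the two vectors are nearly identical. Cosine similarity ignores vector length and compares only direction, which is why it is often preferred for comparing embeddings.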
Vector arithmetic can capture semantic relationships:
King − Man + Woman ≈ Queen
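A small sketch of how the analogy is usually tested: compute King − Man + Woman, then look for the vocabulary word whose vector is closest to the result. The embeddings below are hypothetical 3-D values chosen so that the dimensions loosely mean [royalty, male, female]; real embeddings are learned and have many more dimensions. Excluding the query words from the candidates is a common convention, assumed here.

```python
import math

# Hypothetical 3-D embeddings; dimensions loosely mean [royalty, male, female].
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.88, 0.12, 0.79],
    "man":    [0.1, 0.8, 0.1],
    "woman":  [0.1, 0.1, 0.8],
    "prince": [0.85, 0.75, 0.15],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vocab):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    # Skip the query words themselves so the answer is a different word.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

print(analogy("king", "man", "woman", embeddings))  # queen
```

Here King − Man + Woman works out to [0.9, 0.1, 0.8], which points almost exactly in the direction of the Queen vector, while Prince, which keeps the "male" component, scores much lower.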
Large language models convert words and entire sentences into high-dimensional vectors, often with hundreds or thousands of dimensions, known as embeddings.
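One very simplified way to turn a whole sentence into a single vector is to average the vectors of its words (mean pooling). This is a swapped-in illustration, not a description of how any particular language model works: real models first compute context-dependent vectors for each token. The word vectors and the `sentence_embedding` helper below are hypothetical.

```python
# Hypothetical 4-dimensional word vectors; real models use far more dimensions.
word_vectors = {
    "the": [0.1, 0.0, 0.2, 0.1],
    "cat": [0.7, 0.3, 0.1, 0.9],
    "sat": [0.2, 0.8, 0.4, 0.3],
}

def sentence_embedding(sentence, vectors):
    """Mean-pool the vectors of known words into one fixed-size sentence vector."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(next(iter(vectors.values())))
    # Average each dimension across all words in the sentence.
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

print(sentence_embedding("The cat sat", word_vectors))
```

Whatever the sentence length, the result has the same number of dimensions as the word vectors, which is what makes sentences comparable with the same distance measures used for words.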
Suppose we map the words Dog, Cat, Lion, and Car into 2D space. Dog, Cat, and Lion cluster together because they are all animals, while Car lies far from that cluster.
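The clustering described above can be checked numerically by finding each word's nearest neighbor. The 2-D coordinates are hypothetical, picked by hand to mirror the description rather than produced by a model.

```python
import math

# Hypothetical 2-D coordinates chosen to mirror the description above.
points = {
    "Dog":  (1.0, 1.2),
    "Cat":  (1.2, 1.0),
    "Lion": (1.5, 1.6),
    "Car":  (6.0, 0.5),
}

def nearest_neighbor(word):
    """Return the other word closest to `word` and its Euclidean distance."""
    others = [w for w in points if w != word]
    best = min(others, key=lambda w: math.dist(points[word], points[w]))
    return best, math.dist(points[word], points[best])

for word in points:
    neighbor, d = nearest_neighbor(word)
    print(f"{word:>4} -> {neighbor} ({d:.2f})")
```

Each animal's nearest neighbor is another animal within about 0.7 units, while Car's nearest neighbor is more than 4 units away. `math.dist` requires Python 3.8 or later.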
Vectors allow computers to represent meaning mathematically: words and sentences with similar meanings end up close together in vector space.