11/06/2025 - Articles
Vectorization in AI — How words become numbers
What happens when artificial intelligence turns words into numbers? Behind modern language models lies an inconspicuous but central principle: vectorization. It translates language, images, and sounds into mathematical structures, making understanding, searching, and generation possible in the first place. In this article, you will learn what exactly vectorization in AI means, how embeddings are created, why AI systems need numerical representations instead of words, and how these vectors are used today in search engines, RAG systems, and recommendation systems. Using a vivid example from the Middle Ages, I will show step by step how semantic relationships are converted into mathematical objects, from one-hot encoding to Word2Vec to contextual embeddings in modern transformer models.
Vectorization in AI: How do words become numbers?
How does artificial intelligence manage to “understand” language? The answer lies in a mathematical trick: vectorization. Instead of treating words as symbols, AI transforms them into points in a space—and recognizes similarities in meaning, analogical relationships, and contexts. These points—i.e., the numerical representations of the words—are called embeddings. Each embedding is a vector that expresses the meaning of a word, sentence, or document in numbers. This creates a “semantic space” in which proximity expresses similarity in content.
To get an initial idea of how to visualize such an “embedding,” let's travel back in time to the Middle Ages. The monastery library is to be vectorized. The monks have decided to capture the meaning of every word in their vocabulary (they know just over 100,000 words) by assigning a weighted score from 512 carefully chosen adjectives.
For each word and each adjective, they assign a value (weight) between 0 and 1 that expresses how strongly that property applies to the word.

Example
Cat → 0.9 "animalistic," 0.8 "friendly," 0.2 "holy"
Monk → 0.1 "animalistic," 0.2 "friendly," 0.9 "holy"
→ This results in a series of numbers (vector) with 512 values for each word.
These values are determined by surveying the monks.
If this works, you would expect the following:
- Similar words (cat, kitty, feline) should result in similar vectors. In mathematical terms: the scalar product of two similar vectors should be large, that of two dissimilar vectors small.
- It should be possible to perform "semantic calculations" with the vectors. If you assign vectors to the terms man, woman, nun, and monk, then the vector calculation monk − man + woman should yield (approximately) the vector for nun.
How “semantic computing” works with vectors
The following small numerical example shows how such semantic analogies can be represented mathematically: four terms are represented as vectors on four semantic axes, and the result of the calculation monk − man + woman is compared with the original vector for nun. To make semantic proximity measurable, the vectors are additionally normalized and compared using scalar products; these values show how closely the calculated "nun" vector corresponds to the original one:
| Property | Monk | Nun | Woman | Man | Calculated Nun: Monk − Man + Woman | Monk × Monk | Monk × Nun | Monk × Woman | Monk × Man | Calculated Nun × Nun |
|---|---|---|---|---|---|---|---|---|---|---|
| Piety | 0.99 | 0.99 | 0.5 | 0.2 | 1.29 | 0.498 | 0.514 | 0.316 | 0.140 | 0.565 |
| Masculinity | 0.99 | 0.05 | 0.01 | 0.99 | 0.01 | 0.498 | 0.026 | 0.006 | 0.691 | 0.000 |
| Femininity | 0.05 | 0.93 | 0.999 | 0.005 | 1.044 | 0.001 | 0.024 | 0.032 | 0.000 | 0.430 |
| Age | 0.07 | 0.06 | 0.05 | 0.05 | 0.07 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| Vector length (used for normalization) | 1.403 | 1.361 | 1.118 | 1.011 | 1.661 | | | | | |
| Similarity (1 = identical) | | | | | | 1.00 | 0.57 | 0.36 | 0.83 | 0.997 |

The five product columns multiply the normalized components (each value divided by the length of its vector); summing a column gives the similarity shown in the last row, i.e., the cosine similarity. The calculated "nun" vector reaches a similarity of 0.997 with the real "nun" vector and is thus almost identical to it.
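For readers who want to check the numbers, here is a minimal NumPy sketch that reproduces the table. The four vectors are the hand-assigned values from above; the `cosine` helper is our own illustration (the scalar product of the normalized vectors), not part of any particular embedding library.

```python
# Minimal NumPy sketch reproducing the table above; the four axes are
# piety, masculinity, femininity, and age, with the hand-assigned values.
import numpy as np

monk  = np.array([0.99, 0.99, 0.05, 0.07])
nun   = np.array([0.99, 0.05, 0.93, 0.06])
woman = np.array([0.50, 0.01, 0.999, 0.05])
man   = np.array([0.20, 0.99, 0.005, 0.05])

def cosine(a, b):
    # scalar product of the normalized vectors = cosine similarity
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

calculated_nun = monk - man + woman           # "semantic computing"

print(round(cosine(monk, nun), 2))            # ~0.57
print(round(cosine(monk, man), 2))            # ~0.83
print(round(cosine(calculated_nun, nun), 3))  # ~0.997 -> almost identical to "nun"
```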
This thought experiment shows that vectorization translates language into a form that makes semantic proximity measurable. This is precisely the idea behind modern embeddings. Modern AI models generate such embeddings by learning from huge amounts of text instead of laboriously determining each dimension by hand, as our monks did. This creates the basis on which transformer models, search engines, recommendation systems, and RAG workflows can understand, compare, and combine meanings.
Why is this so powerful? Because vectors can be compared mathematically. Their similarity can be measured, for example, with the scalar product (dot product), precisely the operation used in the transformer attention mechanism. This allows AI to recognize, weight, and combine relationships between words (token vectors), all based on linear algebra.
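To show where these scalar products reappear in transformers, here is a deliberately simplified sketch of scaled dot-product attention. It leaves out everything a real implementation adds (learned query/key/value projections, masking, multiple heads) and only illustrates the core idea: dot products as similarity scores, softmax weights, and a weighted combination of value vectors.

```python
# Schematic scaled dot-product attention (no learned projections, no masking,
# no multiple heads) to show where the scalar products appear.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                    # dot-product similarities between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax, row by row
    return weights @ V                                         # weighted combination of the value vectors

tokens = np.random.rand(3, 4)                 # three token vectors with 4 dimensions each (toy data)
print(attention(tokens, tokens, tokens).shape)                 # (3, 4)
```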

What is vectorization in AI?
In AI, vectorization refers to the process of translating symbols—such as words, tokens, image sections, or audio snippets—into numerical vectors. These vectors, also known as embeddings, are the mathematical representation of meaning. They form the basis for neural networks to process and “understand” language, images, or sounds in the first place. Put simply, an AI model assigns each word a point in a multidimensional space. Words that occur in similar contexts – such as cat and house cat – are close to each other in this space. This creates a semantic vector space in which geometric proximity reflects similarity in content.
A small practical example: One-hot, network, embeddings
Now let's automate the process. The basic idea is to determine the vectors by analyzing neighboring words. To understand how an embedding is created in this way, let's look at a greatly simplified example.
1. Text corpus and vocabulary
First, we need as large a text corpus as possible that we can analyze for neighboring words. In the next step, we create a vocabulary from this. We will use the vocabulary to train a neural network later on. Here is a simplified example with a text corpus consisting of only three sentences:
| The dog saw a cat. The dog chased the cat. The cat climbed a tree. |
|---|
In our example, the vocabulary (in alphabetical order) looks like this:
| a, cat, chased, climbed, dog, saw, the, tree |
|---|
2. One-hot encoding
Each word is represented by a so-called one-hot vector. This vector is as long as the vocabulary itself and contains a “1” at the position of the word, otherwise only zeros. Example:
| cat → (0, 1, 0, 0, 0, 0, 0, 0), climbed → (0, 0, 0, 1, 0, 0, 0, 0) |
|---|
Now we need a collection of neighboring words as training data. The idea was to determine the vectors by analyzing neighboring words. Here is the beginning of a collection based on our text corpus:
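Such a pair collection can be generated mechanically. The following sketch (illustrative variable names, a window of one word to the left and right) derives the vocabulary, the one-hot vectors, and the neighbor pairs from our three-sentence corpus:

```python
# Sketch: derive vocabulary, one-hot vectors, and neighbor pairs (window = 1)
# from the three-sentence corpus; variable names are purely illustrative.
import re

corpus = "The dog saw a cat. The dog chased the cat. The cat climbed a tree."
sentences = [re.findall(r"[a-z]+", s) for s in corpus.lower().split(".") if s.strip()]

vocab = sorted({word for sentence in sentences for word in sentence})
index = {word: i for i, word in enumerate(vocab)}
print(vocab)             # ['a', 'cat', 'chased', 'climbed', 'dog', 'saw', 'the', 'tree']

def one_hot(word):
    vector = [0] * len(vocab)
    vector[index[word]] = 1
    return vector

print(one_hot("cat"))    # [0, 1, 0, 0, 0, 0, 0, 0]

# each word paired with its direct left and right neighbor
pairs = [(s[i], s[j]) for s in sentences
         for i in range(len(s))
         for j in (i - 1, i + 1) if 0 <= j < len(s)]
print(pairs[:4])         # [('the', 'dog'), ('dog', 'the'), ('dog', 'saw'), ('saw', 'dog')]
```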
3. Training a neural network
Now it's time for training. To do this, we build a neural network. It consists of three layers: the input layer, the middle (hidden) layer, and the output layer. The number of neurons in the outer two layers is fixed: it corresponds to the size of our vocabulary, i.e., 8. We can determine the number of neurons in the hidden layer. It corresponds to the size of the vector with which we want to describe our vocabulary semantically. Here, we have chosen 3.
The neurons in successive layers are fully connected. The strength of the connections (weights) indicates the factor by which the value of a neuron is passed on to a neuron in the following layer. At the beginning, the weights are initialized with random values. The network is then trained (i.e., the weights are adjusted) to predict the second word of a neighboring pair based on the first word. In the following image, the word "climbed" is to be predicted for "cat."
Explanation
- The weighting coefficient (WC) is the numerical value that a neural network assigns to a specific input.
- It reflects how strongly, i.e., with what probability, the model considers the respective term to be a neighbor of the given input word.
- This WC may be followed by a correction value – such as an error measure, gradient, loss value, or an adjustment in the backpropagation step.
At the beginning of training, our model calculates a vector for "cat" that does not match the desired vector for "climbed" very well. The difference between the prediction and the expectation flows into an error function. Over many training steps (backpropagation) with all word pairs, the network learns which word combinations frequently occur together and how to predict them better. Since different words can be neighbors of the word cat, the prediction will not be a one-hot vector, but will have values at all 8 positions.
In our text corpus, the combination of "cat" and "the" occurs frequently, so when "cat" is entered, the seventh entry of the output vector (the position of "the") should be quite high after training. When "dog" is entered, the eighth entry (the position of "tree") should be close to zero, since "dog" and "tree" never occur as neighbors in the text corpus and should therefore not be predicted.
4. Extraction of the embeddings
After training, the vector from the middle layer can be read out for each word – this is the embedding. These values form the position of the word in semantic space, e.g.:
- cat → (0.519, 0.434, 0.047)
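As a rough illustration of steps 1 to 4, here is a self-contained sketch of this miniature Word2Vec-style model in NumPy: one-hot input, a hidden layer of size 3, a softmax output over the 8-word vocabulary, and training on the neighbor pairs. The learning rate and number of passes are arbitrary choices, and the printed embedding will differ from the purely illustrative numbers above.

```python
# Self-contained sketch of the miniature model: one-hot input, hidden layer of
# size 3, softmax output over the 8-word vocabulary, trained on neighbor pairs.
# Hyperparameters are arbitrary; the result will not match the numbers above.
import re
import numpy as np

corpus = "The dog saw a cat. The dog chased the cat. The cat climbed a tree."
sentences = [re.findall(r"[a-z]+", s) for s in corpus.lower().split(".") if s.strip()]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}
pairs = [(s[i], s[j]) for s in sentences for i in range(len(s))
         for j in (i - 1, i + 1) if 0 <= j < len(s)]

rng = np.random.default_rng(0)
V, H = len(vocab), 3                        # vocabulary size 8, embedding size 3
W_in  = rng.normal(0, 0.1, (V, H))          # input -> hidden weights: one embedding row per word
W_out = rng.normal(0, 0.1, (H, V))          # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):                        # many passes over all neighbor pairs
    for center, neighbor in pairs:
        h = W_in[index[center]].copy()      # hidden layer = current embedding of the input word
        y = softmax(h @ W_out)              # predicted probability of each vocabulary word
        err = y.copy()
        err[index[neighbor]] -= 1.0         # prediction minus one-hot target (cross-entropy gradient)
        W_in[index[center]] -= 0.05 * (W_out @ err)   # backpropagation: adjust both weight matrices
        W_out -= 0.05 * np.outer(h, err)

print("cat ->", np.round(W_in[index["cat"]], 3))      # the learned 3-dimensional embedding
```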
This simple model corresponds to the early Word2Vec approaches (CBOW, Skip-Gram). Modern systems, such as BERT and other transformer models, extend the principle with context dependency: a word (or token) is no longer interpreted in isolation but in its linguistic environment, so the embedding depends on the sentence, the neighboring words, and the syntactic relationships. The basic idea, however, remains the same:
| Language gives rise to vectors, vectors describe meaning. |
|---|
Without vectorization, AI would not be able to understand semantic relationships, respond contextually, or compare information in vector space.
Why do AI models need numbers instead of words?
In short: Because artificial intelligence calculates with numbers – not symbols.
Neural networks perform linear algebra operations. Words as text are unsuitable for this. They must first be converted into a numerical form.
This is exactly what vectorization in AI does.
Without vectorization, there would be no transformer models, no attention mechanisms, and no semantic search. The conversion of symbols into vectors is the bridge between language and mathematics, between meaning and calculation.
Where is vectorization used in AI in practice?
Vectors are a universal tool and appear in many areas of AI:
- Search engines & semantic search: documents and search queries are encoded as vectors; instead of pure keyword search, similarity search runs over a vector-space index (e.g., an ANN index).
- RAG (retrieval-augmented generation): in retrieval-based text generation, a vector index is searched for matching knowledge chunks, which are then passed to a language model as context.
- Text classification & clustering: embeddings simplify the training of classifiers and the discovery of natural groupings.
- Recommendation systems: item and user embeddings enable calculations such as "user X likes items that are close to Y."
- Position vectors in transformer architectures: token vectors also carry position encodings, either sinusoidal or learned, so that the order of a sequence remains recognizable.
- Multimodal systems: images and audio are also embedded in vector spaces (patch embeddings, audio frames), enabling cross-modal retrieval and multimodal models.
- Vector databases / ANN indexes: technologies such as FAISS, Annoy, or HNSW specialize in efficient nearest-neighbor search over large embedding collections (a brute-force version of this search is sketched below).
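All of these applications rest on the same core operation: given a query vector, find the stored vectors that are closest to it. A minimal brute-force version looks like this (the 4-dimensional vectors are made up for illustration); libraries such as FAISS, Annoy, or HNSW do essentially the same thing, only approximately and at a much larger scale.

```python
# Brute-force nearest-neighbor search over stored embeddings: the core operation
# that vector databases and ANN indexes accelerate. The vectors are made up.
import numpy as np

stored = np.array([[0.90, 0.10, 0.00, 0.20],   # e.g. embedding of document A
                   [0.10, 0.80, 0.30, 0.00],   # document B
                   [0.85, 0.20, 0.10, 0.10]])  # document C
query = np.array([0.88, 0.15, 0.05, 0.15])

# cosine similarity between the query and every stored vector
sims = (stored @ query) / (np.linalg.norm(stored, axis=1) * np.linalg.norm(query))
top_k = np.argsort(sims)[::-1][:2]             # indices of the two closest documents
print(top_k, np.round(sims[top_k], 3))         # documents A and C lie closest to the query
```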
Projektron's experience with vectorization in AI
At Projektron, we use vectorization to support AI functions in BCS—for example, in software help. To do this, we use proven libraries that convert text from online help and other sources into so-called embeddings (numerical vectors). These embeddings form the basis for semantic search functions and the use of retrieval-augmented generation (RAG). This allows AI to access existing knowledge and provide more targeted, contextually relevant answers.
One of the first practical applications is the AI-supported help assistant in BCS. Help documents are broken down into sections (“text splits”), vectorized, and stored in a vector database. When a user query is received, it is also translated into a vector and compared with the stored text vectors. The most similar hits in terms of content are then passed on to the language model as context. In this way, the system combines semantic search with generative AI and provides users with precise, explanatory answers.
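Schematically, and with hypothetical function names since the actual BCS implementation is of course more involved, the flow described above can be sketched like this. `split_into_sections`, `embed_text`, and `generate_answer` are placeholders, not real BCS or library APIs, and the hash-based dummy embedding only stands in for a proper embedding model.

```python
# Hypothetical, heavily simplified sketch of the help-assistant flow described
# above. split_into_sections, embed_text, and generate_answer are placeholders,
# not actual BCS or library APIs.
import numpy as np

def split_into_sections(document: str) -> list[str]:
    """Placeholder text splitter: one chunk per paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def embed_text(text: str) -> np.ndarray:
    """Placeholder embedding model: deterministic, normalized dummy vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def generate_answer(question: str, context: str) -> str:
    """Placeholder for the call to a language model."""
    return f"Answer to '{question}' based on:\n{context[:120]}..."

# 1. Preparation: split the help documents, embed the chunks, keep them in a store
help_documents = [
    "Tickets.\n\nTickets can be assigned to projects and processed in workflows.",
    "Time recording.\n\nBooked hours are approved by the project lead.",
]
chunks = [c for doc in help_documents for c in split_into_sections(doc)]
store = [(embed_text(c), c) for c in chunks]

# 2. Query time: embed the question and rank the chunks by cosine similarity
question = "Who approves booked hours?"
q = embed_text(question)
ranked = sorted(store, key=lambda item: float(item[0] @ q), reverse=True)
context = "\n\n".join(chunk for _, chunk in ranked[:2])

# 3. Hand the question plus the retrieved context to the language model
print(generate_answer(question, context))
```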
We tested several variants to select a suitable embedding model. The OpenAI model text-embedding-ada served as a benchmark, but it is only available in the cloud. We also tested various locally operable models, from which the BAAI/bge-m3 model from the Beijing Academy of Artificial Intelligence emerged as the best local solution for us. While cloud models tend to achieve higher accuracy, local models offer advantages in terms of data protection and integration into existing systems.
An important finding from the testing and improvement cycles was the importance of proper text preparation: the size of the text sections has a significant impact on the quality of the results. In order to better answer context-free questions, an extension (“Parent Document Retrieval”) was implemented, which draws on entire documents as context when necessary.
Vectorization and semantic search now form the technical basis of our AI framework and thus also the foundation for future applications in BCS, such as support for contracts, process instructions, or other knowledge sources.
Conclusion: Vectorization—the foundation of our AI approaches
Vectorization is the central technique that converts language, images, or other complex content into a numerical form that neural networks can understand. It translates meaning into a mathematically processable form. Words, sentences, or entire documents become points in a multidimensional space, whose distances express semantic proximity.
From early methods such as bag-of-words or Word2Vec, in which fixed word vectors are trained, to modern context-dependent embeddings from transformer models, the basic idea remains the same: meaning arises from patterns in number spaces.
The example of the "medieval monastery" illustrates this principle clearly: when terms are described by weighted properties (e.g., piety, masculinity, femininity, age), a numerical signature is created that can be used to measure and combine similarities. Modern AI methods continue this idea, only automated, scaled, and mathematically optimized.
Today, vectorization forms the basis of many AI applications: from semantic search and text classification to recommendation systems and generative assistants such as RAG. It makes knowledge machine-readable, thereby creating the foundation on which AI can interact meaningfully with human language.

About the author
Dr. Marten Huisinga heads teknow GmbH, a platform for laser sheet metal cutting. In the future, AI methods will simplify the offering for amateur customers. Huisinga was one of the three founders and, until 2015, co-managing director of Projektron GmbH, for which he now works as a consultant. As DPO, he is responsible for implementing the first AI applications in order to assess the benefits of AI for BCS and Projektron GmbH.
More interesting articles on the Projektron blog

Product management at Projektron
How does software remain successful for 25 years? Projektron BCS shows that continuous updates, user feedback, and modern technologies ensure long-term success. Learn how product management works at Projektron.

Use cases for AI in BCS
Step by step, an AI ecosystem is emerging at BCS that is making everyday work noticeably easier. The article shows which use cases are already productive and which functions are still to come.

AI-Help in BCS
Since version 25.3, the new BCS AI user help has been providing precise answers to questions about Projektron documentation. The article shows how iterative optimizations in retrieval and splitting have significantly improved the quality of responses.

Choosing PM software
If your SME or company is about to choose project management software, you probably don't know where to start looking for the right PM tool for you. This guide walks you through the PM software market and leads you to the right decision in 9 steps.
![Illustration of neural network training: the input word "cat" in one-hot encoding, followed by the input layer, hidden layer, and output layer, the weight coefficient (WC), the target word "climbed" as a one-hot vector, and the column for the weight adjustment during backpropagation.](/fileadmin/_processed_/7/e/csm_bcs-vektrorisierung-neuronales-netz-en_c75cd2977a.png)
