11/19/2025 - Articles
RAG (Retrieval-Augmented Generation): How AI applications benefit from it
Language models such as GPT, BERT, and LLaMA have impressive language comprehension skills. They can summarize, formulate, or translate texts—and even show a touch of style awareness. But one crucial disadvantage remains: the knowledge of these models ends at the point of training. So they still know who discovered America, but not who just won the soccer championship or what new AI developments are currently making the rounds. Company-specific information is also left out. For example, a standard model cannot answer questions about the latest BCS releases. This is where RAG (Retrieval-Augmented Generation) comes into play. RAG gives the language model access to current or subject-specific information. This means that AI can not only draw on its pre-trained knowledge, but also on current data that is specifically relevant to the task at hand.
What is RAG?
Retrieval-Augmented Generation (RAG) is an architectural approach that combines the power of large language models (LLMs) with external knowledge sources. Instead of relying solely on the knowledge stored in the model, it can retrieve additional information from a database or document collection as needed and use it as context for generating responses. This makes the generated responses more accurate, up-to-date, and tailored to the specific application area.
With RAG, a query to the language model is answered in two steps:
1. Retrieval phase: First, a retrieval module identifies contextually relevant documents or text passages from a stored knowledge base.
2. Generation phase: This information is then passed, together with the original query, to the language model, which formulates a suitable response.
The system prompt controls how the model uses the texts it finds, i.e., whether it quotes from them, summarizes them, or simply includes them as background knowledge.
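To make this concrete, here is a minimal sketch of how such a system prompt and the retrieved passages might be combined into a model request; the wording and the helper function are illustrative assumptions, not the exact prompt used in any particular product:

```python
# Illustrative system prompt: it tells the model how to use the retrieved texts
# (here: answer only from them). The wording is an assumption, not a real product prompt.
SYSTEM_PROMPT = (
    "You are a help assistant. Answer the user's question using only the "
    "provided context passages. Quote or summarize them as needed, and say "
    "that you do not know if the context does not contain the answer."
)

def build_messages(question: str, passages: list[str]) -> list[dict]:
    """Combine the system prompt, the retrieved passages, and the user question."""
    context = "\n\n".join(passages)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```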

Definition: Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) refers to a process in which a language model automatically retrieves information relevant to the query from an external knowledge source and incorporates it into the response generation. This results in more accurate, up-to-date, and context-relevant responses without the need to retrain the model.
RAG in BCS
In our implementation, the language model automatically receives selected texts from a maintained database in addition to the user's question. These documents have been indexed in advance so that they can be retrieved dynamically as needed. The model uses them as contextual information to tailor the response to the specific content of the software help or other knowledge sources. A practical example of its use is the integration of RAG into the BCS KI Help. Here, RAG was used to provide users with precise answers to specific questions by retrieving relevant documents from the software help and incorporating them into the answer generation.
How RAG differs from classic language models
Classic language models are based on a fixed training data set. Their knowledge is frozen at the time of the last training run. To expand such a model with new knowledge, you can perform what is known as fine-tuning: retraining the existing model with a smaller, user-specific data set. However, this process is complex, resource-intensive, and only leads to a new static level of knowledge. Information that is new or changed after the end of training remains unknown to the model. In addition, fine-tuning can have unintended side effects, such as a deterioration in general language comprehension. If a new model generation is introduced, fine-tuning must be performed again.
RAG is a more flexible alternative: the model remains unchanged, but can access current or domain-specific information at any time via the retrieval module. This eliminates the need for training and allows the system to take continuously updated content into account. The actual language model can be easily replaced with a more powerful version. In our variant, this means that users receive precise and up-to-date answers to their questions based on verified texts from the stored database without the language model itself having to be retrained.
The advantages of RAG at a glance:
Up-to-dateness: Access to the latest information without model retraining.
Flexibility: Adaptation to specific domains or areas of knowledge through targeted retrieval of relevant data.
Efficiency: Avoids the need for comprehensive fine-tuning of the model.
How does RAG work? The two-stage architecture
Retrieval-Augmented Generation (RAG) is one of the most elegant answers to a core problem of modern AI systems: How can language models not only work with their trained knowledge, but also access current, external information? The solution lies in an architecture that combines two disciplines: retrieval (the targeted retrieval of relevant information) and generation (the creation of natural, context-related texts). The system works in two separate but closely interlinked phases: the indexing phase and the inference phase.
The diagram illustrates the process of Retrieval-Augmented Generation (RAG):
1. Indexing phase: Documents are converted into semantic vectors by an encoder and stored in a vector database.
2. Inference phase: A question is likewise converted into a query vector. The vector database determines the most relevant documents, which are then sent to a large language model (LLM) together with the original question and an instruction as context. The LLM generates the final answer based on this information.
Indexing phase: Creating a knowledge base
In this first phase, the foundation is laid on which all subsequent responses from the RAG system will be based. A structured, searchable knowledge base is created from the existing documents (FAQs, technical documentation, or research reports). At Projektron, for example, the data set consists of approximately 2,500 HTML documents from the BCS software help. From these texts, we create a vector index that maps the semantic relevance of the individual text parts.
This phase can be divided into three central steps:
1. Text preparation
2. Embedding
3. Construction of the vector index
Indexing phase Step 1: Text splitting
With ChunkViz v0.1, you can try out for yourself the method of dividing text into “splits” or “chunks” that is used in retrieval. The screenshot shows a graphical representation of how a text is divided into smaller blocks, known as “chunks.” For texts such as software help files, which are highly structured and pre-structured, effective semantic coding can be achieved by creating splits of constant length and with overlap.
- The text reads: “This is the text I want to divide into sections. It is an example text.”
- The 92 characters are divided into 3 blocks (chunks).
- The “chunk size” is set to 35 characters, with an overlap (“chunk overlap”) of 4 characters between the blocks.
This so-called text splitting is a crucial step that is often underestimated.
- If the splits are too long, their semantic meaning becomes diluted. The model then no longer clearly recognizes what the text section is actually about.
- If they are too short, the context is missing. The AI loses connections and can hardly make meaningful references.
Various strategies have become established in research and practice:
- Recursive splitting by characters or paragraphs (e.g., at OpenAI or LangChain)
- Semantic splitting based on sentence relationships (e.g., Sentence Transformers)
- Heuristic splits with overlap, as used in corporate environments or at Projektron, for example
A corpus such as the BCS software help is semantically narrow, highly structured, and pre-structured. Recursive splitting then generates many very short chunks, e.g., from headings and subheadings, which provide too little context for retrieval. Semantic splitting does not seem to be able to determine good text boundaries in texts that deal with only one topic (“how do I use BCS”).
After several attempts, splits of constant length with slight overlap have proven to be effective. The length is chosen so that each text fragment retains sufficient context without losing semantic sharpness.
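A minimal version of such a splitter might look like the following sketch; the chunk size and overlap mirror the small ChunkViz example above, while real splits for help texts are considerably longer:

```python
def split_text(text: str, chunk_size: int = 35, overlap: int = 4) -> list[str]:
    """Split text into fixed-length chunks whose ends overlap slightly."""
    step = chunk_size - overlap              # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:                            # skip an empty tail window
            chunks.append(chunk)
    return chunks

sample = "This is the text I want to divide into sections. It is an example text."
for chunk in split_text(sample):
    print(repr(chunk))                       # three overlapping chunks
```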
Indexing phase Step 2: Embedding the splits – meaning becomes mathematics
Once the texts have been broken down into manageable units, the actual “magic” step follows: embedding. This involves converting each text split into a vector, i.e., a series of numbers that represent the semantic meaning of the text.
These vectors are not simple keywords, but multidimensional mathematical representations of the content. They enable the system to recognize semantically similar content later on, even if the words used are different.
Example:
The phrases “How do I start a new project?” and “Create project” have hardly any words in common, but they have a very similar meaning. In vector space, their embeddings are therefore close together.
Typical embedding models are:
text-embedding-ada-002 from OpenAI: one of the most widely used models for semantic text representation
BAAI/bge-m3: a locally installable model that meets data protection requirements
JINA Embeddings or Cohere Embed: alternatives with strong performance for large amounts of data
Good embedding is the basis for accurate retrieval. The more accurately the semantic relationships are mapped, the better the system can recognize relevant content.
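As a sketch of this step, the following uses the sentence-transformers library with the locally installable BAAI/bge-m3 model mentioned above; any embedding model with a comparable interface could be substituted:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")   # local embedding model named above

texts = [
    "How do I start a new project?",
    "Create project",
    "Book travel expenses",                  # unrelated phrase for comparison
]
vectors = model.encode(texts, normalize_embeddings=True)

# With normalized vectors, the dot product equals the cosine similarity.
similarity = np.round(vectors @ vectors.T, 2)
print(similarity)
# The first two phrases share hardly any words, yet their vectors should lie
# close together, so their similarity score comes out clearly higher than
# the similarity of either phrase to the third one.
```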
Indexing phase Step 3: Building the vector index
Once all text splits have been converted into vectors, they are stored in a vector database. Traditional databases (SQL, NoSQL) are unsuitable for such tasks because they cannot calculate semantic similarities. Instead, specialized systems such as FAISS (Facebook AI Similarity Search), Milvus, Pinecone, or Weaviate are used.
These databases are optimized to find the vectors that are most similar to a search vector, i.e., that are semantically the best fit, in billions of vectors at lightning speed. The result is a vector index that forms the “memory” of the RAG application, so to speak.
At the end of this phase, the system has a highly organized knowledge base that allows it to find specific information for each user query.
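A small sketch of this step with FAISS might look as follows; the file names and the miniature chunk list are illustrative stand-ins for the real, much larger knowledge base:

```python
import json

import faiss
from sentence_transformers import SentenceTransformer

# Illustrative miniature knowledge base; in practice these are the thousands
# of splits produced in the previous steps.
chunks = [
    "To create a new project, open the Projects view and click 'New'.",
    "Vacation requests are submitted via the personal calendar.",
    "Travel expenses are booked on the corresponding project task.",
]

embedder = SentenceTransformer("BAAI/bge-m3")
vectors = embedder.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product (cosine) search
index.add(vectors)

# Persist both the index and the chunk texts so the inference phase can map
# search hits back to the original passages. File names are illustrative.
faiss.write_index(index, "help.index")
with open("help_chunks.json", "w", encoding="utf-8") as f:
    json.dump(chunks, f)
```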
Inference phase: Knowledge meets language
The second phase is where RAG really comes into its own: The system generates answers.
1. When a user asks a question, it is also translated into a query vector using the same embedding model.
2. This vector is passed to the vector database, which selects the most semantically similar text splits, i.e., those text fragments that are most closely related to the question in terms of content.
3. These chunks are then passed on to the language model (e.g., GPT-4, Claude, Mistral, or Llama 3) together with the question and a system prompt.
4. The system prompt specifies how the AI should use the information provided, i.e., whether it may quote, summarize, or paraphrase, for example.
5. The language model then combines the contextual information with its general understanding of language and generates a precise, coherent, and verifiable response.
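Put together, the inference phase can be sketched roughly as follows; the index and chunk files come from the indexing sketch above, and the model names and prompt wording are illustrative assumptions rather than a specific product configuration:

```python
import json

import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("help.index")                   # built in the indexing phase
with open("help_chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)

question = "How do I start a new project?"

# Steps 1-2: embed the question and retrieve the most similar splits.
query_vec = embedder.encode([question], normalize_embeddings=True).astype("float32")
_, ids = index.search(query_vec, 3)
context = "\n\n".join(chunks[i] for i in ids[0])

# Steps 3-5: hand question, context, and a system prompt to the language model.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",                                      # illustrative model choice
    messages=[
        {"role": "system", "content": "Answer using only the provided context passages."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```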
Practical examples for RAG
Chatbots and customer service
RAG-based chatbots are the backbone of modern self-service portals. Companies such as Microsoft, IBM, and ServiceNow use this technology to automatically answer support requests with documented expertise. This allows customers to ask questions around the clock and receive answers based on real company documents, not the model's imagination.
In Microsoft's Copilot for Dynamics 365 or ServiceNow's Now Assist, for example, RAG ensures that support texts, guidelines, and internal documentation are incorporated into responses dynamically, transparently, and in an up-to-date form.
Knowledge management
RAG is also revolutionizing the way companies work with corporate knowledge in internal knowledge management. Employees can ask questions in natural language: “How do we approve new software releases?”
The system automatically searches internal documents, manuals, and emails to formulate a concise answer, often including a reference to the original source. Especially in large organizations with a constantly growing volume of information, RAG creates an invaluable advantage here: knowledge becomes searchable, findable, and usable in real time.
Advantages of RAG over the standalone language model
RAG offers decisive advantages over conventional, purely generative language models, especially when it comes to providing precise, verifiable, and up-to-date answers to complex questions. While a standalone model such as GPT or Llama is limited to its static training knowledge, RAG dynamically expands this knowledge with relevant external information.
More precise and context-related answers
Without RAG, a language model can answer any question, but the answers are based solely on the general, sometimes outdated training corpus. The result: often plausible, but very general statements that are sometimes incorrect or incomplete in terms of content.
RAG solves this problem by specifically retrieving relevant text passages from a defined knowledge base, such as product documentation, manuals, or internal FAQs, and providing them to the model as context. This results in answers that are technically sound, application-oriented, and tailored precisely to the question asked.
For example, a question such as “How do I create a new project in the system?” would be answered by a standard model with general instructions. A RAG system, on the other hand, searches the actual documentation and provides a concrete step-by-step explanation from the real application context, plus a screenshot.
Transparency and traceability
Another advantage lies in the explainability of the results. RAG can not only generate answers, but also name the sources from which the information originates. Users can view the underlying documents or links on request and verify statements directly in their original context. This traceability strengthens trust in AI responses. Trust is a key issue, especially in corporate environments where incorrect or non-transparent statements can have significant consequences.
Easy maintenance and updating
RAG systems are designed so that their knowledge base can be continuously updated. New documents can be easily added and outdated information removed. After re-indexing, i.e., converting the texts into embeddings and storing them in the vector database, the updated knowledge is available for new queries. This process can even be automated, e.g., for regularly released software or new product versions. This keeps the system up to date without having to retrain the language model itself. This is an immense efficiency advantage over classic fine-tuning methods.
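In code, such an update can be as small as embedding the new passages and appending them to the existing index, as in this sketch that continues the illustrative files from above:

```python
import json

import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-m3")
index = faiss.read_index("help.index")
with open("help_chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)

# Illustrative new help passage, e.g. from a new product release.
new_chunks = ["The AI help can now also be opened directly from the toolbar."]
new_vecs = embedder.encode(new_chunks, normalize_embeddings=True).astype("float32")

index.add(new_vecs)          # the knowledge base grows; the language model is untouched
chunks.extend(new_chunks)

faiss.write_index(index, "help.index")
with open("help_chunks.json", "w", encoding="utf-8") as f:
    json.dump(chunks, f)
```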
Reduction of hallucinations
One of the biggest problems with generative AI is what is known as hallucination, i.e., the invention of seemingly plausible but false information. RAG significantly reduces this risk because the language model accesses real, verified texts. Instead of free association, the answer is generated based on trustworthy sources that serve as references. This not only increases the quality of the content, but also the reliability and credibility of the AI responses.
High flexibility in corporate use
Since RAG can access both structured and unstructured data, the process is suitable for a wide range of scenarios, from customer support and contract analysis to knowledge management. Companies benefit from a scalable, multilingual solution that can be easily transferred to new areas of application by simply adapting the underlying data set.
Challenges in implementing RAG
Despite its strengths, implementing retrieval-augmented generation also presents specific challenges. These lie less in the model logic itself and more in the technical fine-tuning and the correct handling of data and context variables.
The trade-off in text splitting
A key issue is determining the optimal length of text splits, i.e., the text segments that are converted into vectors:
- If a split is too long, it may contain multiple topics. The resulting vector loses its selectivity, and relevant text passages are overlooked during the search.
- If, on the other hand, the split is too short, contextual connections that would be important for a meaningful response are lost.
In our practice, we have found that splits of equal length with slight overlap deliver the best results. This method ensures that semantic relationships are preserved without “watering down” the vector space. Nevertheless, the optimal length always depends on the data type: technical manuals, legal texts, and chat histories require different strategies.
Understanding the overall context
Another challenge is to preserve the contextual meaning of longer documents. One approach to this is parent document retrieval: instead of passing the individual short splits directly to the model, the parent document from which the relevant split originates is reloaded. This method is particularly well suited to structured knowledge bases, such as manuals or process descriptions.
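A sketch of this idea: each indexed chunk remembers which document it came from, and the (deduplicated) parent documents are handed to the model instead of the short chunks. The data in this example is purely illustrative:

```python
# Full documents, keyed by an identifier (illustrative content).
documents = {
    "help/projects.html": "Full text of the 'Projects' help page ...",
    "help/vacation.html": "Full text of the 'Vacation' help page ...",
}

# Mapping built during indexing: chunk id -> id of the parent document.
chunk_parent = {
    0: "help/projects.html",
    1: "help/projects.html",
    2: "help/vacation.html",
}

def parent_context(hit_ids: list[int]) -> str:
    """Replace retrieved chunks with their parent documents, without duplicates."""
    seen: dict[str, str] = {}                # insertion-ordered in Python 3.7+
    for i in hit_ids:
        parent = chunk_parent[i]
        seen[parent] = documents[parent]
    return "\n\n".join(seen.values())

print(parent_context([0, 1, 2]))             # two parent documents, not three chunks
```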
Dealing with overly specific terms
An often overlooked stumbling block lies in the retrieval behavior for distinctive, frequently recurring terms. Terms such as product or company names (e.g., “BCS,” “Projektron,” “SAP”) are heavily weighted in vector searches. However, if the entire text corpus contains these terms frequently (because it is product documentation), the search can go astray: the system returns highly rated hits in which the product name appears multiple times, but which otherwise have little relevance to the actual question.
A proven countermeasure is the semantic rephrasing of the user's question through an additional AI query that removes superfluous proper names and focuses the user's question on the actual information need.
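A sketch of this rephrasing step as an extra model call; the prompt wording and model name are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def rephrase_for_retrieval(question: str) -> str:
    """Turn the user's question into a search query without dominant proper names."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the question as a short search query. Remove product "
                    "and company names that appear throughout the documentation and "
                    "keep only the actual information need."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

# "How do I book travel expenses in Projektron BCS?"
# would become something like "book travel expenses".
```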
Data protection and security
RAG systems often work with sensitive internal data, which places special demands on data protection and information security. When external cloud models such as GPT-4 or Claude are used, no personal or confidential content may be included in the retrieval context. Two established approaches are:
- Anonymization or pseudonymization of documents before embedding, or
- the use of local language and embedding models that are operated entirely within the company's infrastructure.
Controlled data management is a key success factor, especially in an enterprise context. Only if the source of the embedded data is reliable, secure, and compliant with data protection regulations can RAG reach its full potential.
RAG at Projektron – experiences and results
The integration of retrieval-augmented generation (RAG) into productive systems demonstrates how powerful this approach can be when implemented correctly. In real-world projects, such as answering technical questions, knowledge management, or document research, RAG has proven to be a significant advancement over traditional search or FAQ systems.
More precise and context-aware answers
One of the most significant advantages lies in the accuracy of the content of the generated answers. Even if user questions do not correspond exactly to the wording of the underlying texts or only refer to partial aspects, the RAG system provides precise, comprehensible, and complete answers. The model is able to combine information from multiple sources in a meaningful way and generate a coherent, well-formulated answer instead of simply quoting text passages.
A practical example: If a user question in an FAQ database refers to several related articles, the system automatically recognizes the relevant passages, condenses them into a structured result, and formulates them in natural language. This significantly reduces the need to search through the entire original text.
Better structure and readability
If the source data has not been edited particularly carefully, as is often the case with FAQs, the output quality of the answers can be superior not only in terms of content but also in terms of structure. The generated texts are clearly structured, logically organized, and more readable than the original documents. The model arranges instructions in a logical order, highlights key points, and presents complex information in a way that remains understandable even for less specialized users.
Multilingualism and flexibility
Another advantage is native multilingualism. Responses can be output directly in other languages without the need for separate translation steps. This is a decisive efficiency gain in international companies or for globally deployed software systems.
In addition, the database can be flexibly expanded. New sources of knowledge can be easily integrated by indexing them and linking them to appropriate prompts. This enables continuous updating of the system without the need for time-consuming retraining of the language model.
Transferability to other use cases
Experience has shown that the RAG architecture can be easily transferred to other application scenarios. Whether for contract negotiations, internal knowledge databases, or support systems, the core process remains the same:
You compile the relevant data set, define a suitable system prompt, and can then build a context-sensitive AI assistance system in no time.
The future with RAG: Enterprise AI at the next level
Retrieval-Augmented Generation (RAG) will continue to be important for enterprise AI systems in the future. While current language models are rapidly evolving thanks to larger context windows and improved internal memory mechanisms, the RAG approach remains central to the targeted, transparent, and efficient use of AI systems.
Even if models will be able to process millions of tokens in context in the future, practice shows that a targeted, semantically highly relevant context leads to better results than a massively expanded input. RAG filters out precisely those pieces of information from huge amounts of data that are truly relevant, thereby reducing computing effort, response latency, and costs.
For companies, this means:
Scalability and control: Knowledge sources can be connected, maintained, and versioned in a modular manner.
Data sovereignty: Sensitive information remains in the company's own system instead of flowing into external training data.
Cost efficiency: Only relevant text excerpts are processed, which drastically reduces the token costs of modern LLMs.
RAG is thus establishing itself as a core component of a strategic AI architecture that takes into account both precision and compliance requirements. This is a decisive factor for enterprise applications in regulated industries such as IT, finance, law, and project management.
Combination with agents and tool use
Our next development step lies in the integration of RAG with agent-based frameworks and tool-use systems. Agent architectures, currently discussed in research under terms such as autonomous agents or orchestrated AI, enable AI systems to be built in a modular fashion: an agent independently decides which tools, data sources, or API calls it needs to solve a task. The RAG query, which is currently hard-wired into our system, then becomes one option the agent can choose. If the user wants to know how to request vacation time in BCS, the agent queries the help via RAG. If the user wants to know how many vacation days they have left this year, the agent retrieves the user's vacation data from the BCS database via a tool call. In both cases, the language model receives the appropriate contextual information for the response.
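A strongly simplified sketch of such routing is shown below; real agent frameworks usually wire this up via native tool or function calling, and the tool implementations here are placeholder stubs with illustrative values:

```python
from openai import OpenAI

client = OpenAI()

def rag_help_lookup(question: str) -> str:
    """Placeholder: retrieve and answer from the indexed software help (see RAG sketch above)."""
    return "Answer generated from the software help."

def query_vacation_days(user_id: str) -> str:
    """Placeholder: read the user's remaining vacation days from the application database."""
    return f"User {user_id} has 12 vacation days left this year."   # illustrative value

def route(question: str, user_id: str) -> str:
    """Let the model decide whether to consult the help (RAG) or the database tool."""
    decision = client.chat.completions.create(
        model="gpt-4o-mini",                 # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Reply with 'help' if the question asks how to do something, "
                    "or 'database' if it asks for the user's own personal data."
                ),
            },
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content.strip().lower()

    if decision.startswith("database"):
        return query_vacation_days(user_id)  # tool use
    return rag_help_lookup(question)         # RAG query
```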

About the author
Dr. Marten Huisinga heads teknow GmbH, a platform for laser sheet metal cutting, where AI methods are set to simplify the offering for hobby customers in the future. Huisinga was one of the three founders and, until 2015, co-managing director of Projektron GmbH, for which he now works as a consultant. As DPO, he is responsible for implementing the first AI applications in order to assess the benefits of AI for BCS and Projektron GmbH.