12/03/2025 - Articles
An overview of the basics and development approaches of AI
Since the end of 2023, we at Projektron GmbH have been exploring the possibilities of AI—how can we make this technology usable for our ERP and project management software BCS? In a development project, we familiarized ourselves with the technology and worked out some basics for possible product applications.
This article provides insight into the technology behind the current leading AI language models. We will focus in particular on the Transformer architecture, which enables machines to understand language accurately and generate text. These models can be customized to specific requirements through adjustments and training—for example, as an assistant in BCS. In a follow-up article, we will show how we developed the AI assistant and other applications and how they enrich our software.
The Transformer architecture: Attention is all you need
The Transformer architecture is at the heart of modern language models such as ChatGPT. Unlike older models that processed information sequentially, it analyzes the entire input in parallel, which is what makes it so efficient and powerful.
The architecture was introduced in 2017 by a group of Google researchers in the paper "Attention Is All You Need" (source: https://arxiv.org/abs/1706.03762) and has revolutionized language processing since.
What makes this architecture special is the attention mechanism. This calculates which parts of a text are particularly important for understanding the context of a word. With the help of mathematical operations such as vector calculus, each input is converted into a machine-readable form in a complex process.
As the following graphic shows, the attention mechanism occurs at three points in the architecture. This makes it possible to process the entire input in parallel. Previous models (such as LSTM) processed the input sequentially, which made it more difficult to map the context over a longer sequence.
From token to output: the process in detail
Step 1: Tokenization – Breaking down the input into processable units
In the first step, the user input is “tokenized.” This involves breaking down the input text into smaller units, known as tokens. A token can be a word, part of a word, or even a single character, depending on the tokenization method selected.
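As a small illustration, here is what tokenization can look like in code. This is only a sketch: it uses the Hugging Face transformers library with the GPT-2 tokenizer as a stand-in, not the tokenizer of any particular production model.

```python
# Sketch: tokenizing a sentence with the Hugging Face "transformers" library.
# The GPT-2 tokenizer is used purely as an example vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers process the entire input in parallel."
tokens = tokenizer.tokenize(text)   # sub-word pieces, e.g. ['Transform', 'ers', ...]
ids = tokenizer.encode(text)        # the integer IDs the model actually works with

print(tokens)
print(ids)
```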
Step 2: Vectorization – How words are represented mathematically
The attention mechanism is essentially based on vector and matrix calculations. After the text has been divided into tokens, each of these tokens is translated into a vector (“vector embeddings”). This gives the model a numerical representation of the text, which serves as the basis for processing and calculating the relationships.
Which vector represents a token is determined by analyzing large amounts of text: the vector of a word is derived from the words that typically appear around it. For example, synonyms such as "cat" and "house cat" end up with very similar vectors, as they occur largely in the same word environments. "House cat" will occur more frequently in domestic contexts, so its vector will be somewhat more "domestic." The vectors in the original Transformer paper have 512 entries ("dimensions"); current large models use considerably more.
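The following toy example illustrates the idea of vector similarity. The three-dimensional vectors are invented for the illustration; real embeddings have hundreds or thousands of dimensions and are learned from text, not written by hand.

```python
# Toy example: hand-made "embeddings" and their cosine similarity.
import numpy as np

embeddings = {
    "cat":       np.array([0.90, 0.80, 0.10]),
    "house cat": np.array([0.85, 0.75, 0.30]),  # similar word environments -> similar vector
    "invoice":   np.array([0.05, 0.10, 0.95]),  # very different contexts
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["house cat"]))  # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["invoice"]))    # much smaller
```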
Step 3: Positional encoding – Understanding the order of words
Since the Transformer architecture processes the entire user input in one step, no information about the order of the tokens is available per se. To enable the model to understand which tokens are related to which, positional information is added to each token. This positional encoding ensures that the model can understand not only the meaning of individual tokens, but also their order and relationships to each other. This is important because the meaning of a sentence depends heavily on word order.
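For readers who want to see the mechanics, here is a sketch of the sinusoidal positional encoding from the original Transformer paper; note that many newer models use learned or rotary position embeddings instead.

```python
# Sketch: sinusoidal positional encoding as in "Attention Is All You Need".
# The resulting position vectors are simply added to the token embeddings.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

token_embeddings = np.random.rand(10, 512)               # 10 tokens, 512 dimensions
model_input = token_embeddings + positional_encoding(10, 512)
```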
Step 4: Attention mechanism – How the context of a word is calculated
You may remember from math class that vectors can be multiplied. The result of this dot product (also called the scalar product) is a real number, not a vector, and the larger this number is, the more similar the two vectors are (the vectors are normalized beforehand).
Put simply, the attention mechanism calculates the significance of a word for all words in the text by multiplying the corresponding vectors. A high attention score means that the words are significant to each other. The larger the scalar product of the vectors of two words, the stronger their connection. This step makes it possible to map meaningful relationships and connections between words.
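In code, the core of the attention mechanism (scaled dot-product attention) fits into a few lines. The matrices below are random placeholders standing in for the projected token vectors (queries, keys, and values).

```python
# Sketch: scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise similarity of all tokens
    weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
    return weights @ V                   # context-enriched token representations

seq_len, d_k = 6, 64
Q, K, V = (np.random.rand(seq_len, d_k) for _ in range(3))
print(attention(Q, K, V).shape)          # (6, 64)
```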
The Transformer contains three attention mechanisms, each with a slightly different focus, as shown in the graphic. The third mechanism ultimately generates a vector from which, in the last ("linear") layer of the Transformer, a probability of being the next token in the response sequence is calculated for every token in the vocabulary. The number of neurons in this last layer is therefore equal to the size of the token vocabulary, i.e., the number of distinct tokens the machine works with. The following image shows the process.
The model's output is ultimately generated by repeatedly selecting the most probable tokens until the end-of-sequence token (<EOS>) is reached. This iterative process results in a text sequence that reads as if it were written by a human being.
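Schematically, this generation loop looks as follows. The function next_token_probabilities is a hypothetical stand-in for a full forward pass through the Transformer; it is not part of any real library.

```python
# Schematic decoding loop: ask the model for the next token until <EOS> appears.
EOS = "<EOS>"

def generate(prompt_tokens, next_token_probabilities, max_length=100):
    sequence = list(prompt_tokens)
    while len(sequence) < max_length:
        probs = next_token_probabilities(sequence)  # dict: token -> probability
        next_token = max(probs, key=probs.get)      # greedy choice: most probable token
        if next_token == EOS:
            break
        sequence.append(next_token)
    return sequence
```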
Sampling: Creativity through controlled chance
When generating text with Transformer-based language models, the obvious approach is to always select the token with the highest probability and append it to the existing sequence. However, this deterministic approach often leads to monotonous, repetitive, or even boring results. To make the generated text more creative and varied, a technique known as sampling is used instead.
Sampling adds a random component to the token selection. Instead of strictly choosing the most probable token, the next token is drawn at random from a group of the most probable candidates, weighted by their probabilities (often referred to as the top-k method). This results in texts that are less predictable and repetitive. When the same question is asked several times, the answers may vary.
An explanation can be found here: https://huggingface.co/blog/how-to-generate
By applying sampling, the output is no longer deterministic.
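A minimal top-k sampling step could look like the following sketch; the probability table is invented for the example.

```python
# Sketch: top-k sampling. Instead of always taking the most probable token,
# draw randomly from the k most probable ones, weighted by their probabilities.
import numpy as np

def sample_top_k(probs: dict, k: int = 5, seed=None) -> str:
    rng = np.random.default_rng(seed)
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    weights = np.array(weights) / sum(weights)   # renormalize within the top-k set
    return str(rng.choice(tokens, p=weights))

next_token_probs = {"dog": 0.40, "cat": 0.30, "horse": 0.15, "bird": 0.10, "fish": 0.05}
print(sample_top_k(next_token_probs, k=3))       # mostly "dog", sometimes "cat" or "horse"
```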
The results of this technology are quite astonishing. It produces texts that sound as if they were written by humans. But we must always bear in mind that the machine lives in a self-referential, hermetic token world. The meaning of words is represented exclusively by other words, not by reference to any objects in the real world.
This becomes clear in the following conversation with ChatGPT: the question about prime numbers is answered correctly, but the question about the calculation method is not. Instead of saying that it has strung tokens together based on probabilities learned from large amounts of text, ChatGPT claims that it went through a mental process and applied a definition. However, this is not the case; the architecture is not capable of doing so.
AI alignment: The path to reliable behavior
AI alignment deals with the question of how AI can be aligned with desired behavior. Risks such as bias or misinformation should be minimized. Language models such as current AI systems are trained with enormous amounts of text, including data from the internet. However, this training data often contains distortions and incorrect information. Language models are able to abstract from the examples available, recognize patterns, and apply them to new situations. At the same time, the biases or false assumptions contained in the training data flow unnoticed into the results.
Two examples illustrate this problem:
- The issue gained prominence when cases such as Amazon's automatic application evaluation system became known. The system was trained using past decisions – and Amazon had mainly hired men. Amazon's AI then systematically disadvantaged women, even though the application texts did not contain any explicit information about gender. It was enough for the AI to be able to infer gender indirectly from hobbies, sports, or the type of social engagement. Alignment was not successful in this case, and use of the system was largely discontinued.
- In contrast, Google's developers were a little too successful in their alignment efforts to give higher weighting to members of minorities considered disadvantaged. Google's AI-generated images of Black popes became well known, and some early US presidents were also depicted with dark skin.
This presents a major challenge: alignment raises not only technical questions, but also ethical ones:
- Clarification of objectives: Before an AI system can be aligned, it must be defined what behavior is considered “desirable.” This definition is often subjective and dependent on social norms and contexts.
- Adherence to objectives: Even if a behavioral objective is clear, the technical challenge remains of designing a system that fulfills this objective permanently and consistently.
Many methods have been developed for this purpose. An overview can be found in Shen et al., "Large Language Model Alignment: A Survey" (https://arxiv.org/pdf/2309.15025). The following three images were taken from this source.
The central topics of alignment can be divided into five major areas:
- External alignment: The ability of AI to correctly implement specified goals.
- Internal alignment: The consistency between a model's intentions and its actual internal processes.
- Interpretability: The ability to make AI decisions and processes comprehensible to humans.
- Adversarial attacks: The protection of AI against manipulation.
- Evaluation: The assessment of how well a system meets the defined goals.
These five major topics are further divided into a larger number of subtopics. I will only show this here for external alignment to give an impression of the scope of work in this field.
Example: The “Debate” method
The example on the right shows the "Debate" method. Here, two AI systems compete against each other and evaluate each other's views in several rounds. The goal is to reach agreement on whether an output meets the alignment criteria, or at least to generate statements that are easy for humans to verify (e.g., "yes" or "no").
Customization of AI for BCS
Pre-trained language models such as GPT-4, Gemma, and Llama come with language comprehension and general training knowledge. In order to use these models for your own application, they must be able to handle company-specific specialist knowledge. The following overview shows possible ways to customize them. Completely new training from scratch is usually ruled out due to the costs involved; prompt engineering, fine-tuning, and RAG are practical options for medium-sized companies.
| Pre-training | As a rule, a generally applicable "pre-trained LLM" is used. It is also possible to train entirely with your own data. Advantage: complete control; disadvantage: extreme effort. |
| Prompt engineering | Instructions for the LLM in the system prompt. Fast, low effort, easily customizable, requires little individual data. |
| Fine-tuning | A pre-trained LLM is retrained with your own data sets. |
| Retrieval-Augmented Generation (RAG) | Two-stage: semantically relevant information is passed to the LLM as context. Knowledge is dynamically adaptable and up-to-date, with little invented content. |
Fine-tuning: Limitations and challenges
Our attempts at fine-tuning were less successful. Fine-tuning is performed using small, specific data sets from a specific knowledge domain. The aim is to convert a general-purpose model into a special model. The model is “re-trained,” i.e., the internal weights in the neural network are changed. This is the same procedure as in the initial training. A new model is created.
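To make the mechanism tangible, here is a generic sketch of such a training run with the Hugging Face Trainer. It uses a small public model and placeholder data and is deliberately not the setup we used in our own experiments.

```python
# Generic fine-tuning sketch with Hugging Face (placeholder data, small model).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "distilgpt2"                       # small model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token       # GPT-2-style models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

texts = ["Domain-specific example sentence 1.", "Domain-specific example sentence 2."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # updates the internal weights; the result is effectively a new model
```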
In our experiments to date, fine-tuning has impaired the general capabilities of the models without building sufficient specific capabilities. The models understood that they had learned something. However, they provided useless, generic answers, mainly based on their general prior knowledge from training. General control questions were answered less well after fine-tuning.
We see the cumbersome adaptation to new data sets as a significant disadvantage of fine-tuning: for a help assistant, the model would have to be retrained with each new version. An agile workflow, as established in Projektron's documentation, with one help release per week, is therefore practically impossible to implement. For a RAG-based application, on the other hand, this is not a problem. We have therefore not invested any further effort in fine-tuning for the time being.
Prompt Engineering – Flexible thanks to intelligent inputs
The user interacts with the language model via prompts. Inputs are made in natural language rather than in a special programming language that would first have to be learned. In general, a prompt can contain the following elements:
| Input | Task or question that the model is supposed to answer (query). This part of the prompt is always visible to the user. |
| System | Describes how the model should solve the task. The system prompt usually remains invisible to the user. |
| Context | Additional external material that is used to solve the task, e.g., previous conversation in chat mode. |
| Output | Format or language requirements. |
All of this is passed on to the language model for processing.
In addition to the actual user input, this includes instructions and additional information for the AI on how to solve a task. Prompt engineering deals with the question of how these prompts can best be designed and optimized. Many special techniques have now been developed to help you write a good prompt for a specific question. Detailed instructions can be found online, e.g.: https://www.promptingguide.ai or platform.openai.com/docs/guides/prompt-engineering
Through targeted instructions in the system prompt, a general AI can be used for specific tasks, such as summarizing, keyword tagging, or anonymization. These instructions remain hidden from the user. In an application for “keyword tagging,” for example, a corresponding system prompt is generated in the background that defines the type of keyword tagging and possibly specifies a list of keywords. This prompt is transmitted to the AI together with the user input text.
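A sketch of how such a hidden system prompt can be combined with the user's text is shown below. The OpenAI chat API is used here only as an example backend, and the model name and keyword list are invented; this is not the implementation in BCS.

```python
# Sketch: keyword tagging via a hidden system prompt plus the user's text.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are a keyword-tagging assistant. Assign at most five keywords to the "
    "user's text, chosen only from this list: project, invoice, support, travel, "
    "training. Answer with a comma-separated list and nothing else."
)

user_text = "Please book the seminar room and order catering for the onboarding workshop."

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # invisible to the user
        {"role": "user", "content": user_text},        # the only part the user sees
    ],
)
print(response.choices[0].message.content)
```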
The following examples show how these instructions to the machine affect the output. The question is always the same: “Who was Richard Wagner?” The model's answer clearly depends heavily on the system prompt.
Prompt engineering is usually an iterative process consisting of several rounds of prompt modifications and testing. The design of the prompt will depend on the language model. Examples (“shots”) for the task to be solved usually improve the result (“few shot prompt”). The more “intelligent” the model is, the fewer examples are needed. New large models then only need a description of the task without an example (“zero shot prompt”).
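As a small, invented illustration of the difference, a zero-shot and a few-shot prompt for the same classification task might look like this:

```python
# Illustrative zero-shot vs. few-shot prompts (task and wording invented).
zero_shot_prompt = """Classify the sentiment of the following review as positive or negative.

Review: The update made the planning view much faster.
Sentiment:"""

few_shot_prompt = """Classify the sentiment of each review as positive or negative.

Review: The export constantly crashes.
Sentiment: negative

Review: Support solved my problem within an hour.
Sentiment: positive

Review: The update made the planning view much faster.
Sentiment:"""
```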
It is also often helpful to provide the model with an example solution that spells out the intermediate reasoning steps ("chain-of-thought prompting"). The following graphic shows an example comparison between standard prompting and chain-of-thought prompting. Here, too, new large models usually only need the pure instruction, without a chain of thought. It is clear that, as language models advance, some techniques are becoming obsolete.
Based on our experience with prompts, it works well to have an AI generate the first draft of the prompt: briefly describe what you want the prompt to do as input, and then let a tool such as ChatGPT create the prompt.
It is important to think as precisely, clearly, and concretely as possible about what result you expect. If something else comes out, on closer inspection the AI's response often turns out to be a good and direct implementation of the prompt – it's just that the prompt did not correctly reflect our requirements. You get what you specify, not what you want. That's why testing and gradual improvement are important: prompting is an iterative process.
Prompts in English are better understood. I already mentioned the advantage of the English language in tokenization. “Don't” instructions in the system prompt are less well understood than positive formulations of what is desired.
Retrieval-Augmented Generation (RAG) – Dynamic knowledge utilization in real time
RAG stands for “Retrieval Augmented Generation.” In addition to the question, the native language model receives selected texts as “context information” that it can use to answer the question. The system prompt controls how the material is handled.
The process is two-stage. In the indexing phase, a vector index is generated from a data set. In one of our use cases at Projektron, the data set consists of approximately 2,500 HTML documents from the BCS software help. In the inference phase, the user asks their question, which is converted into a query vector, and then suitable documents (the contextual information) are identified. The question, system prompt, and context are sent to the language model for processing.
The inference phase is the phase in which the application is actually used.
Indexing phase
The documents are split in the first step. There are various methods for doing this, e.g., splits with a fixed length, separation at certain formatting characters such as double line breaks, or according to semantics.
The text splitter recommended in the literature for general text is "recursive splitting by character." The splitter is parameterized by a list of separators, with the default setting being ["\n\n", "\n", " ", ""]. The splitter divides the text at the separators in this order, i.e., first at double line breaks. If the resulting splits are still too long, it divides at single line breaks, and so on, until the pieces are small enough. The aim is to keep paragraphs, then sentences, then words together, as these are generally the semantically related pieces of text.
In our experience, this method is not ideally suited to semantically similar, highly structured and pre-structured texts such as software help files. In such cases, it is probably best to simply create splits of constant length with a certain amount of overlap, as shown in the graphic on the right.
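Both variants are easy to sketch in code. The recursive splitter below assumes the langchain-text-splitters package; the fixed-length variant with overlap is a plain hand-rolled loop. Chunk size, overlap, and the file name are example values.

```python
# Sketch: recursive splitting by character vs. fixed-length splits with overlap.
from langchain_text_splitters import RecursiveCharacterTextSplitter

def fixed_length_splits(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = open("help_page.html", encoding="utf-8").read()   # placeholder document

recursive_chunks = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""],   # paragraphs -> lines -> words -> characters
).split_text(text)

fixed_chunks = fixed_length_splits(text, size=500, overlap=50)
```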
A good overview of different splitting methods can be found in the source for the image.
The splits (also called "chunks") are "embedded" in the second step: each split is assigned a vector that encodes its semantics. A text-embedding model such as text-embedding-ada-002 (from OpenAI) or BAAI/bge-m3 (locally installable) is used for this. Good splitting is crucial for embedding: the splits must not be too long, otherwise no selective vector can be assigned to them, but they must also not be too short, so that their meaning can still be reasonably captured.
In the third step, the vector database (e.g., FAISS) takes the vectors and uses them to build a vector index. The vector database specializes in finding a specified number of nearest neighbor vectors for a query vector from a very large number of vectors. This completes the first phase.
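Put together, the indexing phase can be sketched as follows. The embedding model text-embedding-ada-002 is used here because it is mentioned above; the chunk texts are placeholders.

```python
# Sketch of the indexing phase: embed the chunks and build a FAISS index.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()
chunks = ["First help-text split ...", "Second help-text split ..."]  # placeholder splits

response = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
vectors = np.array([item.embedding for item in response.data], dtype="float32")

index = faiss.IndexFlatL2(vectors.shape[1])   # exact nearest-neighbour search
index.add(vectors)
faiss.write_index(index, "help_index.faiss")  # persist the index for the inference phase
```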
Inference phase
When the user enters a question, it is converted into a vector by the same text embedding. The vector database returns the vectors closest to this vector that belong to text splits with the same semantics as the question. These text splits should therefore contain information that answers the questions. They are passed on to the language model for processing together with the question and the system prompt. The answer is then output to the user.
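Continuing the indexing sketch above, the inference phase might look like this; the question, system prompt, and the number of retrieved splits are example values.

```python
# Sketch of the inference phase: embed the question, retrieve similar splits,
# and send question + system prompt + context to the language model.
import numpy as np

question = "How do I configure the weekly effort report?"   # example user question

q = client.embeddings.create(model="text-embedding-ada-002", input=[question])
query_vector = np.array([q.data[0].embedding], dtype="float32")

_, neighbour_ids = index.search(query_vector, 3)             # three most similar splits
context = "\n\n".join(chunks[i] for i in neighbour_ids[0])

answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer only on the basis of the following context and list "
                    "the sources you used at the end.\n\n" + context},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```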
Here is an example of an AI application that answers user questions based on a collection of FAQs:
The text corpus consists of several hundred heterogeneous FAQ documents written by different authors. Unlike the help texts, there is no uniform style. The individual FAQs have been repeatedly supplemented and have grown over time, but have been consolidated to a limited extent. Some topics are covered in several FAQs.
The screenshot shows a user question that is answered based on two FAQs. The system prompt contains the instruction to list the sources used at the end of the text.
The result
- The FAQ query provides good and useful answers, even if the question differs from the wording of the sources, concerns only one aspect of an answer, or draws on several retrieved texts at once.
- The answer fits the question exactly; you do not have to extract the information from the entire text yourself.
- The order of the instructions is better than in the original, and the information is presented in a more structured way.
- The texts are also better formulated.
GPT-4 was used as the language model in this experiment.
Another advantage is that the output can also be in foreign languages without the need for translation, and the database can be easily expanded with new knowledge.
Conclusion: Potential and versatility of RAG for Projektron
Our positive experiences with Retrieval-Augmented Generation (RAG) have convinced us to use this method in our first main application, AI software help. Once the technology is working successfully in one application, it can easily be transferred to other areas such as contract negotiations. The important thing here is to create the right data set and develop a suitable system prompt.
Further details on the development of the software help and the underlying AI frameworks will follow in our next blog article. In addition, we will take a closer look at RAG technology in an upcoming article, as it plays a central role in our applications.

About the author
Dr. Marten Huisinga heads teknow GmbH, a platform for laser sheet metal cutting. In the future, AI methods will simplify the offering for amateur customers. Huisinga was one of the three founders and, until 2015, co-managing director of Projektron GmbH, for which he now works as a consultant. As DPO, he is responsible for implementing the first AI applications in order to assess the benefits of AI for BCS and Projektron GmbH.