
Retrieval Augmented Generation (RAG) In 2 Minutes

Creating “pragmatic AGI” (artificial general intelligence) with current technology is achievable without waiting for future advancements like GPT-5, or for an uber-consciousness to magically emerge from more layers in the matrix-multiplication architecture of today’s ANNs (artificial neural networks).

This approach doesn’t claim to surpass human intelligence or achieve consciousness but aims to perform various tasks based on loosely phrased verbal prompts. The cornerstone of this capability within AI multi-agents is Retrieval Augmented Generation (RAG).

Do read the previous article for more details and basic definitions, and today let’s focus on the KEY factor for creating any useful AI multi-agent: RAG, or retrieval augmented generation.


Picture of a typical LLM (large language model) architecture

RAG ensures that when a user interacts with an agent, such as a chatbot or large language model, the agent first retrieves relevant context before formulating a response. This prevents the model from generating inaccurate or imaginative answers, often referred to as "hallucinations." Similar to human memory, RAG allows the agent to recall pertinent information to provide accurate and contextually appropriate responses.
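The retrieve-then-generate loop described above can be sketched in a few lines of Python. Everything here is a toy built for illustration: the “knowledge base” is two hard-coded strings, retrieval is naive keyword overlap, and generate() is a stand-in for a real LLM call — but the shape of the flow (retrieve context first, then answer from it) is exactly what RAG adds.

```python
# Toy sketch of the retrieve-then-generate pattern behind RAG.
# In a real agent, retrieve() would be a web or vector search and
# generate() would call an actual LLM with the context in the prompt.

KNOWLEDGE_BASE = [
    "Ootbi is a backup storage appliance made by Object First.",
    "RAG stands for Retrieval Augmented Generation.",
]

def retrieve(query: str) -> list[str]:
    """Return every document sharing at least one word with the query."""
    words = set(query.lower().split())
    return [doc for doc in KNOWLEDGE_BASE
            if words & set(doc.lower().split())]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call: answer grounded in retrieved context."""
    if not context:
        return "I don't know."  # refuse instead of hallucinating
    return f"Based on: {context[0]}"

def rag_answer(query: str) -> str:
    return generate(query, retrieve(query))

print(rag_answer("what is ootbi"))
```

Note the empty-context branch: the agent says “I don't know” rather than inventing an answer, which is precisely the hallucination-prevention benefit the paragraph above describes.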

For example, comparing responses from leading LLMs like GPT-4 and Claude Opus with those from a RAG-enabled multi-agent reveals significant accuracy improvements: the latter consistently delivers correct, detailed answers tailored to the specific context of the query. Implementing RAG typically relies on methods such as textual or web search. Integrail Studio simplifies this with a “poor man’s RAG” multi-agent setup that connects three components: querying a designated website, converting the retrieved page into markdown for efficient processing, and letting a large language model generate a response from that context.

 

Answers to the question "what is ootbi"?

Ootbi is the product of ObjectFirst.com, a company founded by our co-founder, serial entrepreneur Ratmir Timashev. As you can see, both of the top models give a plausibly “correct” answer, but clearly not the one we are looking for. Our RAG-enabled multi-agent, on the other hand, provides a 100% correct and detailed response.

So, how do you enable RAG? There are many ways, with two of the most popular being:

  • Simple textual / web search
  • Conversion of the text into a vector representation via so-called embeddings, followed by a vector search on top of it
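To give a taste of the second approach before we cover it in detail, here is a minimal sketch of vector search. The 3-dimensional vectors below are invented for illustration; a real system would get hundreds of dimensions per document from an embedding model, but the ranking step — cosine similarity between the query vector and each document vector — is the same.

```python
import math

# Hypothetical document "embeddings" (made-up 3-D vectors for illustration).
DOCS = {
    "Ootbi is a backup storage product.":       [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.0, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec: list[float]) -> str:
    """Return the document whose embedding is most similar to the query's."""
    return max(DOCS, key=lambda d: cosine(query_vec, DOCS[d]))

# Pretend an embedding model turned "what is ootbi" into this vector:
print(nearest([0.8, 0.2, 0.1]))
```

Because semantically similar texts get nearby vectors, this retrieves the right document even when the query shares no exact keywords with it — the main advantage over plain textual search.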

Today, we will look at the first one, as it is the simplest; in the next parts, we will look into building a much more efficient vector-search RAG in detail.

Here is how easy it is to create what we call a “poor man’s RAG” multi-agent using Integrail Studio:

Scheme of the “Poor Man’s RAG” multi-agent

All you need to do is to connect three “boxes” following very simple logic:

  • The user’s request is searched for on the given website, in this case https://objectfirst.com (it can be YOUR website, your knowledge base, or anything else you like)
  • The resulting page is “read” and converted from HTML to markdown (using far fewer tokens while keeping the quality up)
  • The results are combined in the LLM (large language model) box, using any cost-efficient model with a significant context size (e.g., a 2nd-generation Claude); we simply ask the LLM to answer the user’s request using the context we just “read”
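The three “boxes” above can be approximated in plain Python. This is a sketch, not Integrail Studio’s actual implementation: the site search in box 1 is stubbed with a hard-coded page, and box 3 builds the prompt a hypothetical LLM would receive rather than calling a real API. Only box 2, the HTML-to-markdown step, is genuinely implemented, using the standard library.

```python
from html.parser import HTMLParser

class HtmlToText(HTMLParser):
    """Box 2: strip tags, keeping headings as markdown-style '#' lines."""
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines, self._prefix = [], ""

    def handle_starttag(self, tag, attrs):
        self._prefix = self.HEADINGS.get(tag, "")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self._prefix + text)
            self._prefix = ""

def html_to_markdown(html: str) -> str:
    parser = HtmlToText()
    parser.feed(html)
    return "\n".join(parser.lines)

def answer(user_request: str, site: str) -> str:
    # Box 1: search `site` for the request. Stubbed here; a real agent
    # would call a search endpoint and fetch the top result.
    page = "<h1>Ootbi</h1><p>Ootbi is Object First's backup appliance.</p>"
    # Box 2: HTML -> markdown, spending far fewer tokens on the same content.
    context = html_to_markdown(page)
    # Box 3: hand the context to an LLM. We return the prompt an LLM box
    # would receive, since no real model is wired up in this sketch.
    return f"Using this context:\n{context}\nAnswer: {user_request}"

print(html_to_markdown("<h1>Ootbi</h1><p>Backup appliance.</p>"))
```

The markdown conversion is where the token savings come from: tags, scripts, and styling are discarded while the headings and body text the LLM actually needs are preserved.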

That’s it: no need to write any code or stand up expensive infrastructure. You have just created your first useful multi-agent!

For it to be useful in most real-world scenarios, we have to improve it further. For instance, not every user request translates well into a search query, so we’ll need to build an additional “converter” for it. Another improvement would be to analyze several of the pages found, not just one.
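One possible “converter” is sketched below: a naive pass that turns a conversational request into a terser search query by dropping filler words. The stopword list is invented for the example, and a production agent would more likely ask an LLM to rewrite the request instead — this only illustrates why the conversion step is needed at all.

```python
# Naive request-to-query converter (illustrative only; a real agent
# would likely prompt an LLM to rewrite the request as a search query).

STOPWORDS = {"what", "is", "the", "a", "an", "please", "tell", "me",
             "about", "can", "you", "how", "do", "i", "does"}

def to_search_query(user_request: str) -> str:
    words = user_request.lower().replace("?", "").split()
    kept = [w for w in words if w not in STOPWORDS]
    # If everything was filtered out, fall back to the original request.
    return " ".join(kept) or user_request

print(to_search_query("Can you tell me what is ootbi?"))
```

Sending “ootbi” to a site search returns far better results than sending the full conversational sentence verbatim.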

We will look at how to create such a production-ready multi-agent next time. For now, we encourage you to follow this publication (Superstring Theory) or me (Anton Antich) as an author: everyone who starts following before the public launch of Integrail Studio in May 2024 will receive $10 worth of token credits, more than enough to run lots of fun experiments of your own.

In the next installments, we’ll cover increasingly complex agents that can not only hold a conversation with you but also act on your behalf: send an email, schedule a meeting, summarize the news articles that interest you, and so on. The journey couldn’t be more exciting. Join us!
