A 43-minute presentation and discussion on TrueRAG: a “Discover TrueRAG” webinar. Three of us from AI On Cloud take part: Nikos, the co-founder, who presents the principles of TrueRAG and its benefits, and Guy, our CTO and lead developer of the TrueRAG product, who outlines the way that TrueRAG actually works and why you can believe the answers that Gen AI gives you when TrueRAG is deployed. Demos follow after about 10 to 15 minutes of slideware.
Transcript
Good afternoon and welcome to our Discover TrueRAG webinar. There are three of us from AI on Cloud involved, and ours are the only pictures or live feeds up on the screen. Myself, the CRO for AI on Cloud and, for the purposes of today, the moderator and scene setter. Nikos, who is the co-founder of AI on Cloud, is going to present the principles of TrueRAG and its benefits. And Guy, the CTO and lead developer of the TrueRAG product, is going to outline the way that TrueRAG actually works and why you can believe the answers that Gen AI gives you when TrueRAG is deployed. After about 10 to 15 minutes of slideware, we will be jumping into the demos. So it’s going to be a fairly quick session, but hopefully an interesting and interactive one.
So the agenda, a simple one:
First, we confirm our real-world experiences of RAG: why it’s essential to actually use RAG in many implementations of Gen AI, but also why it does not provide the whole solution. I’ll let Nikos go into the background of that in a while.
Then we explain the why, the how, and the wherefore of TrueRAG. This will be both via slideware and live demonstration.
And then finally, we discuss next steps and answer any of your questions.
So hopefully we’re all ready for some truthful AI. Very quickly, and this is going to be the five-second overview of AI on Cloud: we’re a consultancy focused on business implementations of Gen AI into enterprises. We cover the full gamut of AI and Gen AI, developing roadmaps for companies, suggesting the ripest areas of your business for AI implementations, and helping you implement Gen AI applications and models for the best ROI for your business. But today, let’s just focus on TrueRAG.
So in order for any business, but especially complex businesses with high levels of governance or regulation, to adopt AI, it must trust the AI’s accuracy and truthfulness. Many of you, I’m sure, have played with Gen AI and will have heard of, maybe even experienced, hallucinations. The problem with hallucinations is that maybe 90% of the time your Gen AI application will give the right answer. However, the 10% of the time when it doesn’t, and you don’t realize it’s hallucinating, will bring your whole solution down, bringing trust in your AI applications down to the point of being useless.
So LLMs, if they don’t know the answer to a question you ask, make assumptions. And we all know what assumptions make. LLMs will guess what they think the answer should be. And most importantly, they will not provide you with any indication that they have guessed or made assumptions. So a hallucination is delivered to you by your application in exactly the same manner as a correct answer. There have been many examples of this out in the world; you only need to do a Google search for AI hallucinations and you’ll see examples from the legal industry and many other places. Trusting a hallucination will destroy all the benefits of AI in your business, and that will take a long time to recover from.
So to expand further on this, I’m going to hand over to Nikos to delve further.
Thank you, Neil, and good morning, good afternoon, everyone, on the assumption that you can all hear me. So, yes, on the back of what Neil said, we all get a good feeling these days that AI, and especially generative AI, holds the key to achieving business operation benefits which impact the bottom line, and also to new revenue streams and impact on the top line through innovation, through the adoption of new services and new products. So AI is the way forward in terms of adoption, in terms of unlocking the next wave of doing business and living the way we live as humans. But like every technology advancement, it has problems, at least in the initial implementations, and there are always trade-offs.
And in the current technology implementation of transformer models and generative AI, we have several high-level issues that we need to address. The first one is what we all know as the knowledge cut-off. Knowledge created after the end of the training of the model we’re using isn’t going to be present, and somehow the generative AI model needs to learn about this knowledge in order to use it in its answers. It could also be the case that we have company- or industry-specific knowledge that the AI was never trained on and has never seen, so somehow that piece of information has to be provided to the model. And finally, we most definitely have legacy sources of information in our environment that the AI will not have seen unless we purposely trained it on our environment from the beginning. This is information sitting in legacy sources like an ERP or a CRM environment.
So in order to overcome these issues, we have come up with the retrieval augmented generation model of feeding this information into transformer models, into AI models. The current technology instantiation that more or less everybody is using is vector databases. For those of you who have been involved with them, this is a technology that takes objects in the real world, be it video, language, or audio, translates them into vectors, and stores them in multidimensional vector databases, from which they are retrieved and correlated very fast using matrix multiplications on GPUs.
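As a minimal sketch of that retrieval idea, not of any specific product: chunks and queries become vectors, and the closest chunk by cosine similarity wins. The tiny hand-made vectors below stand in for real embeddings, which come from an embedding model and have hundreds or thousands of dimensions.

```python
# Minimal sketch of vector retrieval. The 3-dimensional "embeddings"
# here are invented for illustration; real ones come from an embedding
# model and have far more dimensions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these are embeddings of stored document chunks.
chunks = {
    "shipping policy for Washington": np.array([0.9, 0.1, 0.2]),
    "holiday schedule":               np.array([0.1, 0.8, 0.3]),
    "spirits shipping table":         np.array([0.8, 0.2, 0.4]),
}

query = np.array([0.85, 0.15, 0.3])  # embedding of the user's question

# Rank stored chunks by similarity to the query; the best match is
# what gets handed to the LLM as context.
best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # -> shipping policy for Washington
```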
So this is the way that we as an industry have come up with some answers to the issues of new knowledge, industry-specific knowledge, and specialist knowledge to feed into the AI to give us answers. But again, like every technology, there are always shortcomings.
Some highlights of the shortcomings that are relevant in the current RAG implementations are:
We might get the wrong information back, either because the information has not been transcoded into vectors properly or accurately, so the vector match happens in the wrong way.
It could be our prompting is wrong, so we never retrieved the right piece of information to feed into the model.
It could be that we retrieved too much and the important piece of information is hiding somewhere in the middle of a very large chunk of text that the AI is going to ignore.
The technology itself has a couple of issues. First of all, it’s not very easy to update vector databases. What happens if one small piece of the text of a document that we have vectorized changes? Do you change the whole document? Do you have enough metadata to change only the piece that has changed? And what happens if the chunking has changed, because the document size has now changed? Does your metadata map to the right chunks of your document in your vector database? (A sketch of the usual workaround follows this list.)
Finally, being a new technology, it is also not clear how RAG scales when the volume of information increases. So that’s something that we need to be worried about.
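On that update point, a common workaround, sketched here against Chroma’s public API (treat the exact calls as illustrative, not as how any particular product handles it), is to tag every chunk with its source document ID so the whole document can be dropped and re-ingested when any part of it changes:

```python
# Sketch of the blunt-but-safe update strategy: chunk boundaries shift
# when a document changes, so instead of patching individual chunks we
# delete every chunk tagged with the document's ID and re-ingest.
import chromadb

client = chromadb.Client()
col = client.create_collection("policies")

def ingest(doc_id: str, chunks: list) -> None:
    col.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"doc_id": doc_id} for _ in chunks],
    )

def update(doc_id: str, new_chunks: list) -> None:
    col.delete(where={"doc_id": doc_id})  # drop all of the old chunks
    ingest(doc_id, new_chunks)            # re-ingest the new version

ingest("ups-spirits", ["Spirits may be shipped to...", "Destination states: ..."])
update("ups-spirits", ["Spirits may be shipped to...", "Revised destination states: ..."])
```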
So on the back of these concerns and trying to improve what we have in hand, there are several things we can do, several technology advancements we can apply, ranging from:
Multi-query, which is allowing the AI model itself to take our query and create new relevant ones, so we can triangulate the information in the vector database better (a sketch of this follows the list).
Re-ranking, where we avoid the issue of the critical piece of information being buried in the middle of a large chunk of text by bubbling it to the top, so that the AI model can see it and use it.
Semantic chunking, which keeps the relevant information together so that it is vectorized properly and therefore retrieved better. And all the way up to embedding model fine-tuning, which effectively allows the embedding engine that creates the vectors to better understand the context of the information it is vectorizing. It can therefore get cleverer and better at vectorizing that information and retrieving it more appropriately, given the context of the environment it is working in when it creates the vector objects from the real-world objects.
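Here is a rough sketch of the multi-query idea, combined with a simple rank fusion as a crude stand-in for re-ranking. The llm() and retrieve() helpers are placeholders for a real LLM call and a real vector-store search, not any specific API:

```python
# Sketch of multi-query retrieval with reciprocal-rank fusion. Both
# helpers below are placeholders to keep the sketch self-contained.
from collections import defaultdict

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. via an API client)."""
    raise NotImplementedError

def retrieve(query: str, k: int) -> list:
    """Placeholder for a vector-store similarity search."""
    raise NotImplementedError

def multi_query_retrieve(question: str, k: int = 4) -> list:
    # Ask the model to rephrase the question a few ways, so we
    # triangulate the vector space from several directions.
    variants = [question] + llm(
        f"Rewrite the following question in 3 different ways, one per line:\n{question}"
    ).splitlines()

    # Reciprocal-rank fusion: chunks that rank highly for several
    # variants bubble to the top, a crude form of re-ranking.
    scores = defaultdict(float)
    for q in variants:
        for rank, chunk in enumerate(retrieve(q, k)):
            scores[chunk] += 1.0 / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```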
But all these things don’t come for free. They imply time to implement, complexity, and effort. Things have to be coded and implemented. And even once we have done all these things, we might still fail. We might still get the wrong answers for various reasons: too much complexity, or we again provide too much information and the LLM ignores it, et cetera, or the chunking has failed.
So you can either spend all this time, effort, and complexity, or you can deploy TrueRAG. TrueRAG is a solution, not something that you have to code, so you can get to market faster and with much more certainty of getting it to work, basically from the get-go. It’s also very scalable and secure, because it’s created out of components of public cloud services, so it inherits the scalability and the security of those public cloud components. You don’t have to worry about them or consider them as an afterthought. And it helps by providing the large language models with the correct information to work on, helping them avoid hallucinations.
Now, how TrueRAG does that is something that Guy will tell us. Handing over to you, Guy.
Thank you, Neil. Thank you, Nikos. It’s very good that you can find a way to talk badly about RAG; we have had so many success stories with RAG, and it works perfectly many times, especially after those improvements. But we wanted to get to a better position, and this is why we focused on the few use cases where RAG couldn’t do its job. And we narrowed the problem down to the vector database’s statistical model.
So the main problem is that if you take your documents, whatever magic you do on them, at the end of the day you rely on some kind of vague, high-dimensional vector match retrieving correctly against an unknown question from a user. It will fail, and it will fail too often for you to risk it. And this is why, at the core of TrueRAG, we take the data and put it inside a graph database, more specifically a simple tree database, in this kind of hierarchy.
The hierarchy fits nicely with a concept that every enterprise knows today, mainly Active Directory. So you can say that my data is related to the way that my organization is structured, all the way down to the locations of the people, the people themselves, the products that I’m selling, whatever the things are that are relevant to the business, to the policies. We help you build this kind of schema; and when I say we, it’s the AI and the team together. This is an automatic process, and you as the administrator can validate that the structure, the schema of the graph database, is correct, and it’s something that you can manage.
Then all the documents are translated into this hierarchy, and every node has only the relevant piece of information. And because it’s a tree, you don’t have to populate the whole tree: you can keep the more general pieces of information higher in the hierarchy. So, for example, a state doesn’t have a policy; the country will have the policy for the state, and the state will default to it. And if the country doesn’t have it, the organization might have it, and so on.
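A minimal sketch of that inheritance idea, with invented node names and policies: each node may carry policy fragments, and a lookup walks up the tree until it finds one, so a specific node overrides a general one and otherwise inherits from above:

```python
# Toy model of hierarchical policy lookup with inheritance. The nodes
# and policy texts are invented for illustration.
class Node:
    def __init__(self, name, parent=None, policies=None):
        self.name = name
        self.parent = parent
        self.policies = policies or {}  # topic -> policy text

def lookup(node, topic):
    # Walk from the node toward the root; the first match wins, so a
    # state-level policy overrides a country- or organization-level one.
    while node is not None:
        if topic in node.policies:
            return node.policies[topic]
        node = node.parent
    return None

org = Node("organization", policies={"spirits": "Spirits shipments allowed."})
us = Node("US", parent=org)
ny = Node("New York", parent=us, policies={"spirits": "No spirits shipments."})
wa = Node("Washington", parent=us)

print(lookup(ny, "spirits"))  # state override: "No spirits shipments."
print(lookup(wa, "spirits"))  # no state policy, so inherited from the organization
```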
So this kind of hierarchy allows you to manage your data effectively; this is why we have trees. It is all built in the cloud using managed services, so all the issues about scale, security, and availability are taken care of, and you can really focus on managing your data, your queries, your users. At the end of the day, and we’ll see this in the demo, the AI will not hallucinate, because it will never see a long chunk of irrelevant documents where it needs to find a needle in a haystack, or whatever the cause of the hallucination is. It will see only a fine-grained piece of data, and it will generate the answer based on that. Therefore the answer will be much more reliable. It can be audited (we’ll see the audit trail) and it can be controlled. So if there is a hallucination, if there is a mistake, the administrator can change it in a single step and fix it to prevent future hallucinations.
The update process is important too; it’s not just about rebuilding documents. Nikos mentioned the problems in updating documents: how do you delete your documents? How do you chunk the new document and put it into the vector database, with or without some adjustment to the retrieval process? At the end of the day, the ability of the organization’s administrator to control, update, and view the data is critical to guarantee the truthfulness of the RAG system.
And we’ll see a couple of demos. Nikos will present regular RAG, and I will present TrueRAG, and I hope it will show you the benefits of this new approach. So I will stop sharing. Nikos, over to you for your part of the demonstration. Thank you.
Can you see my screen? I can. So I’m assuming everyone can. Okay, so I’ll run very quickly through mine, because obviously we all want to see TrueRAG, but just to show people live what happens with simple RAG.
So basically what we have here, and it’s the same information that Guy will use for his demo, is an addendum to the shipping instructions from UPS for people who want to ship spirits through UPS in the US, in the United States. It’s a short document, only two pages long.
And there are two things that I want to draw people’s attention to:
The first one is that below we’ll see there is a list of states. If somebody wants to send spirits to a state that is not in the table, you cannot; it has to be one of the states in the table.
The other thing that tricks AI, and RAG in this case, is this table. We know that RAG has problems with information that is based on tables. But even worse, in this case, what really matters is the destination state.
So we have destination states, and what really matters is whether we have a yes or a no for the destination state. It doesn’t matter where it gets shipped from. Of course, it matters in the sense that you cannot ship from a state that is not in this table. But if you ship from a state that is in this table, it doesn’t matter where you ship from: what matters is whether we have a yes, meaning we can do the operation, or a no in the actual receiving state.
So these are the two pieces of information in terms of reference.
So what I have here is a very simple piece of code that uses this document to ask a couple of simple questions. So let’s set up the imports, and very quickly the background, so we can make it nice looking.
So what we’re going to set up is an environment where we’re going to use Claude 3.5 Sonnet, the latest version, which is exactly what we’re going to use for the TrueRAG demonstration; an embedding model that has been fine-tuned for querying; and Chroma as a vector database. And yes, let’s get on with it.
So we set up the helper function and the embedding function to ingest the document, which we have now just ingested. Then we set up the LLM and the helper functions to do the retrieval. What we have here as a query is basically one that says to the LLM: given the context I’m going to provide to you from the document, and the question, give me the answer; if you don’t know, don’t try to make it up; and use chain of thought to validate your thinking as you go along. So we set this one up as well, and I have a query in place.
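For readers following along, the notebook Nikos describes has roughly this shape. The chunk texts, the prompt wording, and the use of Chroma’s default embedding function are illustrative (the demo used a fine-tuned embedding model); the anthropic and chromadb calls follow those libraries’ public APIs:

```python
# Rough shape of a simple-RAG notebook: ingest chunks into Chroma,
# retrieve for a question, and hand the context to Claude with a
# "don't make it up" instruction.
import anthropic
import chromadb

chroma = chromadb.Client()
collection = chroma.create_collection("ups-spirits")
collection.add(
    ids=["c1", "c2", "c3"],
    documents=["...chunk of the UPS addendum...", "...another chunk...", "...the table as text..."],
)

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(question: str) -> str:
    hits = collection.query(query_texts=[question], n_results=2)
    context = "\n\n".join(hits["documents"][0])
    prompt = (
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "Answer only from the context. If the answer is not in the "
        "context, say you don't know. Think step by step."
    )
    msg = claude.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

print(answer("Can I send vodka from Spokane to Seattle?"))
```

If retrieval misses the table chunk, which is exactly the failure shown next, the model has nothing about Washington in its context and should say it doesn’t know.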
And then we ask the first question: can I send vodka, which is a spirit, from Spokane to Seattle, both in the state of Washington? As we can see here, Washington is in the table, and an intrastate transfer is allowed as either distillery or retailer; it’s a yes on both. So the answer we should expect back is: yes, you can. Let’s see what it says.
So we retrieve some data. The first thing that alarms me is that there’s nothing about Washington here, so obviously the retrieval has missed the Washington-related information. Because of that, I suspect it will come back with, hopefully, a vague answer. And lo and behold, even though it’s gone through some very detailed chain-of-thought thinking, the fact that it never got anything about Washington back from the RAG retrieval means it comes back with an “I don’t know” answer, which effectively shows that it has not done the right thing.
Let’s do another one, another retrieval. Let’s ask whether I can, as a retailer, do an interstate shipment from New Jersey to Nebraska. If we go to the catalogue: New Jersey, it doesn’t matter where it comes from, as I said before; and Nebraska, as we can see here, has a yes all across the board. So the answer should be: yes, you can. Let’s see what it does.
Again, we don’t see Nebraska here, and as a result I suspect it will fail. And lo and behold, because it didn’t get Nebraska back, it doesn’t know what the laws are. So these are two examples where, even though the information is in the document, the retrieval model, the RAG model, did not retrieve it properly and did not feed it downstream to the LLM. Therefore we got the wrong answer, even though the LLM did its best to give us the right answer.
So this is the demo; this is where a simple RAG exercise has failed. And of course, as I said, we can improve this, but then we would have to start spending time on improving it. I’m going to hand over now to Guy for him to tell us and show us how TrueRAG would have dealt with a similar situation.
I think there’s a question regarding GraphRAG. I think we can talk about it briefly now, but more at length at the end when we do Q&A. As I mentioned before, we are talking about moving from a vector database to a graph database, and this is exactly the premise of GraphRAG. But the highlight is that this is a much simpler, more manageable kind of graph, and you will see in a second why. But yes, this is the direction that we are pushing. Vector databases are great, but only when you care about semantic similarities. If you need more precise data retrieval, you need a different method; GraphRAG is one and TrueRAG is another. I hope that’s the start of the answer.
Let me maybe quickly demo using this kind of simple administration interface that we have for the system. Nikos showed a behind-the-scenes view, using a Jupyter Notebook, of what the documents are and then what the answer is, but at the end of the day it’s usually hard to see what is going on. So this administration interface might give us a hint of where we are going.
So we are here in this interface. You see some questions for the administrator to test, because sometimes they want to validate things, and you see that we have a different question. One of the things that the LLM will do out of the box, which you don’t need to work hard on, is translate Spokane or Seattle to Washington, as we saw in Nikos’ demo. The LLM knows this kind of hierarchy, so you can decide at what level of the hierarchy you want to keep your data and what you trust the LLM to translate reliably. You can also support multiple languages; you can do it in Spanish, in French.
So again, if you are operating in any country that speaks a language other than English, you don’t need to change your data and you don’t need to change the interface. The system will support it out of the box.
So here we have this kind of question where I can check the policy: whether I can buy online (again, I don’t need to say the word retailer; I can say online and it will be translated by the LLM to the same concept) and ship it to New York City. I don’t need to say New York; I can say a city in New York, like New York City, which is easy.
And the translations, Bud Light to a beer and whiskey to a spirit, were all done automatically for us. But now we get to the data: why did it say no? We can see the simple answer, but you can also see the logic of the agent. I won’t read through it because it might be tedious. The most important part of it is the policy lookup: that is, the information that the model saw when it answered this question.
And here we can see that we have pieces of the document. The spirits document is only one part of the alcohol policy; there are similar ones for wines, for spirits, and for beers. And here we see that there are two levels:
One level is the country level: that is, the policy for the US. This is applied if you don’t have a row in the table that you saw there.
There is also one for the state, if you decided to override the default policy of the country.
And as I mentioned, you can have something at the organization level, or at North America, the region. So you can have a very complex hierarchy; we’re going to see it in a second.
And because the model only saw those five pieces of policy, and it has a very simple logic (I’m talking about beer, and there is a state policy which overrides the country), it could answer that the New York alcohol policy overrides the country’s and doesn’t allow this kind of shipment.
So what we get here is that the policies are broken down into pieces, and the pieces are attached to the different nodes of the graph, of the tree. And we have a very simple audit trail for the administrator to see whether the answer is right, and if it wasn’t right, why not. That is, if we don’t find the right policy here, we can add it. And this is what we’re going to do next.
So here we can see the hierarchy. Again, this is a very simple interface to show a simple hierarchy. Remember that this hierarchy can be much bigger; it can go all the way down to the vehicle level or the person level. You can have millions of nodes, and because it’s a tree organized in this kind of hierarchy, you don’t have to attach a policy to each one of them. You can say that all the people in the Seattle office can do something, and you don’t need to name each person specifically.
And if we go down this long list of the US states and choose New York, we can see the different policies that are attached to this node. Again, most of the time we use the API; this is just a simple way to see the content of this API. Remember, this node is here, but we have another node for the US above it, North America above that, and the overall organization above that.
Again, this is the UPS code, an internal code that we have; it’s all based on the experience that we had with this customer. And you can see that we have the beer policy for the country, which we are not going to change now, and the beer policy for the state, which is the one that we want to fix.
So here we have the node that is bothering us; this is why we got a no before. I can edit this and change it to yes. Again, I’m usually not going to do that using this interface; I would just upload the document, and the AI would generate the API calls that I’m simulating here. But remember, I made a simple change to a single node, which is very, very manageable. And I will now go and ask the question again.
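To make that concrete, the single-node fix amounts to something like the following call. The endpoint, node path, and payload are entirely hypothetical (TrueRAG’s actual API is not shown in the webinar); this is only a sketch of the shape of such an update:

```python
# Hypothetical sketch of a one-node policy fix via an HTTP API. The
# base URL, path scheme, and field names are invented for illustration.
import requests

BASE = "https://truerag.example.com/api"  # hypothetical endpoint

resp = requests.patch(
    f"{BASE}/nodes/US/NY/policies/beer",   # hypothetical node path
    json={"interstate_shipping": "yes"},   # the single value being flipped
    timeout=10,
)
resp.raise_for_status()  # the change takes effect immediately on this node
```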
So, where’s my question? Please note there were recent changes to the policies; I’m saying that because you might cache the policies to avoid calling the API again and again. And if I ask the question now, praying to the demo gods... I knew that. It’s refusing to answer that. Let me relax it for a minute.
We’re going to talk later about the models that we have behind the scenes that are very strict on some data, like beers, and more lenient on data like wine. So if I ask a question about wine, it will stop preventing it from answering questions regarding… Now let’s go and ask the question again. Hopefully now it will be nicer to me, and we will go to the policy, check the new updated policy, and it will reply. And we’re going to check the logic as well, and we’ll see.
You see, it was prevented before; it was a no. But now the policy has changed, and this is the new lookup that it did. So this is the beer shipment policy for the state of New York, and we have the new yes that we updated. So the change in the system is immediate and very, very simple. And more importantly, it’s manageable by a human administrator, who at the end of the day is responsible for the truthfulness of your system.
This was the demo; more than happy to start the discussion. Thank you, Guy. So hopefully you have a sense of the different approach that TrueRAG takes.
We’ve had a couple of questions come in while we’ve been going. I know Nikos has answered one of them in the chat, around his part of the demo where a table was processed as text; the question was whether it would be more suitable to use SQL querying in that case. You’ve answered it, Nikos. Do you want to expand on that a little?
Yeah, very briefly. Tables always present a problem when you RAG them, and that’s why a lot of effort has gone into semantic chunking and into different ingestion methods specifically for tables in PDF documents. Generally speaking, the graph depiction of tables in a graph database is better than just ingesting them as text or ingesting them using standard RAG ingestion.
Even if you use tricks such as fine-tuning embedding models or using semantic chunking, at some point, especially if the tables are large and complex, you’re going to have a problem. You can always throw them into a database and query them as databases, but then you’re creating yet one more legacy environment that you need to keep updating and managing. So you’re creating, as you say, legacy, but also a lot of additional complexity as well.
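As a toy illustration of the graph-depiction point (the table values below are invented, not UPS’s actual rules): instead of ingesting the table as flat text, each row becomes data attached to the destination-state node, so answering becomes an exact lookup rather than a fuzzy text match:

```python
# Toy example: a shipping table represented as per-state node data
# rather than as flat text. All yes/no values here are invented.
table = [
    # (destination state, distillery ok, retailer ok)
    ("Nebraska",   "yes", "yes"),
    ("New Jersey", "yes", "no"),
    ("Washington", "yes", "yes"),
]

# "Graph": each destination-state node carries its own policy fragment.
state_policies = {
    state: {"distillery": d, "retailer": r} for state, d, r in table
}

def can_ship(destination: str, sender_type: str) -> bool:
    # Exact lookup on the destination node: a state missing from the
    # table means no, with no statistical retrieval involved.
    policy = state_policies.get(destination)
    return policy is not None and policy[sender_type] == "yes"

print(can_ship("Nebraska", "retailer"))  # True, by direct lookup
```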
Correct. Maybe I can add another point here about the hierarchy, because this is something that we saw in big organizations: whether a policy or a document applies depends on where you are. If you are here, this document is applicable; if you are there, that document is applicable. And you don’t always have this kind of context when you are doing retrieval. Think of it like the titles of the documents: policies for the US, policies for England, and then you have the information. This kind of context, this “it depends” logic, is something you will lose, and this is why we are focusing here on the graph database, the hierarchical tree database. If you want to guarantee that the answer will be whole and complete, you have to structure your data differently. Putting it in a regular relational database is problematic for this kind of hierarchical tree retrieval, because you will get the node without the context all the way up to the root of the tree. And if you keep it in a vector database, you rely on the statistical probability of retrieving the right chunk, even though the context might be far away, and this probabilistic retrieval always carries the danger, the risk, of getting it wrong. This is why you put some effort into building the schema of the graph database and translating the documents into this schema. It takes effort, but this effort guarantees that retrieval time, which is the sensitive time, will be much more accurate, almost perfect.
I think the other question, from Patrick, I don’t know if we addressed it enough. No, I don’t think so; I think we should try and expand on that, because it’s a key question, I believe. Yes. So Patrick asks: how is TrueRAG different from GraphRAG? What’s the difference? Right. So I think I did mention the differences a bit. I love GraphRAG and I did a lot of work around it. The main issue with it is that it sometimes generates a graph which is too complicated to handle, too complicated to administer. And this is a limitation of a system: the minute that the system becomes too complex to manage humanly, you are risking those mistakes. This is why we took our model from Active Directory. Active Directory, in an organization, is a very complex set of data that an administrator can handle, and it governs your security, your access control, everything; it already handles the most sensitive data of your organization. A graph database can be a tree, which is what we are doing, and it can be simple, which is what we are enforcing. If you can translate your policies into a GraphRAG with a simple, manageable structure, then yes, what we are building is similar to what you can build. The problem is that, as we saw, most of the time GraphRAG is a complex, unmanageable process. It works, but once it’s done, you are helpless to make changes and understand how it works. So that’s some of the answer, and thank you for your comment, Patrick. Simplification is key, and by turning this into a product, we believe we are simplifying things entirely.
One of the things that came out of there, and you touched on it, Guy, is what happens when information needs to be updated or information changes. And I guess there are two halves to that as a question. One half is: why can’t we just keep training LLMs with new, specific, updated information? The other half is probably more around TrueRAG, or not just TrueRAG, actually. Say we go through TrueRAG and set up a solution with it, and we’re using a 250-page government regulation document. What happens when they come out with version two of that, and it’s now 270 pages long and there are X number of changes in it? How do we pull out the old document and push in the new document? This goes back to what we discussed about updating in a way that is manageable. That is, to update a very complex document, and this is again what we are doing here, you can give it to an LLM and say: this is the old document, this is the new document; please show me the few (again, “few” can still be dozens) API calls that need to be made on the graph database to change from the old version to the new version. And as a person, you can verify that these are the changes, and that they capture exactly everything, all of the changes that you want. There are no mistakes there.
And those updates are very sensitive; you need to manage them, you need to handle them, not just “hey, new document, hey, old document”. This is where the system drifts: even if the RAG was working perfectly in the beginning, after a couple of those changes it will drift into those hallucinations, because of those changes that you don’t control. So what we do is: show your LLM the two documents, show me the updates, apply the updates, and now I know that my system is updated.
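A minimal sketch of that update flow, with a placeholder llm() helper and an invented output format; the point is only that the proposed calls are surfaced for human review before anything is applied:

```python
# Sketch of the version-to-version update flow: ask an LLM to diff two
# document versions into graph-database API calls, then review them.
def llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def diff_to_api_calls(old_doc: str, new_doc: str) -> list:
    prompt = (
        "Here is the old version of a policy document:\n"
        f"{old_doc}\n\nHere is the new version:\n{new_doc}\n\n"
        "List, one per line, the API calls needed to update the policy "
        "tree from the old version to the new version."
    )
    calls = llm(prompt).splitlines()
    # Human review is the critical step: an administrator verifies that
    # these calls capture all of the changes before they are applied.
    for call in calls:
        print("proposed:", call)
    return calls
```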
Yeah, and I don’t think anybody would consider training their own LLMs unless they have enormous amounts of money and enormous amounts of time and data. And you don’t train... I mean, you might train a deep model on a specific use case, definitely not a transformer, but a deep model for a specific case, to do some sort of analysis. But you definitely wouldn’t start by training a large model, be it video or audio or language, from scratch. These are humongous things; even a 7 billion or even a 3 billion parameter model will take forever, and lots of data that you need to curate, and you need to put safeguards around it and everything.
What you can do is fine-tune them, but again, what we have found is that fine-tuning doesn’t put new information into the model, right? It changes the way that it responds. It changes the character of the model to respond closer to the tone of voice that you want, or to the environment and context that you are using it in. But it doesn’t provide a lot of new information. That’s the whole idea of fine-tuning: you use a small amount of data and some small changes within the model to store this data. But this is not enough to give it all the information required to make it knowledgeable about your environment, your data, and your specifics.
Just one more thing: I think it should be clear that we are not saying that RAG is wrong and using a vector database is bad. This is not what we are claiming. There are many, many cases where this is the perfect solution; we have built very successful systems this way. But there are use cases which are more sensitive, where truth is essential, and we saw quite a lot of them.
Don’t give up. Don’t give up on AI. You can still benefit from the tremendous productivity boost, from the scale that you can get, if you apply TrueRAG techniques to the application. If you continue to try to force the vector database to work for you, with perfect chunking and multi-queries and enrichments and everything, you will narrow the mistakes, but you can never eliminate them altogether. That’s true.
Hi, Nikos. Thank you.