The RAG resources you create to use with your LLMs are strategic resources, just like your organization’s databases. Telerik Agent Tools let you create the custom tools for managing those resources.

Like your organization’s databases, your AI resources are strategic, organizational resources. And, like the data in those databases, the content used by your AI resources needs to be managed so that those resources give you reliable, grounded answers. The Telerik Agent Tools API gives you the tools for automating the process of creating and maintaining that content.

In an earlier post, I showed how to use Progress Telerik Document Processing Libraries (DPL) to create an AI resource with content from existing documents and then tie that resource to a Large Language Model (LLM) that your users could query.

Tying an LLM to your content creates a Retrieval-Augmented Generation (RAG) resource that helps get your users grounded answers driven by the content you load into your resource. And using the DPL toolset is a great solution when you’re building an application that creates a RAG resource “as needed.”

But if you want to reliably create and update a RAG resource that will be used (and reused) by multiple applications and multiple users, then you need the Telerik Agent Tools API. Once you have a reliable process for creating your RAG resource, the Agent Tools API also lets you use that resource to support your users by integrating with Microsoft’s latest toolset for responding to your users’ prompts.

In this post, I’m going to cover how to construct an automated workflow for creating a RAG resource that can be used across applications and users. In my next post, I’ll cover how to use that resource with the LLM of your choice.

Building a RAG Resource Workflow

You can use the Tools API to create an interactive application that lets you manage the documents that make up your resource’s content. However, for this post, I’m going to assume a simpler process than that (though I’ll cover all the tools you’d need for an interactive solution).

For this post, I’m going to assume there’s a folder that holds all the documents with the content that should be in my RAG resource (which might be Word documents, spreadsheets, PDF, text files—basically, all the content that you can load with Telerik DPL tools). Users control what goes into my resource by adding and removing documents from that folder.

In this case study, then, I build my RAG resource by running a batch program that reads all the documents in the folder, adds them all to a RAG resource in memory, and then saves the new resource, replacing any existing version with a new, updated version.
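
The first step of that batch program, gathering the documents in the folder, needs nothing beyond the standard .NET file APIs. Here’s a minimal sketch, assuming the folder only needs to be scanned at its top level. The FindPdfDocuments name and the temporary demo folder are mine, purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: returns the full paths of the PDF files in a folder,
// sorted so that every run processes the documents in the same order.
List<string> FindPdfDocuments(string folderPath)
{
    return Directory.EnumerateFiles(folderPath)
        .Where(file => string.Equals(Path.GetExtension(file), ".pdf",
                                     StringComparison.OrdinalIgnoreCase))
        .OrderBy(file => file, StringComparer.OrdinalIgnoreCase)
        .ToList();
}

// Demo against a throwaway folder (an assumption for this sketch)
string demoFolder = Path.Combine(Path.GetTempPath(),
                                 "rag-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(demoFolder);
File.WriteAllText(Path.Combine(demoFolder, "b.pdf"), "");
File.WriteAllText(Path.Combine(demoFolder, "A.PDF"), "");
File.WriteAllText(Path.Combine(demoFolder, "notes.txt"), "");

List<string> pdfPaths = FindPdfDocuments(demoFolder);
foreach (string pdfPath in pdfPaths)
    Console.WriteLine(Path.GetFileName(pdfPath));

Directory.Delete(demoFolder, recursive: true);
```

Comparing the extension without regard to case means a file named A.PDF is picked up on case-sensitive file systems, too.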

Configuring Your Project

To start building an application to manage your RAG resource, you must first add the Telerik.Documents.AI.Tools.Core NuGet package to your project. After that, you need the specific Telerik packages that support the document types that you’ll add to your RAG resource.

For my case study, I’m just going to use PDF documents, so I just need the “Fixed” packages:

- Telerik.Documents.AI.Tools.Fixed.Core
- Telerik.Documents.AI.AgentTools.Fixed

If you’re working with spreadsheets, you’ll want to add the equivalent Spreadsheet packages:

- Telerik.Documents.AI.Tools.Spreadsheet.Core
- Telerik.Documents.AI.AgentTools.Spreadsheet

To support Microsoft Word, HTML and text documents, you also need to add the Telerik.Documents.AI.Tools.Flow.Core package.

And, of course, there’s no reason you couldn’t make your life simpler by adding all the Telerik.Documents.AI.* libraries to your application, even if you don’t need to use them all right now. In this case study, I also didn’t take advantage of some of the other packages available, such as the Telerik.Documents.AI.AgentTools.Conversion package that handles conversions between document types.

Building Your RAG Resource

The process for creating a RAG resource begins with creating an in-memory repository which holds your documents. A repository can hold one of three categories of documents: spreadsheets, PDF documents or Word and Word-related documents.

In my case study, I’m only going to work with PDF documents (which Progress Telerik classifies as “Fixed” documents), so I created an InMemoryFixedDocumentRepository object and then stored a reference to that repository in an IFixedDocumentRepository variable.

The code to do that looks like this:

IFixedDocumentRepository pdfRepo = 
          new InMemoryFixedDocumentRepository( new TimeSpan(0, 2, 0) );

All three types of repositories look very much alike, though (they all share a common IDocumentRepository interface, for example). As a result, working with other document types is similar.

If, for example, you were loading spreadsheet files (“Workbook” documents), you’d use the InMemoryWorkbookRepository object and the IWorkbookRepository interface. For Word and Word-related documents (“Flow” documents), you’d use the InMemoryFlowDocumentRepository object and the IFlowDocumentRepository interface.

Repositories have some built-in functionality (they all include a ListDocuments method that returns a list of documents in the repository, for example). You will, however, want to extend your repository by attaching one or more toolsets containing tools that add additional functionality to your repository.

For my case study, I just needed to be able to import documents into an empty repository, so the only toolset I attached was the FixedFileManagementAgentTools toolset, which supports working with a PDF repository and has import (and export) methods for individual documents. To create that toolset, I pass a reference to my repository and the path to the folder containing the documents that the tools will work with:

FixedFileManagementAgentTools pdfFileTools = 
            new FixedFileManagementAgentTools(pdfRepo, repoFolderPath);

You can add additional toolsets if you need additional functionality or need to work with more folders. If, for example, I wanted to work with PDF forms, I could add the FixedDocumentFormAgentTools toolset to my repository. When creating a toolset, you’ll always need to pass a reference to the repository the toolset will be attached to, but other parameters (if any) will vary from one toolset to another.

Managing Repositories with Registries

If you’re going to be working with multiple repositories in your workflow, you might want to create a registry to simplify managing your repositories. In my case study, for example, where I’m potentially loading several different kinds of files, I could end up supporting all three types of repositories (PDF files, spreadsheets, Word and Word-related documents)—creating a registry simplifies switching between the different types of repositories.

Registries are created using the DocumentRepositoryRegistry object and support registering one repository of each document category using their RegisterRepository method.

You can then retrieve the repository you want from a registry by using the registry’s TryGetRepository method, passing two parameters: the document type of the repository you want and an out parameter of type IDocumentRepository. Like other TryGet* methods, the TryGetRepository method returns true if the method finds the repository you want and false if it does not (the actual repository, if found, is loaded into the method’s second, out parameter).

The following code creates a registry, registers my PDF repository using the RegisterRepository method and then immediately retrieves the repository using the registry’s TryGetRepository method. Because the repository is returned using the common IDocumentRepository interface type, I cast the returned reference to the IFixedDocumentRepository type before working with it:

DocumentRepositoryRegistry registry = new();

registry.RegisterRepository(DocumentType.FixedDocument, pdfRepo);

if (registry.TryGetRepository(DocumentType.FixedDocument,
                              out IDocumentRepository? repo))
{
    IFixedDocumentRepository fixedRepo = (IFixedDocumentRepository) repo;
    //…work with the fixedRepo repository
}
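
A direct cast like that one throws an InvalidCastException if the object turns out to be of a different type, so C#’s pattern matching is a more defensive alternative. The sketch below shows the general technique using only built-in types: a Dictionary stands in for the registry, and the lookup name and GetAsList helper are hypothetical. It is not the Telerik API, just the same TryGet-then-narrow shape:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a registry: entries are stored under a general type
var lookup = new Dictionary<string, IEnumerable<string>>
{
    ["fixed"] = new List<string> { "policy.pdf" },
    ["other"] = new[] { "notes.txt" }
};

// Hypothetical helper: a TryGet-style lookup followed by a pattern-matching
// test, so an entry of an unexpected type yields null instead of throwing
// an InvalidCastException the way a direct cast would.
List<string>? GetAsList(string key)
{
    if (lookup.TryGetValue(key, out IEnumerable<string>? found)
        && found is List<string> list)
    {
        return list;
    }
    return null;
}

Console.WriteLine(GetAsList("fixed") is not null);   // entry exists and is a List
Console.WriteLine(GetAsList("other") is not null);   // entry exists but is an array
Console.WriteLine(GetAsList("missing") is not null); // no such entry
```

Applied to the registry code, the same idea would be an "is IFixedDocumentRepository" test on the returned repository in place of the cast.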

You’ll also want to create a registry if, as part of your workflow, you want to support converting documents between formats or merging multiple documents. The toolsets that support those tasks are the ConvertDocumentsAgentTool and MergeDocumentsAgentTool toolsets, and they attach to registries rather than repositories.

Loading, Saving and Retrieving Your Repository

Adding a document to a repository is easy: Just call the appropriate import method on the appropriate toolset that’s tied to the repository you want to update.

To add a PDF document to my repository, for example, I’d use the ImportFixedDocument method on my FixedFileManagementAgentTools toolset, tied to my PDF repository. To use the ImportFixedDocument method, I pass the path to the document I want to load and its document format (e.g., PDF or XLSX). Optionally, I can pass a name for the document, which will be returned by the ListDocuments method.

Which means that importing a PDF document into my repository would look like this:

CallToolResponse res =
          pdfFileTools.ImportFixedDocument(documentPath,
                                           DocumentFormat.PDF,
                                           Path.GetFileNameWithoutExtension(documentPath)
                                          );

The CallToolResponse object’s IsError property will be set to true if your import fails (and the Message property will contain additional information regardless of whether the call succeeds or fails).
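
In a batch run, you’ll want to check that response for every document rather than assume each import worked. The following sketch shows one way to collect the failures. So that it runs without the Telerik packages, the import call is represented by a delegate returning an (IsError, Message) tuple that mirrors the two CallToolResponse properties. ImportAll and FakeImport are hypothetical names for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: runs the supplied import call for each path and
// returns the failures so that one bad file doesn't stop the whole batch.
List<(string DocumentPath, string Message)> ImportAll(
    IEnumerable<string> documentPaths,
    Func<string, (bool IsError, string Message)> importDocument)
{
    var failures = new List<(string DocumentPath, string Message)>();
    foreach (string documentPath in documentPaths)
    {
        var (isError, message) = importDocument(documentPath);
        if (isError)
            failures.Add((documentPath, message));
    }
    return failures;
}

// Stand-in for the real import call (an assumption for this sketch):
// it reports an error for anything that isn't a .pdf file.
(bool IsError, string Message) FakeImport(string candidate) =>
    Path.GetExtension(candidate).Equals(".pdf", StringComparison.OrdinalIgnoreCase)
        ? (false, "Imported " + Path.GetFileName(candidate))
        : (true, "Unsupported file: " + Path.GetFileName(candidate));

var failed = ImportAll(new[] { "policy.pdf", "notes.txt" }, FakeImport);
foreach (var failure in failed)
    Console.WriteLine($"{failure.DocumentPath}: {failure.Message}");
```

In the real workflow, the delegate would wrap the toolset’s import call and hand back the response’s IsError and Message values.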

Once I’ve added all the documents from my folder to my repository, I can save my repository with all of its content to a single file using my repository’s MergeAndExport method. The MergeAndExport method requires three parameters:

  • An array of the id property for each of the documents in the repository. You can use that array both to control which documents are exported and the merge order of the documents. I didn’t bother and just used a LINQ Select method to retrieve all the id property values.
  • A FileStream object that points to your repository file (I set this up to overwrite my repository every time).
  • The document format for the file (PDF, in my case).

You must close your FileStream after you finish exporting your documents. As a result, the code to export my repository of PDF documents would look like this:

string[] ids = pdfRepo.ListDocuments().Select( doc => doc.Id ).ToArray();
// FileMode.Create overwrites an existing file; FileMode.CreateNew would throw if the file exists
FileStream fs = new FileStream(repoFilePath, FileMode.Create);

pdfRepo.MergeAndExport(ids,
                       fs,
                       DocumentFormat.PDF);
fs.Close();
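
A using block is a common way to guarantee that the stream gets closed even if the export throws an exception. This sketch relies only on standard .NET file APIs, with a plain byte write standing in for the export call. The SaveReplacing helper and the demo file name are hypothetical. It also shows why FileMode.Create, rather than FileMode.CreateNew, is the mode to pick when each run should replace the previous file:

```csharp
using System;
using System.IO;
using System.Text;

string demoFilePath = Path.Combine(Path.GetTempPath(),
                                   "rag-demo-" + Guid.NewGuid().ToString("N") + ".bin");

// Hypothetical helper: writes the content, replacing any existing file.
// The using block closes the stream even if the write throws.
void SaveReplacing(string filePath, string content)
{
    using (FileStream stream = new FileStream(filePath, FileMode.Create))
    {
        byte[] bytes = Encoding.UTF8.GetBytes(content);
        stream.Write(bytes, 0, bytes.Length); // stand-in for the export call
    }
}

SaveReplacing(demoFilePath, "first run, with more content");
SaveReplacing(demoFilePath, "second run"); // replaces the file, doesn't append

string savedContent = File.ReadAllText(demoFilePath);
Console.WriteLine(savedContent);

// FileMode.CreateNew refuses to overwrite an existing file
bool createNewFailed = false;
try
{
    using FileStream blocked = new FileStream(demoFilePath, FileMode.CreateNew);
}
catch (IOException)
{
    createNewFailed = true;
}
Console.WriteLine($"CreateNew on an existing file failed: {createNewFailed}");

File.Delete(demoFilePath);
```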

You can now tie that file to an LLM using Microsoft’s ChatClientAgent object to let your users query your RAG resource. I’ll walk through how to do that in my next post.


About the Author

Peter Vogel

Peter Vogel is both the author of the Coding Azure series and the instructor for Coding Azure in the Classroom. Peter’s company provides full-stack development from UX design through object modeling to database design. Peter holds multiple certifications in Azure administration, architecture, development and security and is a Microsoft Certified Trainer.
