Loading, Accessing and Converting Office and PDF Documents with Telerik Document Processing Libraries

by Peter Vogel

Published: March 31, 2026

Here’s what you need to get started with Telerik Document Processing Libraries to work with PDF, Word and Excel files (and, like any good suite, make all those document types look very much alike).

Progress Telerik Document Processing Libraries, in addition to letting you work with a variety of document formats (PDF, DOCX, RTF, HTML, XLSX and more), are an example of the reason you buy into a suite of tools: All the tools bear a “family resemblance.”

The ideal scenario, of course, would be for a single tool that made all these document formats look the same. Given the differences in format and functionality between, for example, a PDF, a Microsoft Word (or RTF or HTML) document and an Excel spreadsheet, that’s not reasonable (though Progress Telerik has achieved that with DOCX, RTF and HTML documents).

The good news here is that, with Telerik Document Processing Libraries (DPL), the family resemblance is strong enough that, for all of the document types the library supports, I can show you how to load documents, start the editing process, convert between various document types and save a document in this one post.

Configuring Your Project

The sample code in this post was all written using the DPL for Windows libraries, even though I was working in ASP.NET Core (the more technical name for the version I used is “.NET (Target OS: Windows)”). The suite is also available for the .NET Framework.

To create an ASP.NET Core project that would work with “all the documents,” I added these NuGet packages to my project:

To work with PDF files: Telerik.Windows.Documents.Fixed
To work with DOCX, HTML and RTF files: Telerik.Windows.Documents.Flow
To work with Excel spreadsheets: Telerik.Windows.Documents.Spreadsheet

For the Excel spreadsheets, I’m only going to work with XLSX files, so I added the Telerik.Windows.Documents.Spreadsheet.FormatProviders.OpenXml package to my project. (If I were going to work with, for example, XLS spreadsheets, then I would have added the Telerik.Windows.Documents.Spreadsheet.FormatProviders.Xls package.)

Loading Your Document

The code to load a document from a file into any of these libraries is very similar:

Create the appropriate provider.
Use the .NET File class’s OpenRead method to create a Stream that points to the file.
Use the provider’s Import method to load the stream into the document object.

Here, for example, is the code to load a PDF file into a RadFixedDocument object (I used this code in an ASP.NET Core application with documents in my project’s wwwroot folder):

RadFixedDocument doc;
PdfFormatProvider prov = new();

using (Stream str = File.OpenRead(@"wwwroot/documents/Priorities.pdf"))
{
    doc = prov.Import(str, TimeSpan.FromSeconds(10));
}

And here’s the code to load a DOCX file into a RadFlowDocument:

RadFlowDocument doc;
DocxFormatProvider prov = new();

using (Stream str = File.OpenRead(@"wwwroot/documents/Priorities.docx"))
{
    doc = prov.Import(str, TimeSpan.FromSeconds(10));
}

As you can see, the code is identical except for the provider (PdfFormatProvider vs. DocxFormatProvider) and document objects (RadFixedDocument vs. RadFlowDocument).

Because both RTF and HTML documents load into the same RadFlowDocument object as a DOCX document, only the provider object changes (HtmlFormatProvider or RtfFormatProvider instead of DocxFormatProvider) when working with those file formats. Here’s the code to load an RTF document:

RadFlowDocument doc;
RtfFormatProvider prov = new();

using (Stream str = File.OpenRead(@"wwwroot/documents/Priorities.rtf"))
{
    doc = prov.Import(str, TimeSpan.FromSeconds(10));
}

And here’s the almost identical code to load an HTML file:

RadFlowDocument doc;
HtmlFormatProvider prov = new();

using (Stream str = File.OpenRead(@"wwwroot/documents/Priorities.html"))
{
    doc = prov.Import(str, TimeSpan.FromSeconds(10));
}

The code to load an Excel workbook is also similar to what you’ve seen before, just swapping in a new document object (Workbook) and provider (XlsxFormatProvider):

Workbook doc;
XlsxFormatProvider prov = new();
using (Stream str = File.OpenRead(@"wwwroot/documents/priority.xlsx"))
{
    doc = prov.Import(str, TimeSpan.FromSeconds(10));
}

A note: The Workbook object assumes that you’re going to load the whole Excel workbook into memory. For very large workbooks, that may not make sense. For that scenario, you should look at the SpreadStreamProcessing Library.

Modifying the Documents

Once you’ve loaded the documents, you can start working with them. You can often simplify your code by using the RadFixedDocumentEditor with PDF documents or the RadFlowDocumentEditor with Word/RTF/HTML documents. Not surprisingly, the code for creating an editor is almost identical for these two document types: create an editor object and pass it the document you want to edit.

The code to create an editor for a PDF document looks like this:

RadFixedDocumentEditor editor = new RadFixedDocumentEditor(doc);

The code for Word/RTF/HTML documents looks like this:

RadFlowDocumentEditor editor = new RadFlowDocumentEditor(doc);

That’s not to say that, as you start working with those documents, there aren’t going to be differences. These are, after all, very different kinds of documents. Having said that, some functionality does work in a similar way across all the document types.

If, for example, I want to search a PDF document for the text “ASP.NET,” I create a TextSearch instance from my document object. I then use the TextSearch object’s FindAll method to search for text in my PDF document, passing two things: my search text and a TextSearchOptions object that specifies how I want my search conducted. The FindAll method returns a collection of SearchResult objects that I can loop through.

Typical code, then, looks like this:

TextSearch search = new TextSearch(doc);
TextSearchOptions opts = new()
{
    CaseSensitive = false,
    WholeWordsOnly = true,
    // With regular expressions on, the "." in "ASP.NET" matches any character
    UseRegularExpression = true
};

IEnumerable<SearchResult> items = search.FindAll("ASP.NET", opts);

Debug.Print($"Found {items.Count()} items.");
foreach (SearchResult item in items)
{
    Debug.Print($"Found at {item.Range.StartPosition} ");
    Debug.Print($"Found: {item.Result}");
}

The process is almost identical for the RadFlowDocument object, except:

You call the FindAll method directly from the RadFlowDocumentEditor.
There isn’t a separate options object (though all the search options from the TextSearch object are still available).
The FindAll method on the editor returns FindResult objects instead of SearchResult objects.

As a result, the equivalent search code for a DOCX, RTF or HTML document looks like this:

RadFlowDocumentEditor editor = new RadFlowDocumentEditor(doc);

IEnumerable<FindResult> items = editor.FindAll("ASP.NET");

Debug.Print($"Found {items.Count()} items.");
foreach (FindResult item in items)
{
    Debug.Print($"Found at {item.RelativeStartIndex} ");
    Debug.Print($"Found: {item.FullMatchText}");
}
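
If you want the Flow search to behave like the PDF search (case-insensitive, whole words only), my understanding is that the editor’s FindAll method has an overload that takes those two choices as Boolean arguments, plus another overload that takes a .NET Regex. What follows is a sketch under that assumption, so check the overloads against the version of the library you have installed. The FlowSearch class and its method names are my own, made up for illustration:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Telerik.Windows.Documents.Flow.Model;
using Telerik.Windows.Documents.Flow.Model.Editing;
using Telerik.Windows.Documents.Flow.TextSearch;

// Hypothetical helper class, not part of the Telerik libraries
public static class FlowSearch
{
    // Assumes a FindAll(text, matchCase, matchWholeWord) overload exists
    public static IEnumerable<FindResult> FindWholeWord(RadFlowDocument doc, string text)
    {
        RadFlowDocumentEditor editor = new(doc);
        // false = ignore case, true = match whole words only
        return editor.FindAll(text, false, true);
    }

    // Assumes a FindAll(Regex) overload exists
    public static IEnumerable<FindResult> FindPattern(RadFlowDocument doc, string pattern)
    {
        RadFlowDocumentEditor editor = new(doc);
        // The caller escapes special characters, e.g. @"ASP\.NET"
        return editor.FindAll(new Regex(pattern, RegexOptions.IgnoreCase));
    }
}
```

With the regular expression version, remember to escape the period in “ASP.NET” (as in @"ASP\.NET") if you only want to match a literal dot.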

Searching a spreadsheet works similarly. The differences:

The FindAll method is built right into the Workbook object.
You have more search options available with the spreadsheet object.
You pass your search string as part of the options object.

My find code with a workbook would look like this:

FindOptions opts = new()
{
    FindWhat = "ASP.NET",
    MatchCase = false,
    MatchEntireCellContents = true,
};

IEnumerable<FindResult> items = doc.FindAll(opts);

Debug.Print($"Found {items.Count()} items.");
foreach(FindResult item in items)
{
    Debug.Print($"Found at {item.FoundCell.CellIndex} ");
    Debug.Print($"Found: {item.ResultValue}");
}

One note: It is certainly convenient when objects from the different libraries share the same name (like the FindResult object that’s defined in both the RadFlowDocument and Workbook libraries). However, if you import both namespaces and try to use FindResult in the same code file, the compiler will report an ambiguous reference because it can’t tell which of the two classes you mean. In the unlikely case that you’re working with both Excel and Word documents in the same code file, you’ll have to fully qualify the object names, something I haven’t done in this post.

Saving Your Documents

To save your modified documents back to disk, you just need to use the provider’s Export method. The code for all the document types is identical:

using (Stream str = File.Create(@"wwwroot/documents/Prioritiesnew.<filetype>"))
{
    prov.Export(doc, str, TimeSpan.FromSeconds(10));
}

Converting Documents

You can convert from one type in the suite to other types and, not surprisingly, the conversion processes look very much alike. As you’ve seen before, it often comes down to using the right provider.

If, for example, you want a plain text version of your PDF file, you use the TextFormatProvider object’s Export method:

TextFormatProvider prov = new();
string text = prov.Export(doc, TimeSpan.FromSeconds(10));

For Word/HTML/RTF document types, the code is almost identical except it uses the TxtFormatProvider object:

TxtFormatProvider txProv = new();
string text = txProv.Export(doc, TimeSpan.FromSeconds(10));

One note: There is a downside to having the classes that do similar things to different document types have the same name. If you are mixing document types and, as a result, using the Fixed, Flow and spreadsheet libraries in the same application, the compiler can get confused about which class from which library you’re using. If so, you’ll have to fully qualify your class names by including their namespaces in the class names. That makes for hard-to-read code, so I haven’t done that here.

But, as an example of how different documents require different functionality, you probably wouldn’t ever want to convert an Excel workbook to a string … but you might want to save your workbook as a CSV file. As you might expect by now, the code to save your imported workbook into a CSV file just means using the Export method on the appropriate provider—the CsvFormatProvider object in this case.

Typical code would look like this:

CsvFormatProvider prov = new();

using (Stream str = File.Create("Priority.csv"))
{
    prov.Export(doc, str, TimeSpan.FromSeconds(10));
}

Converting any of these document types (Excel, Word, HTML, etc.) to PDF is equally straightforward. Because all the libraries look very much alike, it’s really just a matter of adding the library with the provider you need.
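
As a sketch of what that looks like for a Word document: Assuming you’ve added the Telerik.Windows.Documents.Flow.FormatProviders.Pdf package (which is where I expect the Flow library’s PdfFormatProvider to live), converting a DOCX stream to a PDF stream is an import with one provider followed by an export with another. The DocxToPdf class and its Convert method are hypothetical names of my own, and the timeouts just follow the pattern used in the rest of this post:

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Pdf;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper class, not part of the Telerik libraries
public static class DocxToPdf
{
    public static void Convert(Stream docxInput, Stream pdfOutput)
    {
        // Load the Word document into a RadFlowDocument
        DocxFormatProvider docxProv = new();
        RadFlowDocument doc = docxProv.Import(docxInput, TimeSpan.FromSeconds(10));

        // Write the same document out through the Flow library's PDF provider
        PdfFormatProvider pdfProv = new();
        pdfProv.Export(doc, pdfOutput, TimeSpan.FromSeconds(10));
    }
}
```

You would call this with the same File.OpenRead and File.Create streams used elsewhere in this post, each wrapped in its own using block.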

But you can also convert your Workbook object into a RadFixedDocument if you want to manipulate your spreadsheet as a PDF object. That conversion is handled by the PdfFormatProvider from the Telerik.Windows.Documents.Spreadsheet.FormatProviders.Pdf package, using its ExportToFixedDocument method:

PdfFormatProvider prov = new();

RadFixedDocument fixedDoc = 
        prov.ExportToFixedDocument(doc, TimeSpan.FromSeconds(10));

The code is identical if you want to convert a Word/RTF/HTML document to a RadFixedDocument to work with it as a PDF file. That conversion also uses a PdfFormatProvider object but, this time, from the Telerik.Windows.Documents.Flow.FormatProviders.Pdf namespace.

But, because the providers from the two libraries have the same name, if you’re using both libraries in the same code file, you will need to fully qualify your provider names to make sure you’re getting the appropriate PdfFormatProvider.
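
One way to keep that readable is C#’s using alias directive, which gives each provider a short, unambiguous name at the top of the file. This is a sketch only: The alias names and the ToFixedDocument class are my own inventions, and I’m assuming the Flow provider accepts the same timeout argument as the spreadsheet provider:

```csharp
using System;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Flow.Model;
using Telerik.Windows.Documents.Spreadsheet.Model;
// Aliases stand in for the two fully qualified PdfFormatProvider names
using FlowPdfProvider = Telerik.Windows.Documents.Flow.FormatProviders.Pdf.PdfFormatProvider;
using SheetPdfProvider = Telerik.Windows.Documents.Spreadsheet.FormatProviders.Pdf.PdfFormatProvider;

// Hypothetical helper class, not part of the Telerik libraries
public static class ToFixedDocument
{
    // Word/RTF/HTML document to PDF object model
    public static RadFixedDocument FromFlow(RadFlowDocument doc)
    {
        FlowPdfProvider prov = new();
        return prov.ExportToFixedDocument(doc, TimeSpan.FromSeconds(10));
    }

    // Excel workbook to PDF object model
    public static RadFixedDocument FromWorkbook(Workbook doc)
    {
        SheetPdfProvider prov = new();
        return prov.ExportToFixedDocument(doc, TimeSpan.FromSeconds(10));
    }
}
```

The same trick works for any other class name the libraries share, such as FindResult.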

Of course, once you start using these tools to create or modify the documents you’ve loaded, you’ll find more differences: the functionality in a spreadsheet is very different from the functionality in an HTML document. But, while the family resemblances in this suite won’t eliminate those differences, they do cut those differences down to what matters: how those documents differ in their functionality. Which is, after all, what you want.

About the Author

Peter Vogel

Peter Vogel is both the author of the Coding Azure series and the instructor for Coding Azure in the Classroom. Peter’s company provides full-stack development from UX design through object modeling to database design. Peter holds multiple certifications in Azure administration, architecture, development and security and is a Microsoft Certified Trainer.
