[Solved] Telerik PdfProcessing (RadFixedDocument) extracts no text from some PDFs; the TextFragment approach works for simple PDFs only

PdfProcessing WordsProcessing
Rajendra asked on 13 May 2026, 02:11 PM

I am using Telerik Document Processing (RadPdfProcessing) in an ASP.NET Core API to extract text from uploaded PDF files.

The current approach works fine for simple PDFs with embedded text, but I am facing issues with certain PDF files where no text is extracted at all (empty result), even though the PDF is readable visually.

 

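using System.Linq;
using System.Text;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Text;

// 'stream' is the uploaded PDF file, opened for reading.
PdfFormatProvider provider = new PdfFormatProvider();
RadFixedDocument document = provider.Import(stream);

StringBuilder sb = new StringBuilder();

foreach (RadFixedPage page in document.Pages)
{
    // Only the TextFragment elements directly in the page content are picked up here.
    var textFragments = page.Content.OfType<TextFragment>();

    foreach (var fragment in textFragments)
    {
        // Fragments are appended back to back, without any separator.
        sb.Append(fragment.Text);
    }

    // One line of output per page.
    sb.AppendLine();
}

string extractedText = sb.ToString();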

Please help us resolve this issue.

Regards,
Rajendra

1 Answer, 1 is accepted

Dess | Tech Support Engineer, Principal
Telerik team
answered on 13 May 2026, 03:05 PM

Hi, Rajendra,

Your current approach is correct for extracting text from PDFs that contain embedded, machine-readable text. However, some PDFs that are visually readable do not contain any extractable text elements. The following KB article covers this scenario: Extracting Text from PDF Documents.

Here are some common reasons and suggestions for handling such cases:

PDF Contains Scanned Images (No Embedded Text)

Many PDFs are generated from scanned documents and contain only images, not actual text data. In these cases, the Content collection of RadFixedPage holds no TextFragment objects (typically only image elements), so your loop produces an empty result. To extract text from these image-based PDFs, you need Optical Character Recognition (OCR). RadPdfProcessing supports this through the OcrFormatProvider in combination with an OCR provider such as Tesseract.

Please refer to the PdfProcessing Optical Character Recognition (OCR) Demo.

Other Reasons for Empty Extraction

Some PDFs use uncommon encodings or unsupported features, which can prevent text extraction even if the document is not image-based. If the problematic PDFs are not scans, please check whether they use special fonts, custom encodings, or document protection.
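
To tell the two cases above apart, it can help to log which element types each page actually contains. The following is only a minimal sketch, not an official Telerik sample: PageContentInspector, CountByType and HasContentButNoText are hypothetical helper names introduced for illustration, and the check assumes the runtime type name of text elements is "TextFragment", as in your code. The helper itself has no Telerik dependency; the commented usage shows how it might be called from your existing loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical diagnostic helper (not part of the Telerik API).
public static class PageContentInspector
{
    // Counts the elements of a page by their runtime type name,
    // producing something like { "TextFragment": 12, "Image": 1 }.
    public static Dictionary<string, int> CountByType(IEnumerable<object> content)
    {
        return content
            .GroupBy(element => element.GetType().Name)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    // True when the page has some content but none of it is a TextFragment.
    // Assumption: this is what a scanned, image-only page usually looks like.
    public static bool HasContentButNoText(IDictionary<string, int> counts)
    {
        return counts.Count > 0 && !counts.ContainsKey("TextFragment");
    }
}

// Hypothetical usage inside the loop from the question:
//
// foreach (RadFixedPage page in document.Pages)
// {
//     var counts = PageContentInspector.CountByType(page.Content.Cast<object>());
//     Console.WriteLine(string.Join(", ", counts.Select(c => $"{c.Key}: {c.Value}")));
//
//     if (PageContentInspector.HasContentButNoText(counts))
//     {
//         // Likely a candidate for OCR rather than TextFragment extraction.
//     }
// }
```

If a page reports only image elements, OCR is the likely route. If it reports other element types but still no TextFragment, the document probably falls under the second category and a sample file would be needed to investigate further.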

Next Steps

Could you please confirm whether the problematic PDFs are scanned documents or contain images? It would be greatly appreciated if you could provide a sample PDF document that demonstrates the issue. If you are concerned about sharing it in the public forum, I recommend submitting a private support ticket. This information will help me provide more targeted guidance for your scenario.

I hope this information helps. If you need any further assistance, please don't hesitate to contact me.

Regards,
Dess | Tech Support Engineer, Principal
Progress Telerik

