I am using Telerik Document Processing (RadPdfProcessing) in an ASP.NET Core API to extract text from uploaded PDF files.
The current approach works for simple PDFs with embedded text, but for certain PDF files no text is extracted at all (the result is empty), even though the text is clearly readable when the PDF is opened in a viewer.
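
This is the code I am currently using:

    using System.Linq;
    using System.Text;
    using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
    using Telerik.Windows.Documents.Fixed.Model;
    using Telerik.Windows.Documents.Fixed.Model.Text;

    // Import the uploaded PDF stream into a document model.
    PdfFormatProvider provider = new PdfFormatProvider();
    RadFixedDocument document = provider.Import(stream);

    StringBuilder sb = new StringBuilder();
    foreach (RadFixedPage page in document.Pages)
    {
        // Only top-level text fragments of the page are collected here.
        var textFragments = page.Content.OfType<TextFragment>();
        foreach (var fragment in textFragments)
        {
            sb.Append(fragment.Text);
        }
        sb.AppendLine(); // one line per page
    }
    string extractedText = sb.ToString();
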
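
One way to narrow this down is to check what each page actually contains. The sketch below is a minimal diagnostic, not a fix: it counts the top-level text fragments, images, and forms on every page. The class name PdfContentDiagnostics and the method Report are hypothetical helpers introduced only for illustration. The assumption is that the Image and Form types from Telerik.Windows.Documents.Fixed.Model.Objects are what the importer produces for embedded images and form XObjects in the affected files.

```csharp
using System;
using System.IO;
using System.Linq;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Objects;
using Telerik.Windows.Documents.Fixed.Model.Text;

// Hypothetical helper for diagnosis only; not part of the Telerik API.
public static class PdfContentDiagnostics
{
    // Writes, for each page, how many top-level text fragments,
    // images, and forms the imported page contains.
    public static void Report(Stream stream)
    {
        PdfFormatProvider provider = new PdfFormatProvider();
        RadFixedDocument document = provider.Import(stream);

        int pageNumber = 0;
        foreach (RadFixedPage page in document.Pages)
        {
            pageNumber++;

            int textCount = page.Content.OfType<TextFragment>().Count();
            int imageCount = page.Content.OfType<Image>().Count();
            int formCount = page.Content.OfType<Form>().Count();

            Console.WriteLine(
                $"Page {pageNumber}: {textCount} text fragment(s), " +
                $"{imageCount} image(s), {formCount} form(s)");
        }
    }
}
```

If a page reports zero text fragments but at least one image, the page is probably a scanned image with no text layer, and extracting its text would require OCR rather than content enumeration. If forms are reported, the text may be nested inside the form content, which the top-level loop over page.Content would not reach.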
Please help us resolve this issue.
Regards,
Rajendra
