Using OcrFormatProvider

Updated on Jun 11, 2026
Minimum version: Q1 2025

Use OcrFormatProvider to recognize text in scanned images and turn that image content into searchable PDF content. The provider imports an image, runs an Optical Character Recognition (OCR) engine over it, and returns a RadFixedPage that you can place in a PDF document.

By default, OcrFormatProvider works with TesseractOcrProvider, which uses the third-party Tesseract OCR engine. If the default provider does not match your platform or deployment model, you can plug in a custom OCR provider implementation instead.

See the PdfProcessing Optical Character Recognition demo for a working example of the feature.

Review the OCR prerequisites article before you start. That article contains the required packages, tessdata setup, native Tesseract dependencies, and Linux-specific installation steps.

Use this article when you need to:

  • Convert a scanned image into a searchable PDF page.
  • Configure TesseractOcrProvider for one or more recognition languages.
  • Understand which platforms support the default Tesseract-based provider.
  • Decide when to use a custom OCR provider.
  • Diagnose common OCR setup and recognition issues.

Supported Platforms

The default TesseractOcrProvider implementation is supported on Windows and Linux. If your application runs on a platform where the default Tesseract integration is unavailable, use a custom OCR provider.

For Blazor WebAssembly, do not assume that the default Tesseract-based provider is available. Review the package guidance in OCR prerequisites and use a custom provider when your deployment model cannot use the default native Tesseract integration.

If your application uses cross-platform image processing dependencies, also review Cross-Platform Support - Images.
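
A runtime check can make the platform decision explicit. The following sketch uses only the .NET RuntimeInformation API and encodes the support statement from this section (Windows and Linux). It does not probe whether the native Tesseract files are actually installed, and the helper name IsDefaultTesseractPlatform is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: returns true on the platforms that this article lists
// as supported by the default Tesseract-based provider (Windows and Linux).
static bool IsDefaultTesseractPlatform() =>
    RuntimeInformation.IsOSPlatform(OSPlatform.Windows) ||
    RuntimeInformation.IsOSPlatform(OSPlatform.Linux);

if (IsDefaultTesseractPlatform())
{
    Console.WriteLine("The default TesseractOcrProvider is a candidate on this platform.");
}
else
{
    Console.WriteLine("Plan for a custom OCR provider on this platform.");
}
```

A check like this only narrows the choice. The package guidance in OCR prerequisites still decides whether the default provider fits a given deployment model.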

End-to-End OCR Workflow

In a typical OCR workflow, the input is a scanned image file and the output is a searchable PDF page inside a RadFixedDocument:

  1. Set up the required OCR packages and native dependencies.
  2. Place the tessdata folder in a location that your application can access.
  3. Create a TesseractOcrProvider and point it to the parent folder that contains tessdata.
  4. Configure the OCR options, such as language codes and parse level.
  5. Create an OcrFormatProvider with that OCR provider.
  6. Import the image and get a RadFixedPage result.
  7. Add the returned page to a RadFixedDocument and export the document to PDF if needed.

This flow gives you a PDF page that contains the recognized text in a searchable form instead of a plain image-only scan.

TesseractOcrProvider Public API

The following example shows the usual setup for OcrFormatProvider and TesseractOcrProvider.

C#
// Requirement for Images in .NET Standard - https://docs.telerik.com/devtools/document-processing/libraries/radpdfprocessing/cross-platform/images
//FixedExtensibilityManager.ImagePropertiesResolver = new ImagePropertiesResolver();

// The constructor argument is the parent directory that contains the tessdata folder.
TesseractOcrProvider tesseractOcrProvider = new TesseractOcrProvider(".");
tesseractOcrProvider.LanguageCodes = new List<string>() { "eng" };
//tesseractOcrProvider.CorrectVerticalPosition = false; // Available in .NET Standard
// DataPath gets or sets that same parent directory after construction.
tesseractOcrProvider.DataPath = @"..\..\..\";
tesseractOcrProvider.ParseLevel = OcrParseLevel.Line;

string imagePath = @"..\..\..\images\image.png";
byte[] imageBytes = File.ReadAllBytes(imagePath);

// Optional: read the recognized text directly, without building a PDF page.
string imageText = tesseractOcrProvider.GetAllTextFromImage(imageBytes);
Dictionary<Rectangle, string> imageTextAndTextDimensions = tesseractOcrProvider.GetTextFromImage(imageBytes);

OcrFormatProvider ocrFormatProvider = new OcrFormatProvider(tesseractOcrProvider);

RadFixedDocument document = new RadFixedDocument();

// Import the image as a searchable page and dispose the stream when the import is done.
RadFixedPage page;
using (Stream input = File.OpenRead(imagePath))
{
    page = ocrFormatProvider.Import(input, null);
}
document.Pages.Add(page);

string outputPath = "output.pdf";
PdfFormatProvider pdfFormatProvider = new PdfFormatProvider();
// File.Create truncates an existing file, so no stale bytes remain after the new PDF content.
using (Stream output = File.Create(outputPath))
{
    pdfFormatProvider.Export(document, output, TimeSpan.FromSeconds(10));
}

After the import completes, the provider returns a RadFixedPage that you can add to a document, inspect, or export.

TesseractOcrProvider Settings

Use these members to control how the default OCR engine behaves:

| Method or property | Description |
| --- | --- |
| TesseractOcrProvider(string dataPath) | Initializes the provider with the path to the parent directory that contains the tessdata folder. |
| LanguageCodes | Sets the language codes for the Tesseract OCR engine. The default value is eng. Download the required trained data files from the Tesseract tessdata repository. |
| CorrectVerticalPosition | Tries to correct the vertical position of the recognized text. This option is not available in .NET Framework. |
| DataPath | Gets or sets the path to the parent directory that contains the tessdata folder. |
| ParseLevel | Controls whether OCR parsing returns text by line or by word through OcrParseLevel.Line or OcrParseLevel.Word. |
| GetAllTextFromImage | Extracts all recognized text from an image and returns it as a single string. |
| GetTextFromImage | Extracts recognized text and returns words together with their bounding rectangles. |
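
As an illustration of these settings, the sketch below configures two recognition languages, switches to word-level parsing, and prints each recognized entry with its bounding rectangle. It uses only the members listed above. The using directive for the Telerik types, the "deu" language choice, and the scan.png file name are assumptions made for the example, so verify the namespace against your installed packages.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
// Assumed namespace for TesseractOcrProvider and OcrParseLevel.
// OcrParseLevel may live in a different namespace in your package version.
using Telerik.Windows.Documents.TesseractOcr;

// "." is a hypothetical parent folder of tessdata.
TesseractOcrProvider provider = new TesseractOcrProvider(".");

// Each code needs a matching <code>.traineddata file in the tessdata folder.
provider.LanguageCodes = new List<string>() { "eng", "deu" };
provider.ParseLevel = OcrParseLevel.Word;

// Hypothetical input file.
byte[] imageBytes = File.ReadAllBytes("scan.png");

// Each entry maps a bounding rectangle to the text recognized inside it.
foreach (var entry in provider.GetTextFromImage(imageBytes))
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```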

Setup Tips for Better OCR Results

Check these setup points before you troubleshoot recognition quality:

  • Use images with a resolution of 300 DPI when possible.
  • Make sure the tessdata folder contains eng.traineddata and any additional languages that you configure in LanguageCodes.
  • Pass the parent folder of tessdata to TesseractOcrProvider, not the tessdata folder itself.
  • Verify that the required native Tesseract files are available on the target machine.
  • On Linux, complete the extra installation steps from OCR prerequisites.
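
Some of these checks can be automated before the OCR provider is created. The sketch below relies only on System.IO and assumes the layout described above: a tessdata folder inside the parent folder, with one .traineddata file per language code. The helper name FindMissingTrainedData and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: returns the expected .traineddata paths that do not
// exist under <parentFolder>/tessdata for the given language codes.
static List<string> FindMissingTrainedData(string parentFolder, IEnumerable<string> languageCodes)
{
    string tessdataFolder = Path.Combine(parentFolder, "tessdata");
    var missing = new List<string>();
    foreach (string code in languageCodes)
    {
        string expectedFile = Path.Combine(tessdataFolder, code + ".traineddata");
        if (!File.Exists(expectedFile))
        {
            missing.Add(expectedFile);
        }
    }
    return missing;
}

// Sample values: adjust the parent folder and language list to your deployment.
string tessdataParent = AppContext.BaseDirectory;
var configuredLanguages = new List<string> { "eng" };

List<string> missingFiles = FindMissingTrainedData(tessdataParent, configuredLanguages);
if (missingFiles.Count == 0)
{
    Console.WriteLine("All configured trained data files were found.");
}
else
{
    foreach (string missingFile in missingFiles)
    {
        Console.WriteLine("Missing: " + missingFile);
    }
}
```

Pass the same parent folder to this check and to TesseractOcrProvider so that both look at the same tessdata location.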

Troubleshooting

Use these checks when OCR does not behave as expected:

  • OCR initialization fails: Confirm that the required packages and native dependencies are installed.
  • No text is recognized: Verify image quality, DPI, and whether the correct trained language files are present in tessdata.
  • Wrong language is recognized: Check the LanguageCodes setting and confirm that the matching .traineddata files are deployed.
  • Text positions look off: Try the CorrectVerticalPosition option when the runtime supports it.
  • The feature does not run on the target platform: Confirm whether the application uses the default Windows and Linux Tesseract integration or whether it needs a custom OCR provider.

Next Steps

Continue with the article that matches your next task:

  1. Use OCR prerequisites to complete package and native dependency setup.
  2. Review Implementing a Custom OCR Provider if you need a different OCR engine or unsupported platform coverage.
  3. Use Extracting Text from PDF Documents when you need to process searchable PDF content after OCR.

A complete example that uses OcrFormatProvider is available in the Document Processing SDK repository.
