Class
TesseractOcrProvider

OCR provider backed by the Tesseract engine that extracts text from images, configurable by language packs and data path.

Definition

Namespace: Telerik.Windows.Documents.TesseractOcr

Assembly: Telerik.Windows.Documents.TesseractOcr.dll

Syntax:

public class TesseractOcrProvider : IOcrProvider

Inheritance: object → TesseractOcrProvider

Implements: IOcrProvider

Constructors

TesseractOcrProvider(string)

Initializes a new instance of the TesseractOcrProvider class with the specified data path.

Declaration

public TesseractOcrProvider(string dataPath)

Parameters

dataPath

string

The path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. If set to ".", the tessdata directory should be in the same directory as the executable.
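A minimal construction sketch, based only on the signature above. It assumes a project that references Telerik.Windows.Documents.TesseractOcr.dll, uses C# top-level statements, and has a tessdata folder next to the executable.

```csharp
using Telerik.Windows.Documents.TesseractOcr;

// "." asks the provider to look for a "tessdata" folder in the same
// directory as the executable. The argument is ignored when the
// TESSDATA_PREFIX environment variable is set.
var provider = new TesseractOcrProvider(".");
```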

Properties

DataPath

Gets or sets the path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. Defaults to ".", in which case the tessdata directory should be in the same directory as the executable.

Declaration

public string DataPath { get; set; }

Property Value

string
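Because DataPath is ignored whenever TESSDATA_PREFIX is set, it can help to check the environment before relying on the default. The sketch below uses only standard .NET calls and does not touch the provider itself. It assumes that AppContext.BaseDirectory corresponds to "the same directory as the executable", which may not hold for every hosting model.

```csharp
using System;
using System.IO;

// Check whether the environment variable would override DataPath.
var prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");

if (!string.IsNullOrEmpty(prefix))
{
    Console.WriteLine($"TESSDATA_PREFIX is set to '{prefix}'; DataPath will be ignored.");
}
else
{
    // Assumption: the default "." resolves to the application's base directory.
    var expected = Path.Combine(AppContext.BaseDirectory, "tessdata");
    Console.WriteLine(Directory.Exists(expected)
        ? $"Found tessdata at '{expected}'."
        : $"No tessdata directory at '{expected}'; set DataPath explicitly.");
}
```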

LanguageCodes

Gets or sets the language codes used by the Tesseract OCR engine. The trained data files for each language, named by language code, are available at: https://github.com/tesseract-ocr/tessdata

Declaration

public List<string> LanguageCodes { get; set; }

Property Value

List<string>
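As an illustration, the property can be assigned a list of Tesseract language codes. The sketch assumes that eng.traineddata and deu.traineddata (English and German) have already been placed in the tessdata directory; the choice of languages is only an example.

```csharp
using System.Collections.Generic;
using Telerik.Windows.Documents.TesseractOcr;

var provider = new TesseractOcrProvider(".");

// Each code must have a matching <code>.traineddata file in tessdata.
provider.LanguageCodes = new List<string> { "eng", "deu" };
```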

ParseLevel

Gets or sets the granularity for recognized text (based on OcrParseLevel), which affects iterator level and output grouping.

Declaration

public OcrParseLevel ParseLevel { get; set; }

Property Value

OcrParseLevel

Implements IOcrProvider.ParseLevel

Methods

GetAllTextFromImage(byte[])

Extracts all text from an image and returns it as a single string.

Declaration

public string GetAllTextFromImage(byte[] imageBytes)

Parameters

imageBytes

byte[]

The bytes of the image.

Returns

string

The entire text as a string.

Implements IOcrProvider.GetAllTextFromImage(byte[])
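A short usage sketch for this method, assuming a hypothetical image file named scan.png in the working directory and trained data available as described for DataPath.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

var provider = new TesseractOcrProvider(".");

// "scan.png" is a placeholder; any image readable as a byte array can be used.
byte[] imageBytes = File.ReadAllBytes("scan.png");

// Returns everything recognized in the image as one string.
string text = provider.GetAllTextFromImage(imageBytes);
Console.WriteLine(text);
```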

GetTextFromImage(byte[])

Extracts the text from an image and returns the words and their bounding rectangles.

Declaration

public Dictionary<Rectangle, string> GetTextFromImage(byte[] imageBytes)

Parameters

imageBytes

byte[]

The bytes of the image.

Returns

Dictionary<Rectangle, string>

The recognized words, keyed by their bounding rectangles.

Implements IOcrProvider.GetTextFromImage(byte[])
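The returned dictionary can be enumerated to pair each piece of text with its location. This sketch again assumes a hypothetical scan.png file, and it uses var for the entries so that it does not depend on which Rectangle type the assembly exposes.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

var provider = new TesseractOcrProvider(".");
byte[] imageBytes = File.ReadAllBytes("scan.png"); // placeholder file name

// Keys are bounding rectangles; values are the recognized text.
var words = provider.GetTextFromImage(imageBytes);

foreach (var entry in words)
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```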