Class TesseractOcrProvider
OCR provider backed by the Tesseract engine that extracts text from images, configurable by language packs and data path.
Definition
Namespace: Telerik.Windows.Documents.TesseractOcr
Assembly: Telerik.Windows.Documents.TesseractOcr.dll
Syntax:
public class TesseractOcrProvider : IOcrProvider
Inheritance: object → TesseractOcrProvider
Implements: IOcrProvider
Constructors
TesseractOcrProvider(string)
Creates a new instance of the TesseractOcrProvider class.
Declaration
public TesseractOcrProvider(string dataPath)
Parameters
dataPath
The path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. If set to "." the tessdata directory should be in the same directory as the executable.
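The following is a minimal sketch of how the dataPath rules described above could be checked before constructing the provider. It relies only on the documented constructor plus standard .NET calls. The class name DataPathExample and the choice of "." are illustrative assumptions, not part of the library.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

class DataPathExample
{
    static void Main()
    {
        // Illustrative choice: expect tessdata next to the executable.
        string dataPath = ".";

        // Per the parameter description, dataPath is ignored when
        // the TESSDATA_PREFIX environment variable is set.
        string prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        if (!string.IsNullOrEmpty(prefix))
        {
            Console.WriteLine($"TESSDATA_PREFIX is set to '{prefix}'; dataPath will be ignored.");
        }
        else
        {
            // With ".", the tessdata directory is expected beside the executable.
            string tessdata = Path.Combine(AppContext.BaseDirectory, "tessdata");
            if (!Directory.Exists(tessdata))
            {
                Console.WriteLine($"Warning: no tessdata directory found at '{tessdata}'.");
            }
        }

        var provider = new TesseractOcrProvider(dataPath);
    }
}
```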
Properties
DataPath
The path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. Defaults to ".", in which case the tessdata directory should be in the same directory as the executable.
LanguageCodes
The language codes to use for the Tesseract OCR engine. The trained data files for each language, named by language code, are available at https://github.com/tesseract-ocr/tessdata
ParseLevel
Gets or sets the granularity for recognized text (based on OcrParseLevel), which affects iterator level and output grouping.
Declaration
public OcrParseLevel ParseLevel { get; set; }
Property Value
OcrParseLevel
Implements
Methods
GetAllTextFromImage(byte[])
Extracts all text from an image and returns it as a single string.
GetTextFromImage(byte[])
Extracts the text from an image and returns the words and their bounding rectangles.
Declaration
public Dictionary<Rectangle, string> GetTextFromImage(byte[] imageBytes)
Parameters
imageBytes
byte[]
The bytes of the image.
Returns
Dictionary<Rectangle, string>
A dictionary that maps each bounding rectangle to the text recognized within it.
Implements
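
As a rough usage sketch of the two extraction methods documented above: the file name scan.png and the class name OcrExample are placeholders, and the example assumes the tessdata directory and the required language data are already in place.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

class OcrExample
{
    static void Main()
    {
        // "." means tessdata is expected in the executable's directory.
        var provider = new TesseractOcrProvider(".");

        // Placeholder input; any image file read into a byte array.
        byte[] imageBytes = File.ReadAllBytes("scan.png");

        // All recognized text as a single string.
        string allText = provider.GetAllTextFromImage(imageBytes);
        Console.WriteLine(allText);

        // Recognized text keyed by bounding rectangle.
        foreach (var entry in provider.GetTextFromImage(imageBytes))
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

GetAllTextFromImage fits cases where only the text matters. GetTextFromImage fits cases where the position of each recognized item is needed, for example to overlay text on the source image.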