This is a migrated thread and some comments may be shown as answers.

PDF Viewer - table recognition

mohamed asked on 05 Dec 2017, 10:18 AM

Hello,

I use RadPdfViewer to display a PDF document, and I use PdfViewer.Find() to extract text from the document.

Could you please tell me if there is any way to:

- Recognize and extract tables and their data from the document.

- Read special characters (bullet points, tick boxes, ...).

 

(I am working on a WPF project written in C#.)

Thank you in advance.

 

2 Answers, 1 is accepted

Accepted
Boby
Telerik team
answered on 07 Dec 2017, 08:23 AM
Hello Mohamed,

The PDF format is optimized for viewing, not for preserving the semantics of the content. In the general case, nothing in the content indicates that part of it is a table. In most cases a table is just a collection of paths (borders) and text fragments (words), or it could even be an image.

More sophisticated software, such as MS Word, has the ability to run an OCR analysis of the PDF content elements and tries to "detect" that certain content is a table.

So the answer to both of your questions is no. You can try to detect semantics yourself using RadPdfProcessing and its document model, but this would be tricky and would most probably work only for certain classes of PDF documents.
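
To give a rough idea of what "detecting semantics yourself" could involve, here is a minimal sketch of one possible heuristic. It assumes you have already obtained text fragments together with their positions from the document model. The `TextFragment` class and the `TableGuesser` helper below are hypothetical names introduced only for this illustration; they are not Telerik types, and the step of reading fragments from RadPdfProcessing is deliberately left out. The idea is to group fragments into visual rows by their Y coordinate and then treat consecutive rows with the same number of cells as a table candidate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input type (not a Telerik class): a piece of text plus the
// top-left corner of its bounding box. Assumes Y grows downward on the page.
public sealed class TextFragment
{
    public TextFragment(string text, double x, double y) { Text = text; X = x; Y = y; }
    public string Text { get; }
    public double X { get; }
    public double Y { get; }
}

public static class TableGuesser
{
    // Groups fragments into visual rows. A fragment joins the current row when
    // its Y is within yTolerance of the first fragment in that row.
    public static List<List<TextFragment>> GroupIntoRows(
        IEnumerable<TextFragment> fragments, double yTolerance = 2.0)
    {
        var rows = new List<List<TextFragment>>();
        foreach (var fragment in fragments.OrderBy(f => f.Y).ThenBy(f => f.X))
        {
            var current = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (current != null && Math.Abs(current[0].Y - fragment.Y) <= yTolerance)
                current.Add(fragment);
            else
                rows.Add(new List<TextFragment> { fragment });
        }
        // Order the cells of each row from left to right.
        return rows.Select(r => r.OrderBy(f => f.X).ToList()).ToList();
    }

    // Returns runs of at least minRows consecutive rows that all have the same
    // number of cells (two or more). Each run is only a *candidate* table.
    public static List<List<List<TextFragment>>> FindTableCandidates(
        List<List<TextFragment>> rows, int minRows = 2)
    {
        var candidates = new List<List<List<TextFragment>>>();
        var run = new List<List<TextFragment>>();
        foreach (var row in rows)
        {
            bool continuesRun = run.Count > 0 && row.Count == run[0].Count;
            if (!continuesRun)
            {
                // The current run ends here; keep it if it is long enough.
                if (run.Count >= minRows) candidates.Add(run);
                run = new List<List<TextFragment>>();
            }
            if (row.Count >= 2) run.Add(row);
        }
        if (run.Count >= minRows) candidates.Add(run);
        return candidates;
    }
}
```

You would call `TableGuesser.GroupIntoRows(fragments)` and pass the result to `TableGuesser.FindTableCandidates(rows)`. This only looks at cell counts per row, so it will misfire on multi-column text and miss tables with merged or empty cells; a stricter version would also compare the X positions of the cells across rows or use the border paths. It is meant as a starting point for a specific class of documents, not a general solution.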

Regards,
Boby
Progress Telerik
mohamed
answered on 07 Dec 2017, 09:22 AM
Thank you very much
Tags
PDFViewer