PDF Viewer - table recognition

3 posts, 1 answer
  1. mohamed
    7 posts
    Member since:
    Apr 2017

    Posted 05 Dec 2017


    I use RadPdfViewer to display a PDF document, and I use PdfViewer.Find() to extract text from the document.

    Could you please tell me if there is any way to:

    - Recognize and extract tables and their data from the document.

    - Read special characters (bullet points, tick boxes, ...).


    (I am working on a WPF project in C#.)

    Thank you in advance


  2. Answer
    697 posts

    Posted 07 Dec 2017

    Hello Mohamed,

    The PDF format is optimized for viewing, not for preserving the semantics of the content. That said, in the general case there is no information in the content indicating that part of it is a table. In most cases a table is just a bunch of paths (borders) and text fragments (words), or it could even be an image.

    More sophisticated software, such as MS Word, can run an OCR analysis of the PDF content elements and try to "detect" that certain content is a table.

    That said, the answer to both of your questions is no. You can try to detect semantics yourself using RadPdfProcessing and its document model, but this would be tricky and will most probably work only for certain classes of PDF documents.
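
    To give an idea of what "detecting semantics yourself" could look like, here is a minimal sketch of one common heuristic: group text fragments into rows by their vertical position, then check whether consecutive rows have the same number of cells. The Fragment and TableHeuristics types are hypothetical and are not part of any Telerik API; the sketch assumes you have already obtained the text and the X/Y position of each fragment from the document model. It ignores borders, merged cells, and wrapped text, so treat it as a starting point only.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical stand-in for a positioned text run; fill it from whatever
    // text/position data your PDF library exposes for each fragment.
    public sealed class Fragment
    {
        public Fragment(string text, double x, double y) { Text = text; X = x; Y = y; }
        public string Text { get; }
        public double X { get; }
        public double Y { get; }
    }

    public static class TableHeuristics
    {
        // Groups fragments into rows: a fragment joins the current row when its Y
        // is within `tolerance` of the Y of the first fragment placed in that row.
        public static List<List<Fragment>> GroupIntoRows(
            IEnumerable<Fragment> fragments, double tolerance = 2.0)
        {
            var rows = new List<List<Fragment>>();
            foreach (var fragment in fragments.OrderBy(a => a.Y).ThenBy(a => a.X))
            {
                var last = rows.Count > 0 ? rows[rows.Count - 1] : null;
                if (last != null && Math.Abs(last[0].Y - fragment.Y) <= tolerance)
                    last.Add(fragment);
                else
                    rows.Add(new List<Fragment> { fragment });
            }
            // Sorting by Y first can interleave X values, so order each row left to right.
            return rows.Select(r => r.OrderBy(c => c.X).ToList()).ToList();
        }

        // Rough test: true when at least `minRows` consecutive rows each have
        // two or more cells and the same cell count.
        public static bool LooksLikeTable(List<List<Fragment>> rows, int minRows = 2)
        {
            int run = 0;
            for (int i = 0; i < rows.Count; i++)
            {
                bool multiCell = rows[i].Count >= 2;
                bool sameShape = multiCell && (run == 0 || rows[i].Count == rows[i - 1].Count);
                run = sameShape ? run + 1 : (multiCell ? 1 : 0);
                if (run >= minRows) return true;
            }
            return false;
        }
    }
    ```

    You would call GroupIntoRows with the fragments of one page and pass the result to LooksLikeTable. The tolerance value depends on the font size and units of the document, so expect to tune it per class of documents.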

    Progress Telerik
  3. mohamed

    Posted 07 Dec 2017 in reply to Boby

    Thank you very much