Get text from DOCX

9 posts, 0 answers
  1. Manjinder
    1 posts
    Member since:
    Jul 2015

    Posted 20 Jul 2017

    Are you able to open a DOCX and return the text from the document as a string? I need to parse the string to determine which text replacements to make. I can see methods for replacing text, but not for determining what text a document contains.

    Regards

  2. Boby
    Admin
    763 posts

    Posted 24 Jul 2017

    Hi Manjinder,

    To extract the text, you can import the document using the DocxFormatProvider and export it as plain text using the TxtFormatProvider:
    DocxFormatProvider docxFormatProvider = new DocxFormatProvider();
    RadFlowDocument document = docxFormatProvider.Import(str);
    TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
    string text = txtFormatProvider.Export(document);


    Regards,
    Boby
    Progress Telerik

  3. JeffSM
    44 posts
    Member since:
    May 2014

    Posted 30 Jan 2019 in reply to Boby

    Could you post a more complete example, please? Which using directives do I need? What is str? The code above does not work in 2019 R1.
  4. JeffSM
    44 posts
    Member since:
    May 2014

    Posted 30 Jan 2019 in reply to Boby

    To help someone else:

    using System.IO; // needed for File.OpenRead
    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    var docxFormatProvider = new DocxFormatProvider();
    using (var input = File.OpenRead(textBox1.Text)) // full path to the .docx file
    {
        var document = docxFormatProvider.Import(input);
        var txtFormatProvider = new TxtFormatProvider();
        string text = txtFormatProvider.Export(document); // the extracted plain text
        textBox2.Text = text;
    }

  5. JeffSM
    44 posts
    Member since:
    May 2014

    Posted 30 Jan 2019 in reply to Boby

    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    private void RadButton1_Click(object sender, EventArgs e)
    {
        var docxFormatProvider = new DocxFormatProvider();
        using (var input = File.OpenRead(textBox1.Text))
        {
            var document = docxFormatProvider.Import(input);
            var txtFormatProvider = new TxtFormatProvider();
            string text = txtFormatProvider.Export(document);
            textBox2.Text = text;
        }
    }

  6. JeffSM
    44 posts
    Member since:
    May 2014

    Posted 30 Jan 2019 in reply to Boby

    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    private void RadButton1_Click(object sender, EventArgs e)
    {
        var docxFormatProvider = new DocxFormatProvider();
        using (var input = File.OpenRead(textBox1.Text))
        {
            var document = docxFormatProvider.Import(input);
            var txtFormatProvider = new TxtFormatProvider();
            string text = txtFormatProvider.Export(document);
            textBox2.Text = text;
        }
    }
  7. Tanya
    Admin
    894 posts

    Posted 31 Jan 2019

    Hello Jefferson,

    The namespaces you have included are related to the RadRichTextEditor control from the UI for WinForms suite. To use the objects from RadWordsProcessing, you would need to use the namespaces starting with Telerik.Windows.Documents.Flow:
    Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider docxFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
    Telerik.Windows.Documents.Flow.Model.RadFlowDocument document = docxFormatProvider.Import(str);
    Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider txtFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider();
    string text = txtFormatProvider.Export(document);

    The str variable represents the stream of the file that should be passed to the Import() method of DocxFormatProvider so the class can read the contents and import the file. You can find more information about the different format providers and how to use them in the Formats and Conversion section in our documentation.

    Hope this helps.

    Regards,
    Tanya
    Progress Telerik
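
Putting the two replies above together, a minimal sketch of the RadWordsProcessing approach might look like the following. It assumes the project references the RadWordsProcessing assemblies and that the Import(Stream) overload used earlier in this thread is available in your version. The class and method names (DocxTextReader, ReadAllText) are hypothetical helpers made up for illustration and are not part of the Telerik API.

```csharp
using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper class, named here only for illustration.
public static class DocxTextReader
{
    // Returns the plain text of the .docx file at the given path.
    public static string ReadAllText(string path)
    {
        var docxFormatProvider = new DocxFormatProvider();
        RadFlowDocument document;

        // "str" from the earlier snippets is simply a readable stream over the file.
        using (Stream str = File.OpenRead(path))
        {
            document = docxFormatProvider.Import(str);
        }

        // Export the imported document as a plain-text string.
        var txtFormatProvider = new TxtFormatProvider();
        return txtFormatProvider.Export(document);
    }
}
```

With a helper like this, the button handler from the earlier posts would reduce to something like textBox2.Text = DocxTextReader.ReadAllText(textBox1.Text);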
  8. ajay
    1 posts
    Member since:
    Jan 2017

    Posted 29 Jun in reply to Boby

    I have already used the DocxFormatProvider for files with the .docx extension and it works fine. But for files with the .doc extension it throws an exception, something like "Central directory header is broken".
  9. Tanya
    Admin
    894 posts

    Posted 30 Jun

    Hi Ajay,

    The .doc format is currently not supported, and we have logged a request for it on our public portal. Make sure to cast your vote for the implementation of this functionality and subscribe to the item if you would like to receive updates about its status: ADD. RadRichTextEditor - add support for importing .doc documents.

    Regards,
    Tanya
    Progress Telerik
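
For readers who hit the same exception: a .docx file is a ZIP package, while a legacy .doc file is a binary OLE compound file, so the "Central directory header is broken" message most likely comes from the provider trying to read a non-ZIP file as a ZIP archive. One way to fail early with a clearer message is to check the first bytes of the file before importing. The sketch below is only an illustration under that assumption; WordFileSniffer and Detect are hypothetical names and not part of any Telerik API.

```csharp
using System;
using System.IO;

// Hypothetical helper, named here only for illustration.
public static class WordFileSniffer
{
    // ZIP local file header signature ("PK\x03\x04"); .docx files are ZIP packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE compound file signature, used by legacy binary .doc files.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns "zip", "ole", or "unknown" based on the leading bytes of the stream.
    // If the stream is seekable, its position is restored before returning.
    public static string Detect(Stream stream)
    {
        long start = stream.CanSeek ? stream.Position : 0;
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until done.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (stream.CanSeek) stream.Position = start;

        if (StartsWith(header, read, ZipSignature)) return "zip";
        if (StartsWith(header, read, OleSignature)) return "ole";
        return "unknown";
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A caller could run Detect on the opened file stream and only pass it to DocxFormatProvider.Import when the result is "zip", showing a "legacy .doc is not supported" message when it is "ole". Note that a "zip" result only means the file is a ZIP archive, not that it is a valid DOCX package.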
