Get text from DOCX

7 posts, 0 answers
  1. Manjinder
    1 posts
    Member since:
    Jul 2015

    Posted 20 Jul 2017

    Is it possible to open a DOCX and return the text from the document as a string? I need to parse the string to determine which text replacements to make. I can see methods for replacing text, but not for determining what text a document contains.

    Regards

  2. Boby
    Admin
    729 posts

    Posted 24 Jul 2017

    Hi Manjinder,

    To extract the text, you can import the document using the DocxFormatProvider and export it as plain text using the TxtFormatProvider:
    // str is a Stream opened on the .docx file
    DocxFormatProvider docxFormatProvider = new DocxFormatProvider();
    RadFlowDocument document = docxFormatProvider.Import(str);
    TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
    string text = txtFormatProvider.Export(document);


    Regards,
    Boby
    Progress Telerik

  3. JeffSM
    42 posts
    Member since:
    May 2014

    Posted 30 Jan in reply to Boby

    Could you post more complete code, please? Which using directives do I need? What is str? This does not work in 2019 R1.
  4. JeffSM
    42 posts
    Member since:
    May 2014

    Posted 30 Jan in reply to Boby

    To help someone else:

    using System.IO;
    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    var docxFormatProvider = new DocxFormatProvider();
    using (var input = File.OpenRead(textBox1.Text)) // full path to the .docx file
    {
        var document = docxFormatProvider.Import(input);
        var txtFormatProvider = new TxtFormatProvider();
        string text = txtFormatProvider.Export(document); // the extracted text
        textBox2.Text = text;
    }

  5. JeffSM
    42 posts
    Member since:
    May 2014

    Posted 30 Jan in reply to Boby

    using System.IO;
    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    private void RadButton1_Click(object sender, EventArgs e)
    {
        var docxFormatProvider = new DocxFormatProvider();
        // textBox1 holds the full path to the .docx file
        using (var input = File.OpenRead(textBox1.Text))
        {
            var document = docxFormatProvider.Import(input);
            var txtFormatProvider = new TxtFormatProvider();
            string text = txtFormatProvider.Export(document);
            textBox2.Text = text;
        }
    }

  6. JeffSM
    42 posts
    Member since:
    May 2014

    Posted 30 Jan in reply to Boby

    using System.IO;
    using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
    using Telerik.WinForms.Documents.FormatProviders.Txt;

    private void RadButton1_Click(object sender, EventArgs e)
    {
        var docxFormatProvider = new DocxFormatProvider();
        // textBox1 holds the full path to the .docx file
        using (var input = File.OpenRead(textBox1.Text))
        {
            var document = docxFormatProvider.Import(input);
            var txtFormatProvider = new TxtFormatProvider();
            string text = txtFormatProvider.Export(document);
            textBox2.Text = text;
        }
    }

  7. Tanya
    Admin
    857 posts

    Posted 31 Jan

    Hello Jefferson,

    The namespaces you have included are related to the RadRichTextEditor control from the UI for WinForms suite. To use the objects from RadWordsProcessing, you would need to use the namespaces starting with Telerik.Windows.Documents.Flow:
    Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider docxFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
    Telerik.Windows.Documents.Flow.Model.RadFlowDocument document = docxFormatProvider.Import(str);
    Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider txtFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider();
    string text = txtFormatProvider.Export(document);

    The str variable represents the stream of the file that should be passed to the Import() method of DocxFormatProvider so the class can read the contents and import the file. You can find more information about the different format providers and how to use them in the Formats and Conversion section in our documentation.
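
    For readers asking where str comes from, here is a minimal sketch that opens the file as a stream and passes it to the RadWordsProcessing providers. It assumes a library version in which DocxFormatProvider.Import accepts a single Stream argument; DocxTextReader and ReadAllText are hypothetical names used only for illustration and are not part of the Telerik API.

    ```csharp
    using System.IO;
    using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
    using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
    using Telerik.Windows.Documents.Flow.Model;

    // Hypothetical helper class, named here only for illustration.
    public static class DocxTextReader
    {
        // Returns the plain-text content of the DOCX file at the given path.
        public static string ReadAllText(string path)
        {
            // "str" in the snippets above is a Stream like this one.
            using (Stream str = File.OpenRead(path))
            {
                // Import the DOCX stream into a RadFlowDocument.
                RadFlowDocument document = new DocxFormatProvider().Import(str);

                // Export the document model as a plain-text string.
                return new TxtFormatProvider().Export(document);
            }
        }
    }
    ```

    The returned string can then be searched (for example with text.Contains or a regular expression) to decide which replacements to make.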

    Hope this helps.

    Regards,
    Tanya
    Progress Telerik