This is a migrated thread and some comments may be shown as answers.

Get text from DOCX

8 Answers 919 Views
WordsProcessing
Manjinder
Top achievements
Rank 1
Manjinder asked on 20 Jul 2017, 12:43 PM

Is it possible to open a DOCX file and return the document's text as a string? I need to parse that string to determine which text replacements to make. I can see methods for replacing text, but none for determining what text a document contains.

Regards

8 Answers, 1 is accepted

0
Boby
Telerik team
answered on 24 Jul 2017, 12:12 PM
Hi Manjinder,

To extract the text, you can import the document using the DocxFormatProvider and export it as plain text using the TxtFormatProvider:
// str is a Stream containing the .docx file
DocxFormatProvider docxFormatProvider = new DocxFormatProvider();
RadFlowDocument document = docxFormatProvider.Import(str);
TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
string text = txtFormatProvider.Export(document);
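
Once the text is available as a string, deciding which replacements to make is plain string processing. The following is only a minimal sketch under an assumption that is not part of the original question: that the placeholders in the document look like {{Name}}. The class name PlaceholderScanner and the pattern are hypothetical; adjust both to whatever convention your documents actually use.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists the distinct placeholder names found in extracted text.
public static class PlaceholderScanner
{
    // Assumed placeholder syntax: {{Name}}, where Name consists of word characters.
    private static readonly Regex Pattern = new Regex(@"\{\{(\w+)\}\}");

    public static IReadOnlyList<string> FindPlaceholders(string text)
    {
        var seen = new HashSet<string>();
        var names = new List<string>();
        foreach (Match match in Pattern.Matches(text))
        {
            string name = match.Groups[1].Value;
            // Keep the first occurrence only, preserving document order.
            if (seen.Add(name))
            {
                names.Add(name);
            }
        }
        return names;
    }
}
```

The returned names could then drive the replace methods mentioned in the question, one call per placeholder.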


Regards,
Boby
Progress Telerik

0
JeffSM
Top achievements
Rank 2
Iron
Veteran
Iron
answered on 30 Jan 2019, 03:19 PM
Better code, please. Which using directives do I need? What is str? This doesn't work in 2019 R1.
0
JeffSM
Top achievements
Rank 2
Iron
Veteran
Iron
answered on 30 Jan 2019, 03:30 PM

To help someone else:

using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

var docxFormatProvider = new DocxFormatProvider();
using (var input = File.OpenRead(textBox1.Text)) // full path to the .docx file
{
    var document = docxFormatProvider.Import(input);
    var txtFormatProvider = new TxtFormatProvider();
    string text = txtFormatProvider.Export(document); // the extracted text
    textBox2.Text = text;
}

0
JeffSM
Top achievements
Rank 2
Iron
Veteran
Iron
answered on 30 Jan 2019, 03:31 PM

using System;
using System.IO;
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

private void RadButton1_Click(object sender, EventArgs e)
{
    var docxFormatProvider = new DocxFormatProvider();
    using (var input = File.OpenRead(textBox1.Text)) // full path to the .docx file
    {
        var document = docxFormatProvider.Import(input);
        var txtFormatProvider = new TxtFormatProvider();
        string text = txtFormatProvider.Export(document); // the extracted text
        textBox2.Text = text;
    }
}

0
Tanya
Telerik team
answered on 31 Jan 2019, 01:28 PM
Hello Jefferson,

The namespaces you have included are related to the RadRichTextEditor control from the UI for WinForms suite. To use the objects from RadWordsProcessing, you would need to use the namespaces starting with Telerik.Windows.Documents.Flow:
Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider docxFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
Telerik.Windows.Documents.Flow.Model.RadFlowDocument document = docxFormatProvider.Import(str);
Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider txtFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider();
string text = txtFormatProvider.Export(document);

The str variable represents the stream of the file that should be passed to the Import() method of DocxFormatProvider so the class can read the contents and import the file. You can find more information about the different format providers and how to use them in the Formats and Conversion section in our documentation.
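
Putting the pieces from this thread together, a complete version might look like the sketch below. It is not an official sample: it assumes the RadWordsProcessing assemblies are referenced in the project, and the class name DocxTextExtractor and the path parameter are made up for illustration. Only the Import() and Export() calls already shown above are used.

```csharp
using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper wrapping the import/export calls from this thread.
public static class DocxTextExtractor
{
    // Reads a .docx from a readable stream (the "str" above) and returns its plain text.
    public static string ExtractText(Stream str)
    {
        var docxFormatProvider = new DocxFormatProvider();
        RadFlowDocument document = docxFormatProvider.Import(str);

        var txtFormatProvider = new TxtFormatProvider();
        return txtFormatProvider.Export(document);
    }

    // Convenience overload: opens the file at 'path' and disposes the stream afterwards.
    public static string ExtractText(string path)
    {
        using (Stream str = File.OpenRead(path))
        {
            return ExtractText(str);
        }
    }
}
```

With the using directives in place, the long fully qualified names are no longer needed, and a call such as DocxTextExtractor.ExtractText(textBox1.Text) would return the text of the file at that path.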

Hope this helps.

Regards,
Tanya
Progress Telerik
0
ajay
Top achievements
Rank 1
answered on 29 Jun 2020, 04:34 PM
I have already used the DocxFormatProvider for the .docx extension and it's working fine. But for the .doc extension it throws an exception, something like "Central directory header is broken".
0
Tanya
Telerik team
answered on 30 Jun 2020, 06:47 AM

Hi Ajay,

The .doc format is currently not supported, and we have logged a request for it on our public portal. Make sure to cast your vote for the implementation of this functionality and subscribe to the item if you would like to receive updates about status changes on it: ADD. RadRichTextEditor - add support for importing .doc documents.
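
Until then, one way to avoid the exception is to check the file before importing it. A .docx file is a ZIP package and starts with the bytes 50 4B 03 04, while a legacy .doc file is an OLE compound file and starts with D0 CF 11 E0 A1 B1 1A E1, which is presumably why the ZIP reader reports a broken central directory. The sketch below is only an illustration; the names WordFileSniffer and WordFileKind are hypothetical and not part of any Telerik API.

```csharp
using System.IO;

// Hypothetical helper: guesses the container type of a Word file from its first bytes.
public static class WordFileSniffer
{
    public enum WordFileKind { Unknown, ZipBased, OleCompound }

    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static WordFileKind Detect(Stream input)
    {
        var header = new byte[8];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
        while (read < header.Length)
        {
            int n = input.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        // Rewind so the caller can still pass the same stream to an importer.
        if (input.CanSeek)
        {
            input.Position -= read;
        }

        if (StartsWith(header, read, ZipSignature)) return WordFileKind.ZipBased;     // .docx and other ZIP packages
        if (StartsWith(header, read, OleSignature)) return WordFileKind.OleCompound;  // legacy .doc and other OLE files
        return WordFileKind.Unknown;
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A ZipBased result only says the file is a ZIP archive, not that it is a valid DOCX, so the import call should still be wrapped in error handling.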

Regards,
Tanya
Progress Telerik
