Is it possible to open a DOCX file and return the document's text as a string? I need to parse the string to determine which text replacements to make. I can see methods for replacing text, but none for determining what text a document contains.
Regards
8 Answers, 1 is accepted
To extract the text, you can import the document using the DocxFormatProvider and export it as plain text using the TextFormatProvider:
DocxFormatProvider docxFormatProvider = new DocxFormatProvider();
RadFlowDocument document = docxFormatProvider.Import(str);
TxtFormatProvider txtFormatProvider = new TxtFormatProvider();
string text = txtFormatProvider.Export(document);
Regards,
Boby
Progress Telerik


To help someone else:
using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

var docxFormatProvider = new DocxFormatProvider();
using (var input = File.OpenRead(textBox1.Text)) // Full file with path
{
    var document = docxFormatProvider.Import(input);
    var txtFormatProvider = new TxtFormatProvider();
    string text = txtFormatProvider.Export(document); // TEXT
    textBox2.Text = text;
}

using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
using Telerik.WinForms.Documents.FormatProviders.Txt;

private void RadButton1_Click(object sender, EventArgs e)
{
    var docxFormatProvider = new DocxFormatProvider();
    using (var input = File.OpenRead(textBox1.Text))
    {
        var document = docxFormatProvider.Import(input);
        var txtFormatProvider = new TxtFormatProvider();
        string text = txtFormatProvider.Export(document);
        textBox2.Text = text;
    }
}

The namespaces you have included are related to the RadRichTextEditor control from the UI for WinForms suite. To use the objects from RadWordsProcessing, you would need to use the namespaces starting with Telerik.Windows.Documents.Flow:
Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider docxFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Docx.DocxFormatProvider();
Telerik.Windows.Documents.Flow.Model.RadFlowDocument document = docxFormatProvider.Import(str);
Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider txtFormatProvider = new Telerik.Windows.Documents.Flow.FormatProviders.Txt.TxtFormatProvider();
string text = txtFormatProvider.Export(document);
The str variable represents the stream of the file that should be passed to the Import() method of DocxFormatProvider so the class can read the contents and import the file. You can find more information about the different format providers and how to use them in the Formats and Conversion section in our documentation.
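For completeness, here is a minimal sketch that combines the RadWordsProcessing namespaces with a file stream, so the str variable is no longer left undefined. The DocxTextReader class, the ReadAllText method, and the path parameter are hypothetical names chosen for illustration. The sketch assumes the RadWordsProcessing assemblies are referenced, and the available Import() overloads may differ between releases.

```csharp
using System.IO;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Txt;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper class, not part of the Telerik API.
public static class DocxTextReader
{
    // Returns the plain text of the DOCX file at 'path' (assumed to exist).
    public static string ReadAllText(string path)
    {
        var docxFormatProvider = new DocxFormatProvider();
        RadFlowDocument document;

        // File.OpenRead supplies the stream that the snippet above calls 'str'.
        using (Stream str = File.OpenRead(path))
        {
            document = docxFormatProvider.Import(str);
        }

        // Export the imported document as a plain-text string.
        var txtFormatProvider = new TxtFormatProvider();
        return txtFormatProvider.Export(document);
    }
}
```

Calling DocxTextReader.ReadAllText(textBox1.Text) would then give you a string you can search before deciding which replacements to make.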
Hope this helps.
Regards,
Tanya
Progress Telerik
Hi Ajay,
The .doc format is currently not supported, and we have logged a request for it on our public portal. Make sure to cast your vote for the implementation of this functionality and subscribe to the item if you would like to receive updates about status changes: ADD. RadRichTextEditor - add support for importing .doc documents.
Regards,
Tanya
Progress Telerik