Delete everything between two strings

WordsProcessing
Dean asked on 09 Jun 2023, 04:14 PM

Hi.

I'd like to put certain marker strings in a Word document and then programmatically remove everything between them. The following works when STARTREPLACE and ENDREPLACE are on the same line:

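using System.Text.RegularExpressions;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.Model;
using Telerik.Windows.Documents.Flow.Model.Editing;

var provider = new DocxFormatProvider(); // assuming the template is a .docx file
RadFlowDocument document = provider.Import(inputFileStream);
RadFlowDocumentEditor editor = new RadFlowDocumentEditor(document);
// The lookarounds keep STARTREPLACE and ENDREPLACE; only the text between them is removed.
editor.ReplaceText(new Regex("(?<=STARTREPLACE)(.*)(?=ENDREPLACE)", RegexOptions.Singleline), string.Empty);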

However, when STARTREPLACE and ENDREPLACE are on different lines, nothing happens.
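On a plain .NET string, the pattern from the question does match across line breaks, because RegexOptions.Singleline lets `.` match `\n`. That suggests (an assumption, not confirmed in this thread) that ReplaceText applies the pattern within each paragraph separately, so a single match cannot span two paragraphs. The sketch below uses only System.Text.RegularExpressions, with no WordsProcessing types; MarkerRegexDemo is a hypothetical name. It also shows a second, independent issue: the greedy `.*` extends to the last ENDREPLACE, so with two marker pairs it also deletes the text between the pairs, while the lazy `.*?` stops at the nearest one.

```csharp
using System.Text.RegularExpressions;

// Plain .NET regex demo; no WordsProcessing types are involved.
public static class MarkerRegexDemo
{
    // The pattern from the question: greedy, so it extends to the LAST ENDREPLACE.
    public static readonly Regex Greedy =
        new Regex("(?<=STARTREPLACE)(.*)(?=ENDREPLACE)", RegexOptions.Singleline);

    // Lazy variant: each match stops at the NEAREST following ENDREPLACE.
    public static readonly Regex Lazy =
        new Regex("(?<=STARTREPLACE)(.*?)(?=ENDREPLACE)", RegexOptions.Singleline);

    // Deletes whatever the pattern matches; the markers survive because of the lookarounds.
    public static string Strip(Regex pattern, string input) => pattern.Replace(input, string.Empty);
}
```

With the input "a STARTREPLACE x ENDREPLACE b STARTREPLACE y ENDREPLACE c", Greedy leaves "a STARTREPLACEENDREPLACE c" (the " b " between the pairs is lost), whereas Lazy leaves "a STARTREPLACEENDREPLACE b STARTREPLACEENDREPLACE c".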

Does anyone know how I can make this work across multiple lines, or is there another approach that would be better?

Thanks in advance!

Dean

Yoan (Telerik team) answered on 13 Jun 2023, 12:11 PM

Hi Dean,

The RadFlowDocumentEditor of the WordsProcessing library exposes a DeleteContent method, which receives two existing elements in the document and removes everything between them (optionally including the two elements themselves). I believe this approach best fits your scenario of deleting all content between two strings, so I have created a sample project that demonstrates it and attached it for your use. Feel free to modify and use it as you prefer, and don't hesitate to ask if you have any questions.
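To make the "between them, or them included" choice concrete, here is a toy model over a flat list in which each item stands for one document element. It only mirrors the idea of deleting between two existing elements; it ignores paragraphs and the document tree and is not how WordsProcessing implements DeleteContent. RangeDeleteModel and DeleteBetween are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a flat list stands in for the document's content, so this
// ignores paragraphs and nesting. It is NOT how WordsProcessing implements DeleteContent.
public static class RangeDeleteModel
{
    // Removes the items strictly between 'start' and 'end'; when includeEnds is
    // true, 'start' and 'end' themselves are removed as well.
    public static List<T> DeleteBetween<T>(IList<T> items, T start, T end, bool includeEnds)
    {
        int s = items.IndexOf(start);
        int e = items.IndexOf(end);
        if (s < 0 || e < 0 || e < s)
            throw new ArgumentException("Both elements must exist and start must come before end.");

        var result = new List<T>();
        for (int i = 0; i < items.Count; i++)
        {
            bool inRange = includeEnds ? (i >= s && i <= e) : (i > s && i < e);
            if (!inRange) result.Add(items[i]);
        }
        return result;
    }
}
```

For ["a", "START", "b", "c", "END", "d"], the exclusive call keeps a, START, END, d and the inclusive call keeps only a and d.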

Hope this helps.

Regards,
Yoan
Progress Telerik


Dean commented on 14 Jun 2023, 12:05 PM

Thank you so much for this! Absolutely phenomenal; I'm very grateful.

The final piece is that, since I'll be using a pre-existing template document rather than generating one in code, I need to find the Run objects for the start and end tags. Playing around, I can see there's a FindAll method, but it returns a collection of FindResult objects rather than runs. Each FindResult seems to have one Run associated with it, though, so should the following work?

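var startTags = editor.FindAll("DeleteFromHere");
var endTags = editor.FindAll("DeleteToHere");

// Every start tag needs a matching end tag; they are paired by order of appearance.
if (startTags.Count != endTags.Count)
    throw new InvalidOperationException("Mismatching number of start and end tags");

for (int i = 0; i < startTags.Count; i++)
{
    // Assumes each tag was found in a single run; 'true' also deletes the tag runs themselves.
    editor.DeleteContent(startTags[i].Runs[0], endTags[i].Runs[0], true);
}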
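Equal counts do not guarantee that the tags are in the right order: a document containing "DeleteToHere ... DeleteFromHere" has one of each and passes the count check even though the end tag comes first. A stricter check walks the tags in document order and pairs each start with the next end. This sketch models the markers as a plain list of strings in document order; TagPairing and Pair are hypothetical names, and how you obtain that ordered list from the document is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: 'markers' lists texts in document order;
// anything that is neither tag is ignored.
public static class TagPairing
{
    // Pairs each start tag with the next end tag and returns their positions.
    // Throws on a nested start, an end with no open start, or an unclosed start;
    // some of these (e.g. an end tag before its start) pass an equal-count check.
    public static List<(int Start, int End)> Pair(IReadOnlyList<string> markers, string startTag, string endTag)
    {
        var pairs = new List<(int Start, int End)>();
        int open = -1; // position of the currently open start tag, or -1
        for (int i = 0; i < markers.Count; i++)
        {
            if (markers[i] == startTag)
            {
                if (open >= 0) throw new InvalidOperationException($"Start tag at {i} before the previous one was closed.");
                open = i;
            }
            else if (markers[i] == endTag)
            {
                if (open < 0) throw new InvalidOperationException($"End tag at {i} has no matching start tag.");
                pairs.Add((open, i));
                open = -1;
            }
        }
        if (open >= 0) throw new InvalidOperationException($"Start tag at {open} is never closed.");
        return pairs;
    }
}
```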

It's working for me at the moment, but is there any situation where FindAll could return a FindResult with more than one Run?

Thanks again - so much help.

Dean

 

Yoan (Telerik team) commented on 16 Jun 2023, 10:13 AM

Hello Dean,

It is possible for a FindResult to contain more than one run: a match can be split across two runs instead of sitting in one (for example, when part of the tag is formatted differently), and using only Runs[0] can then lead to unexpected results.

Alternatively, you can use the EnumerateChildrenOfType method of RadFlowDocument, which recursively traverses the document tree and returns all children of a given type (in this case, Run).

Once you have all the runs in the document, you can filter them by their properties (in this case, Text).

Example:

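using System.Linq;

// Collect every Run in the document once (the traversal is recursive).
var runs = document.EnumerateChildrenOfType<Run>().ToList();

// Only runs whose entire text equals the tag are matched.
var startTag = runs.FirstOrDefault(r => r.Text == "DeleteFromHere");
var endTag = runs.FirstOrDefault(r => r.Text == "DeleteToHere");

// 'false' keeps the tag runs; skip the call if either tag is missing.
if (startTag != null && endTag != null)
    editor.DeleteContent(startTag, endTag, false);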
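The filter above compares each run's entire Text, so it finds a tag only when the tag sits alone in one run. If the document has split "DeleteFromHere" into, say, "Delete" and "FromHere" (the split-match situation described for FindResult), no single run matches. One way around that, sketched here as a toy model in which each string stands for one run's Text in document order, is to concatenate the run texts, search the combined string, and map the hit back to the first and last run it touches. SplitRunSearch and FindTag are hypothetical names, not part of WordsProcessing.

```csharp
using System;
using System.Collections.Generic;

// Toy model: each string stands for one Run's Text, in document order.
public static class SplitRunSearch
{
    // Returns the indexes of the first and last run that the tag overlaps,
    // or null when the tag does not occur in the concatenated text.
    public static (int First, int Last)? FindTag(IReadOnlyList<string> runTexts, string tag)
    {
        if (string.IsNullOrEmpty(tag)) throw new ArgumentException("Tag must be non-empty.", nameof(tag));

        string joined = string.Concat(runTexts);
        int at = joined.IndexOf(tag, StringComparison.Ordinal);
        if (at < 0) return null;

        int endExclusive = at + tag.Length;
        int first = -1, offset = 0;
        for (int i = 0; i < runTexts.Count; i++)
        {
            int runEnd = offset + runTexts[i].Length;
            // The first run reaching past the match start holds the tag's first character.
            if (first < 0 && at < runEnd) first = i;
            // The first run reaching the match end holds the tag's last character.
            if (endExclusive <= runEnd) return (first, i);
            offset = runEnd;
        }
        return null; // not reached when the tag was found in the joined text
    }
}
```

With runs "Intro ", "Delete", "FromHere", " rest", FindTag reports runs 1 through 2. Presumably you would then pass the first run of the start tag and the last run of the end tag to DeleteContent; note that an inclusive delete also removes any other text those boundary runs contain.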

If you have any other questions, we are at your disposal.

Regards,

Yoan
