Why are empty lines ignored when converting from docx to pdf?

PdfProcessing WordsProcessing

Александр asked on 02 Jun 2022, 04:57 AM

Hello! I'm doing a conversion and ran into a problem. Export to pdf removes blank lines. I need them to stay.

Please help. Thanks in advance.

This is my code:

// Import the .docx bytes, then export the document straight to PDF.
var docxProvider = new DocxFormatProvider();
var document = docxProvider.Import(data);
var pdfProvider = new PdfFormatProvider();
pdfProvider.ExportSettings.ShouldEmbedFonts = false;
return pdfProvider.Export(document);

link to files

1 Answer, 1 is accepted

answered on 06 Jun 2022, 02:28 PM

Hello, Alexander,

Svilen here, I will be glad to help out with this question.

I was able to reproduce the issue you mentioned. We are aware of it and track it as a bug in this public feedback item: WordsProcessing: Paragraph containing a single Run with an empty string is not exported to PDF. Using the link, you can follow its progress, subscribe to status changes, and add your comments. Once the task is completed, all subscribers to the public feedback item will be notified.

The current workaround I can offer is to delete all empty runs from the RadFlowDocument before exporting to PDF. The following code snippet imports a .docx file, removes all empty runs, and exports the result to a .pdf file:

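using System.IO;
using System.Linq;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Pdf;
using Telerik.Windows.Documents.Flow.Model;

// Import the .docx file into a RadFlowDocument.
DocxFormatProvider docxFormatProvider = new DocxFormatProvider();
byte[] bytes = File.ReadAllBytes(@"yourFilePath/Test.docx");
RadFlowDocument document = docxFormatProvider.Import(bytes);

// Take a snapshot of the paragraphs so the document can be modified while iterating.
var paragraphs = document.EnumerateChildrenOfType<Paragraph>().ToList();

foreach (var paragraph in paragraphs)
{
    // ToList() copies the inlines, so removing items inside the loop is safe.
    foreach (var inline in paragraph.Inlines.ToList())
    {
        var run = inline as Run;
        if (run != null && string.IsNullOrEmpty(run.Text))
        {
            paragraph.Inlines.Remove(run);
        }
    }
}

// Export the cleaned document to PDF.
// FileMode.Create truncates an existing file, so no stale bytes are left behind.
PdfFormatProvider pdfProvider = new PdfFormatProvider();
using (Stream output = new FileStream(@"yourFilePath/Test.pdf", FileMode.Create))
{
    pdfProvider.Export(document, output);
}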

Please let me know if I can further help here.

Regards,
Svilen
Progress Telerik

Александр

commented on 06 Jun 2022, 06:52 PM

Hello Svilen. Thanks for your reply. I'm afraid I was misunderstood. My task is to keep the empty lines so that they appear in the PDF: there is a sample (a Word file), and the PDF should look the same.

Svilen

commented on 07 Jun 2022, 11:25 AM

Workaround-Project.zip

Hey, Alexander,

Apologies for not elaborating on how the issue I shared is related to your case.

In the XML of the OpenXML "document" file, each of the empty paragraphs contains an OpenXML element called a "run", and that run is empty. These runs are not visible when the .docx file is opened in MS Word, but they are what prevents the empty paragraphs from being exported to PDF. Removing the empty runs keeps the "empty" paragraphs themselves, so our API is able to properly export the empty lines to the PDF file.

Since I noticed your reply to the public item, please note that when exporting to PDF, the file is first loaded in memory, before this fix is applied. This does not affect the original file and allows for the exported PDF to properly show the empty lines. I've attached a small sample project, which does this so you can confirm you are satisfied with the end result.
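
For reference, here is a minimal sketch of the same workaround shaped like the code in the original question: bytes in, bytes out, with everything done in memory so the source .docx is never modified. The class and method names (DocxToPdfConverter, Convert) are hypothetical and used only for illustration. The Telerik calls are the ones already shown in this thread, and the sketch assumes the Import(byte[]) and Export(document) overloads behave as they do in the snippets above.

```csharp
using System.Linq;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Pdf;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper, not part of the Telerik API.
public static class DocxToPdfConverter
{
    public static byte[] Convert(byte[] docxBytes)
    {
        // The document lives only in memory; the original file is untouched.
        var docxProvider = new DocxFormatProvider();
        RadFlowDocument document = docxProvider.Import(docxBytes);

        // Copy both collections with ToList() before removing anything from them.
        foreach (var paragraph in document.EnumerateChildrenOfType<Paragraph>().ToList())
        {
            foreach (var run in paragraph.Inlines.OfType<Run>().ToList())
            {
                // Drop only runs with no text; the paragraph itself stays in place.
                if (string.IsNullOrEmpty(run.Text))
                {
                    paragraph.Inlines.Remove(run);
                }
            }
        }

        var pdfProvider = new PdfFormatProvider();
        return pdfProvider.Export(document);
    }
}
```

With a helper like this, the last line of the code in the question could become `return DocxToPdfConverter.Convert(data);`.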

Please let me know if anything is amiss.

Regards,
Svilen
Progress Telerik

Александр

commented on 08 Jun 2022, 06:32 PM

Hi, Svilen! Your answer helped me a lot to solve my problem. Thank you so much for the quick response and detailed explanation.

Svilen

commented on 09 Jun 2022, 07:29 AM

Hey, Alexander,

Glad to hear my input was helpful.

Please let me know if there is anything else I can assist with.

Regards,
Svilen
Progress Telerik

Upscale

commented on 06 Aug 2022, 12:09 PM

I have been working on my website, UpscaleValley.com, where I also write a blog. When I tried to export the content to the website, I wasn't able to paste it anywhere, and the content wasn't present on the real page either. Is something wrong with my browser, or is my Telerik not working? I need assistance. Thank you.

Dimitar

commented on 08 Aug 2022, 12:43 PM

Hi,

I am not sure why you are not getting the desired results. Can you share more information about your requirement? For example, can you share the code that you're using in the case and some screenshots of the expected and the actual results?

Thank you in advance for your patience and cooperation.