This is a migrated thread and some comments may be shown as answers.

Convert Docx to HTML, modify online and convert back to Docx

4 Answers 898 Views
WordsProcessing
Pierre Yves asked on 13 Jul 2016, 08:27 PM

Hi,

As a proof of concept, I want to load a Docx in a Kendo Editor and save it back. So far, I'm able to:

  1. load a Docx and convert it to HTML
  2. modify the HTML using the Telerik ASP.NET MVC Editor
  3. save it back to a new Docx document

The issue I have is that the new Docx doesn't have the same margins as the original document, and it also takes 4 pages instead of 2. You can take a look at the attached screenshots (before/after).

 

Is there a way to configure the styles and the page setup?

 

Here is the code that I use. Basically, I have a Kendo Editor in my view that loads the HTML and a button to save the data back.

 

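// Namespaces used by this controller code:
// System.IO, System.Web.Mvc,
// Telerik.Windows.Documents.Flow.Model,
// Telerik.Windows.Documents.Flow.FormatProviders.Docx,
// Telerik.Windows.Documents.Flow.FormatProviders.Html
[HttpPost]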
public ActionResult Editor(string editor)
{
    // HTML decode the value before using it.
    var notes = System.Web.HttpUtility.HtmlDecode(editor);
      
    var htmlProvider = new HtmlFormatProvider();
    RadFlowDocument document = htmlProvider.Import(notes);
 
    SaveDocx(document);
     
    return View();
}
 
public ActionResult Editor()
{
    RadFlowDocument sourceDoc = GetSourceDocument();
    SaveDocx(sourceDoc);
    var myHtmlContent = GetHtmlFromDocument(sourceDoc);
 
    ViewBag.MyHtmlContent = myHtmlContent;
 
    return View();
}
 
private RadFlowDocument GetSourceDocument()
{
    var docxProvider = new DocxFormatProvider();
 
    // Open the file once and import directly from that stream.
    using (Stream input = System.IO.File.OpenRead(@"c:\Temp\TelerikMvcApp1\TelerikMvcApp1\Fiche.docx"))
    {
        return docxProvider.Import(input);
    }
}
 
private string GetHtmlFromDocument(RadFlowDocument document)
{
    // HtmlFormatProvider exports straight to a string, so no stream is needed.
    var htmlProvider = new HtmlFormatProvider();
    return htmlProvider.Export(document);
}
 
private void SaveDocx(RadFlowDocument document)
{
    using (Stream output = System.IO.File.OpenWrite(String.Format(@"c:\Temp\TelerikMvcApp1\TelerikMvcApp1\Fiche{0}.docx", DateTime.Now.Ticks)))
    {
        DocxFormatProvider docxProvider = new DocxFormatProvider();
        docxProvider.Export(document, output);
    }
}

 

Thanks

4 Answers, 1 is accepted

Boby
Telerik team
answered on 15 Jul 2016, 08:45 AM
Hello Pierre,

By the way, this is a planned feature for the Kendo Editor, so you can follow the issue on GitHub here:
Import from RTF and DOCX along with Export to RTF, DOCX and PDF
or the UserVoice item here:
Export Kendo UI Editor content to PDF and word file

Otherwise, I suppose that the document you are importing has some features which are unsupported when importing from DOCX, or when subsequently importing from HTML and exporting to DOCX. Could you please open a support ticket and send us the problematic document, as well as the exact versions of the editor and the Document Processing binaries, so that we can investigate the issue further?

Regards,
Boby
Telerik by Progress

Pierre Yves
answered on 15 Jul 2016, 12:46 PM

Hi Boby, 

Thanks for your answer. Since we haven't purchased a license yet, I'm not able to create a support ticket.

 

For the binaries, I used the following versions from the trial based on UI for ASP.NET MVC Q2 2016 SP1:

Telerik.Windows.Documents.Core.dll 2016.2.421.45

Telerik.Windows.Documents.Flow.dll 2016.2.421.45

Kendo.Mvc 2016.2.607.545

 

You can get my docx file here : https://drive.google.com/open?id=0Bxxu6yivT-LMNXNoTDdQbUJEdG8

Boby
Telerik team
answered on 18 Jul 2016, 11:12 AM
Hello Pierre Yves,

I am seeing some differences, but not exactly the ones from the screenshots. Please find attached the document produced by the round trip, without any customization of the editor. The differences I see are:
- Hidden text is not supported
- Section margins are not preserved on export to HTML
- Cell paddings in the converted documents are set as local properties, instead of inherited from the whole table.

Note that such differences are sometimes expected. Our document model resembles that of MS Word and supports more properties than HTML, and there are sometimes more complex differences between the models (e.g. styles and style inheritance), so it isn't always possible for a document to preserve its original look after an HTML export / HTML import round trip.
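
Since section margins are lost in the HTML step, one possible workaround is to copy the page setup from the original document onto the document rebuilt from HTML before saving it. The following is only a minimal sketch, not a confirmed fix for this document: PageSetupHelper and CopyPageSetup are hypothetical names, and it assumes the first section of the original file carries the page setup you want to keep. It does nothing for the hidden text or the cell padding differences, so the page count may still differ.

```csharp
using System.Linq;
using Telerik.Windows.Documents.Flow.Model;

// Hypothetical helper, not part of the Telerik API.
public static class PageSetupHelper
{
    // Copies the page setup of the first section of the original document
    // onto every section of the document that was rebuilt from HTML.
    public static void CopyPageSetup(RadFlowDocument original, RadFlowDocument roundTripped)
    {
        // Assumption: the first section holds the page setup to preserve.
        Section source = original.Sections.FirstOrDefault();
        if (source == null)
        {
            return;
        }

        foreach (Section target in roundTripped.Sections)
        {
            // Orientation and size are copied together so they stay consistent.
            target.PageOrientation = source.PageOrientation;
            target.PageSize = source.PageSize;
            target.PageMargins = source.PageMargins;
        }
    }
}
```

In the POST action from the question, this could be called as PageSetupHelper.CopyPageSetup(GetSourceDocument(), document); just before SaveDocx(document).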


Regards,
Boby
Telerik by Progress

Pierre Yves
answered on 19 Jul 2016, 12:23 PM

Thanks, 

I will take a look at this.
