Hi,
I have an issue when converting from HTML to DOCX and back again. I export an HTML unordered list to DOCX, then import that DOCX and export it back to HTML, using DocxFormatProvider and HtmlFormatProvider. The condensed code below illustrates the problem.
I am using version 2017.2.428.40 of the Telerik.Windows.Documents.* libraries and version 2.5.5631.0 of DocumentFormat.OpenXml.
using System;
using Telerik.Windows.Documents.Flow.FormatProviders.Docx;
using Telerik.Windows.Documents.Flow.FormatProviders.Html;

var html = "<ul><li>1</li><li>2</li></ul>";
Console.WriteLine("Original HTML: " + html);

var docxFormatProvider = new DocxFormatProvider();
var htmlFormatProvider = new HtmlFormatProvider();

// HTML -> RadFlowDocument -> DOCX bytes
var document = htmlFormatProvider.Import(html);
var bytes = docxFormatProvider.Export(document);

// DOCX bytes -> RadFlowDocument -> HTML fragment without a style sheet or indentation
document = docxFormatProvider.Import(bytes);
htmlFormatProvider.ExportSettings.DocumentExportLevel = DocumentExportLevel.Fragment;
htmlFormatProvider.ExportSettings.StylesExportMode = StylesExportMode.None;
htmlFormatProvider.ExportSettings.IndentDocument = false;
html = htmlFormatProvider.Export(document);
Console.WriteLine("New HTML: " + html);

Console.ReadKey();
The console output is:
Original HTML: <ul><li>1</li><li>2</li></ul>
New HTML: <body><ul style="list-style-type: disc;"><li style="font-family: Symbol;" value="1"><span style="font-family: Times New Roman;">1</span></li><li style="font-family: Symbol;" value="2"><span style="font-family: Times New Roman;">2</span></li></ul></body>
The OOXML generated when exporting the HTML to DOCX is:
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="NormalWeb" />
        <w:numPr>
          <w:ilvl w:val="0" />
          <w:numId w:val="1" />
        </w:numPr>
        <w:rPr />
      </w:pPr>
      <w:r>
        <w:rPr />
        <w:t>1</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="NormalWeb" />
        <w:numPr>
          <w:ilvl w:val="0" />
          <w:numId w:val="1" />
        </w:numPr>
        <w:rPr />
      </w:pPr>
      <w:r>
        <w:rPr />
        <w:t>2</w:t>
      </w:r>
    </w:p>
    <w:sectPr />
  </w:body>
</w:document>
What puzzles me are the style definitions in the exported HTML, for instance font-family: Symbol; on the list items. We use the JavaScript API for Office to inject the exported HTML into content controls in Word documents. In this example the HTML inserts a bulleted list into the content control, but if the user then adds new bullets to the list in Word, their font is set to Symbol.
I also don't understand why spans with font-family: Times New Roman; are added. They cause line-spacing issues, and our standard font is Arial.
Does anyone have any input on this? Thanks.
Best regards,
Geir Morten Hagen