I am trying to convert an HTML body to a PDF using HtmlFormatProvider and PdfFormatProvider.
It works well when the PDF contains only "normal" characters, but when I use characters like "åäö", they are either missing or replaced with other characters.
I have seen a similar issue where the recommendation was to load the fonts, but that did not solve the problem for me.
I have attached the code as a zip file to this issue.
Thanks in advance.
5 Answers, 1 is accepted
Hello Linus,
I have saved the HTML and the characters are not displayed there either. Are these characters from a specific culture? Do they require a specific font? Are they displayed correctly in a browser on your side? How was this HTML created?
Please note that if you are using a .NET Framework application, there is no need to use the assemblies for .NET Standard. You can see the difference here: NuGet Packages.
I am looking forward to your reply.
Regards,
Dimitar
Progress Telerik
Virtual Classroom, the free self-paced technical training that gets you up to speed with Telerik and Kendo UI products quickly just got a fresh new look + new and improved content including a brand new Blazor course! Check it out at https://learn.telerik.com/.
Hi Dimitar,
I can see the characters in the HTML in the browser; I will add a screenshot to this comment to show it. They are Nordic characters (used in Sweden, at least). The HTML comes from an email that is received through MailKit, which we are trying to convert to PDF.
We are in the process of migrating to .NET Standard and would like to use those packages if possible.
The example characters are "åäöÅÄÖ".
Thanks in advance.
Hi Linus,
I've tested this by pasting the characters into a new empty HTML file, and this works on my side. I have attached an updated version of your test project. Could you please test with it and let me know what the results are?
I am looking forward to your reply.
Hi Dimitar,
I tried with your project and the PDF did not have the letters "åäöÅÄÖ"; it only had "Test" in it.
The docx file had the full string but the PDF did not. Did the PDF work on your side? I am not really sure what is causing the issue.
The HTML that we are getting comes from an email, so its formatting comes from there. I have attached the result from running your project locally.
Best regards
Linus
Hi Linus,
Thank you for sharing the results. In this case, it is clear that the font is missing for some reason. The PDF falls back to the Helvetica font when it cannot find the proper fonts. This should work if you import a proper font. You can register a single font with the following approach: RadPdfProcessing manually register a font.
I was able to determine what is wrong with the original HTML as well. You have encountered a known issue with the HTML format provider. The issue occurs when the encoding is set in the HTML metadata. You can track its progress, subscribe to status changes, and add your comment to it here: WordsProcessing: HtmlFormatProvider: Automatically detect the encoding instead of relying on the one set in the HTML.
As a workaround, I can only suggest manually removing the preset charset from the HTML when converting.
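For illustration only, here is a minimal sketch of that workaround as a preprocessing step, written in Python rather than against the Telerik API. It assumes you have the raw HTML bytes of the email and that the declared charset is ASCII-compatible (as ISO-8859-1 and UTF-8 are). The names `html_bytes_to_text` and `META_CHARSET` are hypothetical helpers, not part of any library. The idea is to decode the bytes yourself using the declared charset and drop the declaration, so the string handed to the HTML importer is already Unicode and carries no charset to act on:

```python
import re

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([\w.:-]+)[^>]*>""",
    re.IGNORECASE,
)


def html_bytes_to_text(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML with its declared charset and remove the declaration."""
    match = META_CHARSET.search(raw)
    declared = match.group(1).decode("ascii") if match else fallback
    # The meta tag itself is plain ASCII, so it can be removed before decoding.
    stripped = META_CHARSET.sub(b"", raw)
    try:
        return stripped.decode(declared)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back instead of failing.
        return stripped.decode(fallback, errors="replace")


# Example input: an ISO-8859-1 encoded page that declares its charset.
raw_html = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=ISO-8859-1"></head>'
    "<body>Test åäöÅÄÖ</body></html>"
).encode("iso-8859-1")

clean_html = html_bytes_to_text(raw_html)
print(clean_html)
```

Whether this alone is enough depends on how the import treats the resulting string, so treat it as a starting point rather than a confirmed fix. The same two steps (decode, then strip the declaration) carry over to any language.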
I hope this will be useful. Should you have further questions, I would be glad to help.
Regards,
Dimitar
Progress Telerik
Hi again Dimitar,
I tried removing the ISO-8859-1 charset and leaving it empty, but that still didn't solve the issue for me. I also tried replacing it with UTF-8 and that didn't work either. Do you have any other suggestions?
As for registering the font, that seems to be a solution for when you are creating a new PDF from scratch. I really just want to convert HTML to PDF in the best way possible, and without an editor I don't think that I can set the font. Or do you have a demo on how to set the font when converting HTML to PDF? I searched your documentation but could not find anything similar to this.
Also, I think that these characters should exist in Helvetica, since I can use them with it in other programs.
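A quick way to sanity-check that claim, sketched in Python purely for illustration: it assumes the standard Helvetica is used with a single-byte Western encoding such as WinAnsiEncoding, which Windows-1252 closely mirrors. The helper name `missing_from_cp1252` is hypothetical. It says nothing about what the PDF export does internally, only whether the characters are representable in that encoding:

```python
def missing_from_cp1252(text: str) -> list[str]:
    """Return the characters of `text` that Windows-1252 cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


sample = "åäöÅÄÖ"
print(missing_from_cp1252(sample))
```

An empty list means every character in the sample has a slot in that encoding, which points toward the encoding of the input rather than the fallback font.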
Regards Linus
Hello Linus,
I have prepared a small project (.NET Standard). Could you test this on your side and see if it works?
I am looking forward to your reply.
Regards,
Dimitar
Progress Telerik