Convert HTML to PDF with special characters

PdfProcessing
Linus asked on 22 Sep 2021, 09:19 AM | edited on 22 Sep 2021, 09:20 AM
Hi,

I am trying to convert an HTML body to a PDF using HtmlFormatProvider and PdfFormatProvider.

It works well when I create a PDF with "normal" characters, but when I use characters like "åäö", the characters are either missing or replaced with other characters.

I have seen a similar issue that recommended loading the fonts, but that did not solve the issue for me.

I have added the code in a zip as an attachment to this issue.

 

Thanks in advance.

5 Answers, 1 is accepted

Dimitar
Telerik team
answered on 22 Sep 2021, 10:34 AM

Hello Linus,

I have saved the HTML and the characters are not displayed there either. Are these characters from a specific culture? Do they require a specific font? Are they displayed correctly in a browser on your side? How was this HTML created?

Please note that if you are using a .NET Framework application, there is no need to use the assemblies for .NET Standard. You can see the difference here: NuGet Packages.

I am looking forward to your reply.

Regards,
Dimitar
Progress Telerik

Virtual Classroom, the free self-paced technical training that gets you up to speed with Telerik and Kendo UI products quickly just got a fresh new look + new and improved content including a brand new Blazor course! Check it out at https://learn.telerik.com/.

Linus
commented on 22 Sep 2021, 10:51 AM

Hi Dimitar, 

I can see the characters in the HTML in the browser; I will add a screenshot to this comment to show it. They are Nordic characters (used in Sweden at least). The HTML is created from an email that is received through MailKit, which we are trying to convert to PDF.

We are in the process of migrating to .NET Standard and would like to use those packages if possible.

The example characters are "åäöÅÄÖ".

Thanks in advance.

Dimitar
Telerik team
commented on 23 Sep 2021, 05:46 AM

Hi Linus, 

I've tested this by pasting the characters in a new empty HTML file and this works on my side. I have attached an updated version of your test project. Could you please test this with it and let me know what the results are?

I am looking forward to your reply.

Linus
answered on 23 Sep 2021, 06:43 AM

Hi Dimitar, 

I tried with your project and the PDF did not have the letters "åäöÅÄÖ"; it only had "Test" in it.

The docx file had the full string but the PDF did not. Did the PDF work on your side? I am not really sure what is causing the issue.

The HTML that we are getting is from an email, so the formatting of the HTML comes from that. I have added the result from when I ran your project locally.

 

Best regards

Linus

Dimitar
Telerik team
answered on 23 Sep 2021, 08:43 AM

Hi Linus,

Thank you for sharing the results. In this case, it is clear that the font is missing for some reason. The PDF uses the Helvetica font when it cannot find the proper fonts. This should work if you import a proper font. You can register a single font with the following approach: RadPdfProcessing manually register a font.

I was able to determine what is wrong with the original HTML as well. You have encountered a known issue with the HTML format provider. The issue exists when the encoding is set in the HTML metadata. You can track its progress, subscribe to status changes, and add your comment to it here: WordsProcessing: HtmlFormatProvider: Automatically detect the encoding instead of relying on the one set in the HTML.

As a workaround, I can only suggest manually removing the preset charset from the HTML when converting. 
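A minimal sketch of what removing the preset charset could look like, using only the .NET base class library. The class and method names (HtmlCharsetCleaner, RemoveCharsetDeclaration) are hypothetical and are not part of any Telerik API. The sketch assumes the HTML is already held in a correctly decoded .NET string and that the charset is declared in a meta tag; it removes that whole meta tag before the string is handed to the HTML import step.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Telerik libraries.
public static class HtmlCharsetCleaner
{
    // Matches any <meta ...> tag that carries a charset declaration, e.g.
    //   <meta charset="utf-8">
    //   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    private static readonly Regex MetaCharsetTag = new Regex(
        @"<meta\b[^>]*\bcharset\s*=[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the HTML with every charset-declaring meta tag removed.
    public static string RemoveCharsetDeclaration(string html)
    {
        if (html == null)
        {
            throw new ArgumentNullException(nameof(html));
        }

        return MetaCharsetTag.Replace(html, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input modelled on the thread: an email body declaring ISO-8859-1.
        string html =
            "<html><head>" +
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">" +
            "</head><body><p>Test åäöÅÄÖ</p></body></html>";

        string cleaned = HtmlCharsetCleaner.RemoveCharsetDeclaration(html);

        // The cleaned string no longer contains a charset declaration
        // and can be passed on to the HTML import.
        Console.WriteLine(cleaned);
    }
}
```

Note that this removes the complete meta tag rather than leaving an empty charset value behind. Other meta tags (viewport, description and so on) are left untouched because they do not contain a charset attribute.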

I hope this will be useful. Should you have further questions, I would be glad to help.

Regards,
Dimitar
Progress Telerik

Love the Telerik and Kendo UI products and believe more people should try them? Invite a fellow developer to become a Progress customer and each of you can get a $50 Amazon gift voucher.

Linus
answered on 23 Sep 2021, 01:34 PM

Hi again Dimitar,

I tried removing the ISO-8859-1 charset and leaving it empty, but that still didn't solve the issue for me. I also tried replacing it with UTF-8, and that didn't work either. Do you have any other suggestions?

 

As for registering the font, that seems to be a solution when you are creating a new PDF from scratch. I really just want to convert HTML to PDF in the best way possible, and without an editor I don't think that I can set the font. Or do you have a demo of how to set the font when converting HTML to PDF? I tried to search your documentation but could not find anything similar to this.

Also, I think that these characters should exist in Helvetica, since I can use them in other programs.

Regards Linus 

Dimitar
Telerik team
answered on 24 Sep 2021, 06:45 AM

Hello Linus,

I have prepared a small project (.NET Standard). Could you test this with it on your side and see if this works?

I am looking forward to your reply.

Regards,
Dimitar
Progress Telerik
