This is a migrated thread and some comments may be shown as answers.

How to rebuild a page

11 Answers 31 Views
PdfProcessing
Charles asked on 23 Mar 2021, 12:12 AM

Hello, I have a project adapted from the code in the ManipulatePages example project from Telerik, the FitAndPositionMultiplePagesOnSinglePage routine to be precise.

Except that, instead of putting four pages on one, I'm taking two 5.5x8.5 pages and placing them side by side, left to right, on a new 11x8.5 page.
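For reference, the placement arithmetic for this two-up layout can be sketched as below. This is a minimal sketch, not code from the ManipulatePages example: `PlaceOnSheet` is a hypothetical helper, and it assumes sizes are expressed in device-independent pixels (1/96 inch), assumed here to be the unit RadPdfProcessing works in, so a 5.5x8.5 in page is 528x816 and the 11x8.5 in sheet is 1056x816.

```csharp
// Hypothetical helper, not part of RadPdfProcessing: computes the uniform scale and the
// top-left offset that fit a source page into the left (slot 0) or right (slot 1) half
// of a landscape sheet. All sizes are assumed to be in DIPs (1/96 inch).
static (double Scale, double X, double Y) PlaceOnSheet(
    double srcWidth, double srcHeight, double sheetWidth, double sheetHeight, int slot)
{
    if (slot < 0 || slot > 1)
    {
        throw new System.ArgumentOutOfRangeException(nameof(slot), "Use 0 for left, 1 for right.");
    }

    double slotWidth = sheetWidth / 2;

    // Largest uniform scale at which the source page still fits inside one half of the sheet.
    double scale = System.Math.Min(slotWidth / srcWidth, sheetHeight / srcHeight);

    // Center the scaled page inside its half; the right half starts at slotWidth.
    double x = slot * slotWidth + (slotWidth - srcWidth * scale) / 2;
    double y = (sheetHeight - srcHeight * scale) / 2;
    return (scale, x, y);
}

const double DipsPerInch = 96.0;

// Right-hand slot for a 5.5x8.5 in page on an 11x8.5 in sheet.
var right = PlaceOnSheet(5.5 * DipsPerInch, 8.5 * DipsPerInch, 11 * DipsPerInch, 8.5 * DipsPerInch, 1);
System.Console.WriteLine($"scale={right.Scale}, x={right.X}, y={right.Y}"); // prints scale=1, x=528, y=0
```

A uniform scale is used so that a source page that is not exactly 5.5x8.5 in is shrunk to fit and centered in its half rather than stretched.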

It works great when the input PDF is properly formatted.

I have a handful of PDFs, created in some unknown way, that appear to be malformed. When they are fed through my project, the odd pages never make it to the right side of the output. Among other things, the CropBox size in the source document differs from the MediaBox size, which I suspect is a source (maybe not THE source) of the problem.
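To make the CropBox suspicion concrete: in PDF, page boxes are rectangles [llx lly urx ury] in points (1/72 inch) with the origin at the lower-left, and the visible area is the CropBox clipped to the MediaBox. The sketch below uses a hypothetical helper, `VisibleArea`, which works on plain numbers rather than any RadPdfProcessing type, to show how a page's MediaBox size can differ from what a viewer actually shows.

```csharp
// Hypothetical helper: boxes are {llx, lly, urx, ury} in PDF points (1/72 inch), origin at
// the lower-left, and are assumed to be normalized (llx < urx, lly < ury).
// The visible area is the CropBox clipped to the MediaBox.
static double[] VisibleArea(double[] mediaBox, double[] cropBox)
{
    return new[]
    {
        System.Math.Max(mediaBox[0], cropBox[0]),
        System.Math.Max(mediaBox[1], cropBox[1]),
        System.Math.Min(mediaBox[2], cropBox[2]),
        System.Math.Min(mediaBox[3], cropBox[3]),
    };
}

// Made-up example: a Letter-sized MediaBox (8.5x11 in) whose CropBox shows a 5.5x8.5 in region.
double[] media = { 0, 0, 612, 792 };
double[] crop = { 36, 36, 432, 648 };
double[] visible = VisibleArea(media, crop);

double mediaWidth = media[2] - media[0];       // 612 pt = 8.5 in
double visibleWidth = visible[2] - visible[0]; // 396 pt = 5.5 in

// In PDF user space, content must be shifted by (-llx, -lly) so the visible area
// starts at the origin of its slot on the output page.
System.Console.WriteLine($"media width {mediaWidth} pt, visible width {visibleWidth} pt, shift ({-visible[0]}, {-visible[1]})");
```

If a layout sized such a page by its MediaBox (612 pt wide here) rather than its visible 396 pt, the placement could be off by the 216 pt difference, which is one plausible way for pages to miss their intended half.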

It seems I should be able to read the content of each source page and insert it into a new page in the output document, essentially rebuilding the page instead of just copying it from PdfFileSource.Pages[]. Is that right?

However, I'm at a loss as to how to read the source page, as the pages in PdfFileSource.Pages don't seem to expose their content.

Help is appreciated! :)

Thanks,

Charles

11 Answers, 1 is accepted

Accepted
Dimitar
Telerik team
answered on 23 Mar 2021, 12:05 PM

Hi Charles,

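Yes, this is expected: PdfFileSource reads the content dynamically and does not load it into memory, which is why you cannot access it.

To access the content in code, you have to import the file into a RadFixedDocument. Here is an example:

using System;
using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;

// Importing the whole file into a RadFixedDocument loads the page content into memory.
var provider = new PdfFormatProvider();
RadFixedDocument document = provider.Import(File.ReadAllBytes(@"..\..\SampleDoc.pdf"));

foreach (var page in document.Pages)
{
    // page.Content holds the page elements (text fragments, paths, images, ...).
    foreach (var item in page.Content)
    {
        Console.WriteLine(item);
    }
}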

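After that, you can create a new page with the desired content and pass it to the PdfStreamWriter:

using System.IO;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Streaming;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Data;
using Telerik.Windows.Documents.Fixed.Model.Text;

// File.Create truncates an existing result.pdf; File.OpenWrite would leave stale bytes
// at the end if the old file was longer than the new output.
using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.Create(@"..\..\result.pdf")))
{
    RadFixedPage newPage = new RadFixedPage();

    // Move the text fragment to (100, 100) in page coordinates.
    var position = new SimplePosition();
    position.Translate(100, 100);
    newPage.Content.Add(new TextFragment("TextFragment") { Position = position });

    fileWriter.WritePage(newPage);
}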

I hope this helps. Should you have any other questions do not hesitate to ask.

Regards,
Dimitar
Progress Telerik

Love the Telerik and Kendo UI products and believe more people should try them? Invite a fellow developer to become a Progress customer and each of you can get a $50 Amazon gift voucher.

Charles
answered on 23 Mar 2021, 09:21 PM

Thanks, this is very helpful.

Apparently the PdfFormatProvider has strict rules on the format of the imported PDF, as the Import call fails on my problematic PDF:

'StartXRef keyword cannot be found.'

Charles
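The 'StartXRef keyword cannot be found.' message points at the end of the data: a PDF normally begins with a %PDF- header and ends with the startxref keyword, the byte offset of the cross-reference data, and %%EOF. A quick check on the bytes actually handed to Import can help tell a damaged file apart from data that is not the intended PDF at all. The sketch below is a hypothetical helper, `LooksLikePdf`, not a RadPdfProcessing API, and searching only the last 1024 bytes is a common heuristic rather than a rule from the specification.

```csharp
// Hypothetical sanity check, not part of RadPdfProcessing. It only inspects the raw bytes:
// a "%PDF-" header at the start and a "startxref" keyword near the end of the data.
static bool LooksLikePdf(byte[] data, out string problem)
{
    const int TailLength = 1024; // heuristic: how far back from the end to search

    string head = System.Text.Encoding.ASCII.GetString(data, 0, System.Math.Min(5, data.Length));
    if (head != "%PDF-")
    {
        problem = "missing %PDF- header (is this the right file or stream?)";
        return false;
    }

    int tailStart = System.Math.Max(0, data.Length - TailLength);
    string tail = System.Text.Encoding.ASCII.GetString(data, tailStart, data.Length - tailStart);
    if (!tail.Contains("startxref"))
    {
        problem = "no startxref keyword near the end (truncated or damaged data?)";
        return false;
    }

    problem = string.Empty;
    return true;
}

// Minimal made-up sample; the offset 116 is not meaningful.
byte[] sample = System.Text.Encoding.ASCII.GetBytes("%PDF-1.4\n% ...objects...\nstartxref\n116\n%%EOF\n");
System.Console.WriteLine(LooksLikePdf(sample, out string why) ? "looks like a PDF" : why); // prints looks like a PDF
```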

Charles
answered on 23 Mar 2021, 09:34 PM

I get the same error on other PDFs as well, including ones that otherwise work fine with the two-pages-onto-one processing described above.

Charles

Dimitar
Telerik team
answered on 24 Mar 2021, 09:19 AM

Hello Charles,

To investigate further what is causing this error, I need a specific file. This way I will be able to determine whether the file is invalid, and we can handle this case so the file is imported correctly. Since this forum thread is public, I suggest opening a new ticket (which is a private thread) and attaching one of the files that cause this issue.

Thank you in advance for your patience and cooperation.  

Regards,
Dimitar
Progress Telerik


Charles
answered on 24 Mar 2021, 01:54 PM

I understand. The challenge is that every PDF viewer I've used renders them fine. It's just your app that says they're invalid.

I'm going to try having the PDFs recreated, and will open a ticket only as a last resort.

Thanks.

Dimitar
Telerik team
answered on 25 Mar 2021, 06:58 AM

Hi Charles,

Yes, some invalid scenarios are handled by the PDF viewers. We are trying to handle such cases as well, but there are still documents that cannot be loaded. Providing the document will allow me to determine the exact cause and log it for improvement in our feedback portal.

Let me know if I can assist you further.

Regards,
Dimitar
Progress Telerik

Virtual Classroom, the free self-paced technical training that gets you up to speed with Telerik and Kendo UI products quickly just got a fresh new look + new and improved content including a brand new Blazor course! Check it out at https://learn.telerik.com/.

Charles
answered on 25 Mar 2021, 04:41 PM

Hello, I'm sorry, I just double-checked my work and found I was using the wrong input stream. The above code works just fine with loading my PDFs, including those that are malformed, when I use the correct input stream. :)

My apologies!
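The thread does not say what was wrong with the stream, so this is only a guess at one common pitfall: a Stream that was already read by earlier processing has its Position at the end, and reading from it again yields no bytes or only part of the file. A hypothetical helper, `ReadAllFromStart`, that rewinds a seekable stream before copying it into a byte array could look like this.

```csharp
// Hypothetical helper: copies a stream's full contents from the beginning, so an earlier
// reader that moved Position forward does not produce an empty or partial buffer.
static byte[] ReadAllFromStart(System.IO.Stream input)
{
    if (!input.CanSeek)
    {
        // A non-seekable stream cannot be rewound; only the remaining bytes would be available.
        throw new System.InvalidOperationException("Stream cannot be rewound; open a fresh stream instead.");
    }

    input.Position = 0;
    using (var buffer = new System.IO.MemoryStream())
    {
        input.CopyTo(buffer);
        return buffer.ToArray();
    }
}

var source = new System.IO.MemoryStream(System.Text.Encoding.ASCII.GetBytes("%PDF-1.4 ..."));
source.ReadByte(); // simulate an earlier read that moved Position forward
byte[] bytes = ReadAllFromStart(source);
System.Console.WriteLine(bytes.Length); // prints 12
```

The resulting byte array can be passed to provider.Import the same way File.ReadAllBytes is used in the examples above.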

Charles
answered on 25 Mar 2021, 06:50 PM

I haven't had any success so far in recreating a page.

While I can access the content now from page.Content (all TextFragments), replicating or transferring those TextFragments to a new page isn't working.

The TextFragment can't be added to a new page because the Parent property is already defined.

I tried creating a new TextFragment with all the same properties as the source item and adding that to the new page, but the output was mangled.  (There wasn't an easy way to "clone" a TextFragment, as far as I could tell..?)

By mangled I mean that everything is in the right place and the words are there, but special characters (an E with a tilde above it) appear all over the place.

At the moment I don't think I need a way to do this, but it could come in handy in the future if you can quickly show how it's done!

Thanks.

Accepted
Dimitar
Telerik team
answered on 29 Mar 2021, 08:40 AM

Hi Charles,

There is an internal clone method that can be used for such cases. You can get it with reflection. Here is an example of this: 

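using System;
using System.IO;
using System.Reflection;
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model;
using Telerik.Windows.Documents.Fixed.Model.Text;

var provider = new PdfFormatProvider();
var document = provider.Import(File.ReadAllBytes(@"..\..\SampleDoc.pdf"));

RadFixedDocument newDocument = new RadFixedDocument();

// CreateClonedInstance is internal, so it is not public API and may change between versions.
MethodInfo cloneMethod = typeof(TextFragment).GetMethod("CreateClonedInstance", BindingFlags.NonPublic | BindingFlags.Instance);
if (cloneMethod == null)
{
    throw new InvalidOperationException("TextFragment.CreateClonedInstance was not found in this version of the library.");
}

foreach (var page in document.Pages)
{
    var newPage = newDocument.Pages.AddPage();
    // Keep the source page size so the copied fragments land at the same positions.
    newPage.Size = page.Size;

    foreach (var item in page.Content)
    {
        // Only text is copied here; paths, images, and other elements are skipped.
        var textFragment = item as TextFragment;
        if (textFragment != null)
        {
            // Add a clone rather than the original, whose Parent is already the source page.
            var newFragment = cloneMethod.Invoke(textFragment, null) as TextFragment;
            newPage.Content.Add(newFragment);
        }
    }
}

var resultBytes = provider.Export(newDocument);
File.WriteAllBytes(@"..\..\result.pdf", resultBytes);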
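Because CreateClonedInstance is internal, nothing guarantees it keeps that name or signature in later versions. The sketch below shows the same reflection technique in a self-contained form with a hypothetical `InvokeNonPublic` helper (not part of RadPdfProcessing), using the protected `object.MemberwiseClone` as a stand-in for the internal clone method. It fails with a clear MissingMethodException when the method is not found, instead of a NullReferenceException at Invoke.

```csharp
// Hypothetical helper: looks up a non-public instance method by name and invokes it,
// failing with a clear message if the method does not exist on the given type.
static object InvokeNonPublic(object target, System.Type declaringType, string methodName)
{
    System.Reflection.MethodInfo method = declaringType.GetMethod(
        methodName,
        System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);

    if (method == null)
    {
        throw new System.MissingMethodException(declaringType.FullName, methodName);
    }

    // Parameterless method, so no arguments are passed.
    return method.Invoke(target, null);
}

// Stand-in demo: object.MemberwiseClone is protected, so outside code can only call it via reflection.
var original = new System.Version(1, 2);
var copy = (System.Version)InvokeNonPublic(original, typeof(object), "MemberwiseClone");
System.Console.WriteLine(object.ReferenceEquals(original, copy)); // prints False
```

Applied to the code above, this means checking cloneMethod for null before the loop, so a renamed internal method is reported up front.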

In addition, it seems that a font is missing, which is why you are getting invalid characters. If your document uses a specific embedded font that is not available on the operating system, you need to register it manually. This is also necessary if you are using the .NET Standard version of the assemblies.

Let me know if I can assist you further.

Regards,
Dimitar
Progress Telerik


Charles
answered on 29 Mar 2021, 03:39 PM

Thanks! Maybe, as an enhancement, you could change the modifier on that method from internal to public? :)

Charles

Dimitar
Telerik team
answered on 31 Mar 2021, 08:27 AM

Hello Charles,

I have forwarded your request to the team and they will consider it. 

Do not hesitate to contact us if you have other questions.

Regards,
Dimitar
Progress Telerik


Tags
PdfProcessing