An innovative approach to fast, memory-efficient and reliable processing of PDF files, part of the latest Document Processing Library in R1 2017.
Have you ever had to work with a large number of PDF files? Have you had to process the pages of these files by merging them, splitting them, adding new pages or adding content to the existing pages? Last, but not least, as these files may contain sensitive client data, have you needed to guarantee the reliability of your application by ensuring that all page content is preserved unmodified? Well, then you should definitely look into our R1 2017 release, as we've just unveiled our best tool ever for handling these document processing scenarios!
The new PDF Stream Writer functionality (available for UI for ASP.NET AJAX and MVC, as well as UI for WPF, WinForms and Silverlight) is an addition to the RadPdfProcessing library and provides a completely new approach to dealing with PDF files. While the previous approach relied on building a PDF document model in memory through the RadFixedDocument class, the new API allows you to read from and write to PDF file streams directly, without keeping unnecessary data in memory. This stream processing approach is the key to the remarkable results the new API delivers.
These results can be summarized as three benefits: great performance, a minimized memory footprint and guaranteed reliability.
So, how do you use the new API? It provides four main new classes: PdfStreamWriter, PdfPageStreamWriter, PdfFileSource and PdfPageSource. The first two are responsible for writing the new PDF file and the pages in that file, respectively. The other two are responsible for reading existing PDF files and existing PDF pages.
For example, to merge the pages of several PDF files into a newly created PDF file, you read each source file with PdfFileSource and pass every one of its pages to the WritePage method of a PdfStreamWriter.
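A minimal sketch of such a merge is shown here. The namespace, the `MergeExample` class and the `Merge` helper with its parameters are assumptions made for illustration and are not part of the original article; check them against the RadPdfProcessing documentation for your product version.

```csharp
using System.IO;
// Assumed namespace of the streaming API; verify against your Telerik version.
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Streaming;

public static class MergeExample
{
    // Hypothetical helper: appends every page of each input file to one output file.
    public static void Merge(string[] inputPaths, string outputPath)
    {
        // The writer streams pages straight into the output file.
        using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.Create(outputPath)))
        {
            foreach (string inputPath in inputPaths)
            {
                // Each source is opened, copied and disposed before the next one is read.
                using (PdfFileSource fileSource = new PdfFileSource(File.OpenRead(inputPath)))
                {
                    foreach (PdfPageSource pageSource in fileSource.Pages)
                    {
                        fileWriter.WritePage(pageSource);
                    }
                }
            }
        }
    }
}
```

Only one source file is open at a time in this sketch, so the number of input files should not affect how much is held in memory.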
If you have a more complex scenario, in which you need to position the content of multiple pages on a single page, then instead of calling the WritePage method you can use the BeginPage method, which starts a new page that is written through a PdfPageStreamWriter.
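A rough sketch of this scenario places the first pages of two files side by side on one wider page. It assumes that `BeginPage` accepts a page size, that `PdfPageSource` exposes a `Size` property, that `Pages` can be indexed, and that `ContentPosition.Translate` moves where the next content is written. The class and method names are hypothetical, and page rotation is ignored for brevity.

```csharp
using System;
using System.IO;
using System.Windows; // Size; assumed for the .NET Framework builds of the library
// Assumed namespace of the streaming API; verify against your Telerik version.
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Streaming;

public static class CombineExample
{
    // Hypothetical helper: writes the first page of each input next to the other.
    public static void PlaceSideBySide(string leftPath, string rightPath, string outputPath)
    {
        using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.Create(outputPath)))
        using (PdfFileSource leftFile = new PdfFileSource(File.OpenRead(leftPath)))
        using (PdfFileSource rightFile = new PdfFileSource(File.OpenRead(rightPath)))
        {
            PdfPageSource leftPage = leftFile.Pages[0];
            PdfPageSource rightPage = rightFile.Pages[0];

            // The new page is as wide as both sources together and as tall as the taller one.
            Size pageSize = new Size(
                leftPage.Size.Width + rightPage.Size.Width,
                Math.Max(leftPage.Size.Height, rightPage.Size.Height));

            using (PdfPageStreamWriter pageWriter = fileWriter.BeginPage(pageSize))
            {
                // The first page's content is written at the default position.
                pageWriter.WriteContent(leftPage);

                // Shift the position right by the first page's width, then write the second.
                pageWriter.ContentPosition.Translate(leftPage.Size.Width, 0);
                pageWriter.WriteContent(rightPage);
            }
        }
    }
}
```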
This allows you to combine and position both existing page content, read through the PdfPageSource class, and newly generated RadFixedPage content created with the existing RadPdfProcessing editing API. More examples of merging, splitting or combining page content can be seen in this ManipulatePages SDK example.
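Splitting follows the same pattern in reverse: each page of one source goes to its own writer. The following is a sketch under the same assumptions as before. The `SplitExample` class, the `Split` helper and the output file naming are hypothetical, and the output directory is assumed to exist.

```csharp
using System.IO;
// Assumed namespace of the streaming API; verify against your Telerik version.
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf.Streaming;

public static class SplitExample
{
    // Hypothetical helper: writes each page of the input to a separate PDF file.
    public static void Split(string inputPath, string outputDirectory)
    {
        using (PdfFileSource fileSource = new PdfFileSource(File.OpenRead(inputPath)))
        {
            int pageNumber = 1;
            foreach (PdfPageSource pageSource in fileSource.Pages)
            {
                // Illustrative naming scheme: "Page 1.pdf", "Page 2.pdf", and so on.
                string pagePath = Path.Combine(outputDirectory, "Page " + pageNumber + ".pdf");

                // One writer per output file; disposing it completes that file.
                using (PdfStreamWriter fileWriter = new PdfStreamWriter(File.Create(pagePath)))
                {
                    fileWriter.WritePage(pageSource);
                }

                pageNumber++;
            }
        }
    }
}
```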
The PdfStreamWriter and PdfFileSource classes write and read PDF objects directly to and from PDF file streams. These PDF objects are simply copied, without decompressing the PDF data, and are reused when possible. Both facts maximize the performance of the PdfStreamWriter class. As an example, take a look at this PdfStreamWriterPerformance SDK example, which shows how a single-page PDF file is merged 10,000 times and the resulting PDF file is generated in less than a second! Impressive, right?
Reading from and writing to FileStream instances is the key to the low memory usage. The only memory used is for copying objects from one PDF file to the resulting one. In addition, the PdfStreamWriter, PdfPageStreamWriter and PdfFileSource classes implement the IDisposable interface, so all resources are released as soon as you no longer need them, which keeps the memory footprint minimal. As a result, writing a single-page PDF file and writing a multi-page PDF file consume practically the same amount of memory, as each page is written directly to the resulting stream when ready.
PdfStreamWriter simply copies page content and the related resources from one file to another. This means that the new API does not depend on understanding any complex PDF features and it supports practically all page-related PDF features. This guarantees reliability in preserving the existing content, without modifying it or losing any data.
As an example, take a look at the PDF file from the picture below. This file contains sound, video and 3D interactive content, none of which is supported by the previous RadPdfProcessing model. However, as PdfStreamWriter is independent of the model, it successfully preserves all page content after processing it.
Merging this file with pages from other PDF files is demonstrated in the ManipulatePages SDK example.
Deyan is an Architect, Senior Software Developer and mathematics enthusiast. He joined the Telerik team in 2013 and has since participated in the development of several different projects - Document Processing Libraries, RadPdfViewer and RadSpreadProcessing WPF controls and most recently in Telerik AR/VR. He is passionate about 3D technologies and loves solving challenging problems.