Image caching in PDF

by Rossen Hristov

September 09, 2008 | Productivity

Over the past few days I worked on an image caching mechanism for the PDF Rendering Extension, and I am happy to report that it cut the output size substantially: by 45% across our sample reports.

The Problem

When a report contains the same binary image many times, the PDF Rendering Extension has no way of knowing that all the copies are one and the same image, so it renders each copy separately and wastes disk space. To illustrate, imagine that your company's logo sits in the page header and your report is 200 pages long.

The Solution: Enter Cyclic Redundancy Check

I've implemented a central store for all images that need to be rendered in PDF. It is a simple generic Dictionary. As the key I used a class that stores a reference to the original System.Drawing.Image and the source rectangle of that image that needs to be drawn into the PDF document. Together, these two things uniquely identify a PDF image. To use that class as a dictionary key, I overrode its GetHashCode and Equals methods. The hash code for the object is calculated very easily:

1. Convert the image to a byte[] by using the handy System.Drawing.ImageConverter.
2. Calculate the CRC32 of this buffer.
3. XOR the calculated CRC32 with the source rectangle's hash code (the ^ operator in C#, which is a bitwise XOR, not exponentiation).
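The steps above can be sketched in a few lines. This is only an illustration in Python, not the actual C# code from the PDF Rendering Extension: `ImageKey`, `image_bytes` and `source_rect` are hypothetical names, the byte buffer stands in for the ImageConverter output, the combining step is assumed to be a bitwise XOR (the `^` operator in C#), and the equality rule is my own assumption, since the post does not describe how Equals is implemented.

```python
import zlib


class ImageKey:
    """Hypothetical cache key: encoded image bytes plus a source rectangle."""

    def __init__(self, image_bytes, source_rect):
        self.image_bytes = image_bytes  # stand-in for the ImageConverter output
        self.source_rect = source_rect  # (x, y, width, height)

    def __hash__(self):
        crc = zlib.crc32(self.image_bytes)   # CRC32 of the buffer
        return crc ^ hash(self.source_rect)  # XOR with the rectangle's hash

    def __eq__(self, other):
        # Assumed equality rule: same bytes and same rectangle.
        if not isinstance(other, ImageKey):
            return NotImplemented
        return (self.image_bytes == other.image_bytes
                and self.source_rect == other.source_rect)


logo_page_1 = ImageKey(b"logo-bytes", (0, 0, 64, 64))
logo_page_2 = ImageKey(b"logo-bytes", (0, 0, 64, 64))
logo_cropped = ImageKey(b"logo-bytes", (0, 0, 32, 32))
```

Here `logo_page_1` and `logo_page_2` compare equal and produce the same hash, while `logo_cropped` is a different key because its source rectangle differs.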

CRC32 might sound intimidating, but it is a well-known algorithm that can be implemented in about 20-30 lines of code. Each time a new image arrives, its hash is computed and the dictionary is searched to see whether we already have that image; if we do, we simply reuse the existing one. Of course, .NET does the lookup for us, since we've done our part by overriding the Equals and GetHashCode methods of the key class.

The Results

Out of curiosity, I exported all of our sample reports to PDF and compared the results with and without caching. With image caching the output size fell by 45%, which is sweet, but there is more. Often there is a tradeoff between speed and size, so I expected rendering to slow down, since I am now checking the image cache each time a new image arrives. Instead, the speed increased by 4%. How can that be?

The explanation is simple. The cache checks do slow things down, and there is no doubt about that. However, each time an image is streamed to PDF, some metadata has to be read from it in order to determine color spaces, palettes, etc. Reading this metadata takes time. Now that we have an image cache, metadata extraction happens only once for each image in the cache, unlike before, when it happened for every image regardless of whether it was redundant.
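The effect can be shown with a toy get-or-add cache. This is a Python sketch under my own assumptions, not the real implementation: `ImageCache`, `get_or_add` and `fake_extract_metadata` are hypothetical names, and the key is simply a tuple of the image bytes and the source rectangle, which Python's dict hashes and compares for us much like the .NET Dictionary does with the key class.

```python
class ImageCache:
    """Hypothetical get-or-add cache: metadata is extracted only on a miss."""

    def __init__(self, extract_metadata):
        self._entries = {}
        self._extract_metadata = extract_metadata

    def get_or_add(self, image_bytes, source_rect):
        key = (image_bytes, source_rect)
        entry = self._entries.get(key)
        if entry is None:
            # First time we see this image: pay the extraction cost once.
            entry = self._extract_metadata(image_bytes)
            self._entries[key] = entry
        return entry

    def __len__(self):
        return len(self._entries)


extraction_calls = []


def fake_extract_metadata(image_bytes):
    """Stand-in for reading color spaces, palettes, etc."""
    extraction_calls.append(image_bytes)
    return {"size_in_bytes": len(image_bytes)}


cache = ImageCache(fake_extract_metadata)
logo = b"company-logo-bytes"
for _page in range(200):  # the same logo in the header of a 200-page report
    cache.get_or_add(logo, (0, 0, 64, 64))
```

After the loop the cache holds a single entry and `fake_extract_metadata` has run exactly once, even though the logo was requested 200 times.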

So in the end we have achieved a considerable size reduction topped with a slight speed increase. I have already checked in these changes, so they should be available in the next release. Enjoy!

PDF, Performance, Reporting, Telerik

About the Author

Rossen Hristov

Rossen Hristov is a Senior Software Developer on the Telerik XAML Team.
