Minimize file size of embedded images in PDF output

Dec 23, 2013 at 8:08 PM
I read in two PDF files as image collections and then compare the sheets to show what has changed. When I do it at the command line with ImageMagick and Ghostscript I get file sizes about 10 times smaller than the output I get with Magick.NET.
I'm wondering how I can control the compression of the embedded images in the PDF output.
Coordinator
Dec 23, 2013 at 8:33 PM
Can you post an example of what you are doing on the command line?
Dec 23, 2013 at 9:13 PM
Here's the code using Magick.NET
            // Needs: using System.Linq; using ImageMagick;
            // "settings" is defined elsewhere (not shown in this post); presumably a MagickReadSettings.
            using (MagickImageCollection new_images = new MagickImageCollection(),
                old_images = new MagickImageCollection(),
                diff_images = new MagickImageCollection())
            {
                new_images.Read(@"C:\Users\username\Pictures\projects\pdfcompare\new_drawing.pdf", settings);
                old_images.Read(@"C:\Users\username\Pictures\projects\pdfcompare\old_drawing.pdf", settings);

                // Make every page fully opaque before comparing
                foreach (MagickImage image in new_images)
                {
                    image.Alpha(AlphaOption.Opaque);
                }
                foreach (MagickImage image in old_images)
                {
                    image.Alpha(AlphaOption.Opaque);
                }

                // Compare new and old sheet by sheet, only when the page counts match
                if (new_images.Count == old_images.Count)
                {
                    for (int shtNum = 0; shtNum < new_images.Count; shtNum++)
                    {
                        MagickImage diff_image = new MagickImage();
                        MagickImage new_image = new_images.ElementAt(shtNum);
                        MagickImage old_image = old_images.ElementAt(shtNum);
                        new_image.Compare(old_image, Metric.FuzzError, diff_image);
                        diff_images.Add(diff_image);
                    }
                }
                // Output image diff
                diff_images.Write(@"C:\Users\username\Pictures\projects\pdfcompare\output\diff-drawings.pdf");
            }
Here's the command line equivalent:

First I read in the PDFs and output pngs
gswin64c.exe -q -dQUIET -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=png16m -r150 -sOutputFile="sheet-%d.png" "input.pdf"
Then I do the comparison and output the pdf
convert.exe sheet-1.png sheet-2.png output.pdf
I'm guessing the command line is smaller because I'm turning some pngs into a pdf. I'm not sure what file format gets wrapped by the pdf in Magick.NET + ghostscript. Could also be that I don't have an alpha channel in the command line version and I do with the code.
Coordinator
Dec 24, 2013 at 7:01 AM
I am also not sure what the format of the image is. You could check the Format property of diff_image. Have you tried setting the Format to MagickFormat.Png?
Dec 25, 2013 at 1:39 AM
Here's the format of the diff_image to start with: {Pdf: Portable Document Format (+R+W+M)}. I had already tried converting the images to png on import, but it didn't have much of an effect. Not sure why I didn't think of converting diff_image as well.

Here's what I did that cut the file size of the pdf in half:
                        diff_image.HasAlpha = false;
                        diff_image.Format = MagickFormat.Png;
                        diff_image.Quality = 95;
                        diff_images.Add(diff_image);
The original pdf was 2.53MB and after the above addition it became 1.38MB. However, the odd part is that when I changed the image quality to 5 the pdf was still 1.38MB. I also output the png diff_image by itself and compared the output at quality=5 and quality=95. The quality 5 file was 674KB and the quality 95 file was 167KB. So the pdf shrank by 45% and the png shrank by 75%. I'm wondering why the png shrank so much, but the pdf didn't.

Thanks for the guidance on converting the output to png.
Coordinator
Dec 25, 2013 at 9:15 AM
Edited Dec 25, 2013 at 9:21 AM
Maybe the quality operation does not influence the size of the resulting pdf? Will you get the 2.53MB size when you omit the 'diff_image.Format = MagickFormat.Png' statement?

Does your resulting pdf from the command line have a size of only 260kB? Are you sure both pdfs are created from images with the same dimensions and format?

The quality operator for png does not work the same as for jpeg images. It is used to set the compression level of the image. You can read the following post to see how it works: http://www.imagemagick.org/discourse-server/viewtopic.php?f=2&t=24134#p103040
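
As a rough illustration of the point above: as I understand the ImageMagick -quality documentation, the PNG quality value is read as two digits, with the tens digit giving the zlib compression level and the ones digit giving the PNG filter type. The sketch below only does that arithmetic. The class and method names are made up for this example and are not part of Magick.NET.

```csharp
using System;

// Hypothetical helper, not a Magick.NET API: splits a PNG "quality" value
// into the two digits that ImageMagick is documented to interpret.
static class PngQuality
{
    public static (int ZlibLevel, int FilterType) Decode(int quality)
    {
        if (quality < 0 || quality > 100)
            throw new ArgumentOutOfRangeException(nameof(quality));

        // Tens digit: zlib compression level (capped at 9, the zlib maximum).
        int zlibLevel = Math.Min(quality / 10, 9);
        // Ones digit: PNG filter type selector.
        int filterType = quality % 10;
        return (zlibLevel, filterType);
    }
}
```

With this reading, quality 95 means compression level 9 and quality 5 means compression level 0, which would fit a quality 5 png coming out larger than a quality 95 one.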
Dec 25, 2013 at 4:24 PM
You're right, the quality doesn't affect the output size of the PDF. It does affect the output size of the pngs. It is just the setting of HasAlpha=false that shrinks the pdf file output. I'll check out the command line after the Christmas gifts get opened :-).
Dec 26, 2013 at 8:13 AM
I did some testing on the command line side and Magick.NET isn't far off. I ended up getting 1.06MB with the command line instead of 1.38MB with Magick.NET.
The starting pngs for the cli generated pdf were 199KB and 86.7KB which is about a third of the final pdf. I might try using one bit images instead of grayscale to see what I can squeeze out. I just got a little freaked out when the 6 page D size drawing comparisons were getting close to 11MB.

Thanks for the reality check. Wish there was a way to have the PDF be just twice the size of the pngs. Makes me think the pngs are uncompressed when they're stored in the pdf.
Coordinator
Dec 26, 2013 at 11:06 AM
Have you tried to set the CompressionMethod of diff_image to CompressionMethod.Zip? This should help according to this page: http://www.imagemagick.org/Usage/formats/#pdf_options.
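
A minimal sketch of that suggestion, combined with the HasAlpha change from earlier in the thread. It uses the property names mentioned in this discussion (HasAlpha, CompressionMethod), which match the Magick.NET release of that time. Newer releases may expose the compression setting elsewhere, so treat the exact property as an assumption. The helper class and method are hypothetical and only there to make the fragment self-contained.

```csharp
using ImageMagick;

// Hypothetical helper (not from the thread): prepares one diff page
// before it is added to the collection that is written out as a PDF.
static class PdfDiffSettings
{
    public static void PrepareForPdf(MagickImage diff_image)
    {
        // Dropping the alpha channel is what reduced the PDF size earlier in the thread.
        diff_image.HasAlpha = false;

        // Request Zip (deflate) compression for the image data embedded in the PDF.
        // Assumption: the property is named CompressionMethod, as in this discussion.
        diff_image.CompressionMethod = CompressionMethod.Zip;
    }
}
```

In the comparison loop this would be called on diff_image right before diff_images.Add(diff_image).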
Dec 26, 2013 at 11:24 PM
Woohoo!!! That did the trick. Now the pdf is 414KB instead of 1.38MB. Magick.NET has now caught up and surpassed my command line method. Thanks for all your help.
Jan 17, 2014 at 9:52 PM
pnw_bob,

Could you post a fragment of what your final code looked like? It turns out I want to do something very similar and it would save me some time.

I am doing the same thing, converting ImageMagick convert.exe calls over to Magick.NET. I'm not so much concerned with file size as with speed, because I have to do hundreds of images. Thanks in advance, Rick