PDF to PNG conversion running slow

Jun 17, 2014 at 11:50 AM
Edited Jun 17, 2014 at 11:55 AM
I am using Magick.NET in a multi-threaded WPF application to convert PDFs to PNGs in batches of thousands. The application gets all the documents from a database, and the threads then take the documents one by one from a concurrent queue. This works very well with small documents, but with large documents (up to 10-15MB) it takes quite a lot of time.

When I convert these particular files one by one, the conversion is relatively fast. I am not sure what is causing it to run slower when a few more threads are running and doing the same thing. I also monitored my CPU usage, and during the conversion of a large file it drops significantly.

With smaller files, the conversion runs smoothly with around 90-100% CPU usage but when the application picks up a larger file, it will drop to around 50-60% for a few seconds and then go back up and might drop again. I have over a million documents to convert and would like to know how, if possible, I can improve the performance.

I am using an Intel Xeon CPU E5-1607 0 @ 3.00GHz running Windows 7 64 bit with 8GB RAM and Radeon HD 5450.

Below is a part of my conversion code that runs on different threads:
        MagickReadSettings settings = new MagickReadSettings();
        settings.Density = new MagickGeometry(150, 150);

        // Read the PDF into a collection of images
        images.Read(currentDocument.OriginalPath, settings);

        int page = 1;
        string destinationPath = "C:\\Test\\" + fileNameWithoutExtension;

        // Create the directory if it does not exist.
        if (!Directory.Exists(destinationPath))
        {
            Directory.CreateDirectory(destinationPath);
        }

        foreach (MagickImage image in images)
        {
            // Formatting image
            image.Format = MagickFormat.Png;
            image.Alpha(AlphaOption.Remove);
            image.Annotate("Some Watermark", Gravity.Southwest);

            // Scale the longer side to 1000 pixels, keeping the aspect ratio
            if (image.BaseWidth > image.BaseHeight)
            {
                image.Resize(1000, 0);
            }
            else
            {
                image.Resize(0, 1000);
            }

            image.Quality = 100;
            image.Enhance();
            image.Normalize();
            QuantizeSettings quantizeSettings = new QuantizeSettings();
            quantizeSettings.Colors = 250;
            image.Quantize(quantizeSettings);

            // Write the image to the specified path as a PNG
            image.Write(destinationPath + "\\" + page.ToString() + ".png");

            // Advance the counter so each page gets its own file
            page++;
        }
Thanks a lot.
Coordinator
Jun 17, 2014 at 12:42 PM
You are probably hitting a memory limit. When your application cannot get enough memory it will switch to disk cache. You should see an increase in disk IO at that point. Are you using the Q8 version or the Q16 version of Magick.NET? The Q8 version might be enough for you since you are already calling the method Quantize. Switching from Q16 to Q8 will halve your memory usage. You should probably also limit the number of threads to a value that will allow you to keep everything in memory.
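
For illustration, one way to cap the number of concurrent conversions is to start a fixed number of workers that drain the queue. This is only a minimal sketch: `BoundedConversion`, `Run` and the `convert` delegate are hypothetical names, and the lambda in `Main` is a stand-in for the real Magick.NET conversion code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class BoundedConversion
{
    // Drains the queue with at most maxWorkers conversions running at once.
    // Returns the number of documents that were processed.
    public static int Run(ConcurrentQueue<string> queue, int maxWorkers, Action<string> convert)
    {
        int processed = 0;
        Task[] workers = new Task[maxWorkers];

        for (int i = 0; i < maxWorkers; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                string path;
                // Each worker keeps taking documents until the queue is empty.
                while (queue.TryDequeue(out path))
                {
                    convert(path);
                    Interlocked.Increment(ref processed);
                }
            }, TaskCreationOptions.LongRunning);
        }

        Task.WaitAll(workers);
        return processed;
    }

    static void Main()
    {
        var queue = new ConcurrentQueue<string>(new[] { "a.pdf", "b.pdf", "c.pdf" });

        // Hypothetical stand-in for the real per-document conversion.
        int done = Run(queue, 2, path => Console.WriteLine("Converting " + path));

        Console.WriteLine(done + " documents converted");
    }
}
```

The worker count would have to be tuned by experiment, starting low and raising it only while memory usage stays below the limit.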
Jun 17, 2014 at 12:54 PM
Edited Jun 17, 2014 at 1:03 PM
I am using the latest x86 Q16 version of Magick.NET.

I did not pay attention to the disk IO, but I did watch my memory usage, and the maximum it reached was around 4.5GB, so I still had around 3GB left. I have run a lot of tests and never saw any sudden increase in memory usage. Could it be something else?

Thanks.
Coordinator
Jun 17, 2014 at 1:09 PM
Edited Jun 17, 2014 at 1:18 PM
An x86 process can only use 2 GB of address space by default (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778(v=vs.85).aspx). So you might still be hitting that limit. Did you also install the x86 version of Ghostscript? The architecture should match that of Magick.NET so a DLL call can be made. Otherwise it will spawn a process.
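
A quick way to confirm which architecture the application actually runs as is to ask the runtime. The sketch below uses only the standard `Environment` properties (available from .NET 4.0); the class and method names are made up for the example.

```csharp
using System;

static class ProcessInfo
{
    // Describes the bitness of the current process and of the operating system.
    public static string Describe()
    {
        return string.Format("{0}-bit process on a {1}-bit OS",
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If this reports a 32-bit process on a 64-bit OS, the 2 GB limit still applies even though the machine has more RAM.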
Jun 17, 2014 at 1:17 PM
Good point. I will try converting it to a 64 bit application and report back the result.
Thanks a lot.
Jun 17, 2014 at 2:03 PM
I can't get it to run on more than one thread. It gives me an error saying:

Magick: FailedToExecuteCommand `"C:/Program Files/gs/gs9.14/bin/gswin64c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r150x150" "-sOutputFile=C:/Users/MANVIR~1/AppData/Local/Temp/magick-4476aUJDp_uyc4-P%d" "-fC:/Users/MANVIR~1/AppData/Local/Temp/magick-4476uEKeFF_zA932" "-fC:/Users/MANVIR~1/AppData/Local/Temp/magick-4476DOgFR85W5Q_c"' (1) @ error/pdf.c/InvokePDFDelegate/237

However, the error is not on all files. It is converting some files but most of them give this error. On a single thread, all works fine.

I installed the x64 Q16 version of Magick.NET and also removed and reinstalled the latest 64 bit version of Ghostscript. I also changed my target platform to x64 in the Build properties of my project.

Thanks.
Coordinator
Jun 17, 2014 at 2:22 PM
Edited Jun 17, 2014 at 2:23 PM
It looks like the 64 bit version of the library is not built with 'GS_THREADSAFE', which allows more than one instance when the library is used. It will switch to the command line version then. There is a bug with the call to the command line version in the current version of Magick.NET. This should be fixed in the next release, but I have a memory leak that needs to be fixed before I can publish it, and I am still working on that. This is a project I do in my spare time and I don't really have that much time at the moment, so it might take a while. Can you try and see if the previous version of Magick.NET works for you?
Jun 18, 2014 at 9:25 AM
I tried the 6.8.9.002 64 bit version and that seems to work with multiple threads. However, a PDF file of around 3MB (containing 194 pages) seems to take up around 7.5GB of RAM initially, which then goes down slowly and stays at around 5-6GB until the conversion is finished. Is this the memory leak bug that you are talking about?
Coordinator
Jun 18, 2014 at 9:52 AM
The huge amount of memory usage is not due to the memory leak; that only leaks small memory blocks. The previous Q16 version of Magick.NET used 32 bits per channel. The new version uses 'only' 16 bits per channel. For PDF files with a lot of pages Magick.NET will still have to create an in-memory block for each page. If your PDF contains 194 pages and each page is 1240x1754, the total memory used is: 194 * 1240 * 1754 * 4 (number of channels) * 32 = 54008606720 bits = 6751075840 bytes, or roughly 6.3 GiB.
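
The same arithmetic can be written as a small helper for estimating other documents. This is a rough sketch of the raw pixel data only, not of everything the library allocates. It treats the 32 and 16 in the calculation above as bits per channel, since the formula multiplies them by the 4 channels. `MemoryEstimate` and `Bytes` are names invented for the example.

```csharp
using System;

static class MemoryEstimate
{
    // Uncompressed size in bytes of all pages when they are held in memory at once.
    public static long Bytes(int pages, int width, int height, int channels, int bitsPerChannel)
    {
        // Cast first so the multiplication is done in 64-bit arithmetic.
        long bits = (long)pages * width * height * channels * bitsPerChannel;
        return bits / 8;
    }

    static void Main()
    {
        const double GiB = 1024.0 * 1024.0 * 1024.0;

        // 194 pages of 1240x1754 pixels (A4 at 150 DPI), 4 channels.
        long previousVersion = Bytes(194, 1240, 1754, 4, 32);
        long newVersion = Bytes(194, 1240, 1754, 4, 16);

        Console.WriteLine("32 bits per channel: {0:F2} GiB", previousVersion / GiB);
        Console.WriteLine("16 bits per channel: {0:F2} GiB", newVersion / GiB);
    }
}
```

For the 194-page document this gives about 6.29 GiB at 32 bits per channel and about 3.14 GiB at 16 bits per channel.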
Jun 18, 2014 at 11:21 AM
Okay. Thanks for clearing that up. I will continue to use this version (6.8.9.002) and wait for the next release. Could you please build the next release as thread safe with Ghostscript so it can run multiple instances of it?

Apart from that, really great work on Magick.NET. I appreciate it.

Thanks a lot.
Coordinator
Jun 18, 2014 at 12:07 PM
I don't build Ghostscript so I cannot make the call to the library with the thread safe option. I might be able to add an extra download with a custom build of Ghostscript but I will have to find some spare time to work on that. I will put this on my todo list.
Jun 18, 2014 at 4:56 PM
Sorry, I meant building it with 'GS_THREADSAFE' to allow multiple instances of it like in previous 64 bit versions.

I ran a test with the same file: the latest version converted it in 6 min 21 seconds, while 6.8.9.002 took 5 min 06 seconds. The latest version took around 5GB of RAM and the other one went as high as 7.6GB. The lower RAM usage, I suppose, is because the latest version uses 16 bits per channel.

I am just trying to see which method would be the fastest for me. The 64 bit version works well, but on multiple threads it runs slowly anyway, because a single file takes almost all of my RAM and then, I think, all the other threads use the disk.

Thanks
Coordinator
Jun 18, 2014 at 8:04 PM
The 'GS_THREADSAFE' flag is for when you build Ghostscript; it is not a flag for ImageMagick. The binary release published by Ghostscript is not built with that flag. I could create a custom binary that has this flag, but I don't really have the time to do that now.

The conversion should become a bit faster when you use the next release. I hope I can publish it this week.
Jun 20, 2014 at 1:02 PM
Okay, thanks a lot. Will wait for the next release!