September 09, 2011

Simplifying our JPEG2000 conversion workflow

Over the summer, we have been working to streamline our JPEG 2000 conversion workflow. With the help of software developers from Genisys - one of the Trust’s strategic IT development and support partners - we have put the LuraWave command line interface to use in automating batch conversion.

Up to now we have been using the native GUI interface that comes with the LuraWave software, manually entering parameters and initiating the conversion process for each batch of images. This was useful for us as we settled into a large-scale digitisation workflow incorporating RAW - TIFF - JP2 conversion, cleared our backlog and established our compression testing methodology (as described in previous posts on this blog). With no relevant in-house programming expertise, the GUI was essential during these early stages. 

Now that we have a firm idea of how we want to use LuraWave, where it fits into the overall workflow, and what kind of throughput we need on a day-to-day basis, it was time to set up an automated solution.

The Wellcome Trust operates in an (almost) entirely Windows environment, so we commissioned the Genisys software engineers to code a .NET wrapper script running as an executable.  The wrapper script invokes LuraWave’s command line conversion to allow us to convert images with no manual intervention. An XML configuration file that contains the following information is used to control how the wrapper script invokes LuraWave:
  • "Inbox" directory (files ready for conversion)
  • Temporary directory (files copied before conversion)
  • "Outbox" directory (converted files)
  • LuraWave command line
  • Error directory
  • List of any files to exclude from conversion
LuraWave retains the original folder structure, so the "Inbox" and "Outbox" is the top level directory, with the original folder hierarchy maintained throughout the conversion process.

Polling of the specified input folder is handled with Windows Scheduler, which can be run on a PC or on a server (we run it on a virtual server). Every 5 minutes Windows Scheduler prompts the script to check for TIFFs in the "Inbox".  Lurawave is then invoked, converting the TIFFs to JP2s that are copied out to the “Outbox”.  We’ve got some really good error handling in place so if one rogue file can’t be converted the rest of the files still get converted – essential when converting big volumes, we don’t want the first file failing and halting an overnight run of thousands of files.

Windows Scheduler does not parallel process, so folders are queued for conversion. With speeds of around 30Gb (at least 1,200 TIFFs) per hour, this is quick enough for our needs.

This implementation means that a single LuraWave license can be used for any number of input streams, and with the facility to "call" multiple definitions; it can also convert images to multiple JPEG 2000 profiles (we currently have a lossless profile and a lossy profile).

With thanks to Alastair Reid, Wellcome Trust IT Account Manager, for providing this information and reviewing this post.