JPEG2000 isn’t the easiest of formats to disseminate. Browsers typically handle the format with difficulty and then require plugins or extensions to render the format. We don’t want our users to have to download anything just to be able to view our material on-line. So, we plan to convert our JPEG2000 files to a browser friendly JPEG or PDF for dissemination. Both formats admirably handled by browsers. (OK, PDF needs an Adobe plugin but it's commonly included with browsers.) Other formats may come along later. The thing is, how do we do that conversion? There are plenty of conversion tools out there – we use Lurawave for the image conversion. But then the question becomes when do we convert from a master to a dissemination format? Especially if we want a speedy delivery of content to the end user.
One of the guiding principles behind our decision to use JPEG2000 was that we could reduce our overall storage requirements by creating smaller files than we might have done if we’d used, say, TIFF. So if we automatically convert every JPEG2000 to a low res thumbnail JPEG, a medium res JPEG and a high res JPEG and to a PDF then we’re back to having to find storage for these dissemination files. OK, JPEG won’t consume terabytes of storage and nor will PDF, but we’d need structured storage to keep track of each manifestation and metadata to provide to our front end delivery system as to which JPEG was to be used in which circumstances. True, this has been very successfully done for many projects before now but alongside efficiency of storage is efficiency of managing what we have stored and a speedy delivery.
So we plan to convert JPEG2000 to JPEG or PDF on-the-fly at the time each image is requested. The idea is that we serve JPEG2000 images out of our DAM to an image server, the image is converted and the dissemination file served up. Instead of paying for large volumes of static storage we believe that putting the saving on storage into a fast image server will directly benefit those who want to use our material online.
One outcome of a conversation had with DLConsulting is that we've learned that on-the-fly conversion is a potentially system intensive (and at worst inefficient) activity that could create a bottleneck in the delivery of content to the end user. We've said that speed is an issue. We need to efficiently process the tiled and layered JPEG2000 files we plan to create. A faster more powerful image server may help but good conversion software qwill be key. Alongside on-the-fly conversion we plan to use a cache that would hold, in temporary storage the most requested images/PDFs. The cache would work something like this. It has a limited size/capacity and contains the most popular/most often requested images/PDFs. If an image/PDF in the cache were not requested for n amount of time it would be removed from the cache. In practice a user requests an digitised image of a painting, the front end system queries the cache to see if the image is there, if it is its served directly and swiftly to the user. If not the front end system calls the file from the back end DAM. The DAM delivers that image to the image server, which converts JPEG2000 to JPEG and places that images in the cache. From where it can be passed to the front end system and the end user. Smooth, fast and efficient in the use of system resources.
But there are still questions. If we pass the JPEG2000 to the image server for conversion to JPEG that’s fine; but what happens next? Is the JPEG2000 discarded after the conversion process leaving only the JPEGs? Is this the best way to support the zooming in on image sections that we want to offer. The original proposal was to hold only dissemination formats in the cache, now we’re thinking that for flexibility we may prefer to hold the JPEG2000 images and convert them as the image is requested by a user. Is this still the most efficient process? It's easy to build bottlenecks into a system that slow processes down, much more difficult to design a system for speed and efficiency. We’re pretty certain that the conversion–on-the-fly is a good idea and we also think the cache is too. Unless you know differently….