Open sourcing Imager - yet another image processing service

Today we are announcing our new FLOSS project, Imager, a REST wrapper over the ImageMagick CLI.

The whole project is written in Elixir and is MIT licensed; however, we do not recommend using it in production yet. The API is still in development and will almost certainly change radically in the near future.

Reasoning

In the system we are working on, Helium, we needed to generate thumbnails of the various files uploaded to our platform. This is our third approach to the problem; the earlier attempts were:

  • Processing images within the application itself. This worked quite well at first, but then we ran into some problems:

    • high resource usage - we needed to store each upload in a temporary file within our application and run an external process that is resource hungry (especially for large files like DICOMs)
    • increased attack surface - as the application deals with medical records, we need to reduce possible attack vectors to a minimum; image processing is known to be hard, especially when you need to handle a broad range of formats.
  • Processing images on creation, triggered by a notification from an SNS queue. This solved the issues above, but introduced new problems:

    • while OTP can handle processing failures, it will not handle the case where the master process goes down during processing
    • deployment of the application became quite troublesome - our application needs to be easy to deploy by an unskilled sysop within the institution itself, and while Minio provides SNS-like messages, these are quite quirky to get running.

After these failures, we had to find a different solution that would improve our situation.

In search of a solution

The possible solution was quite simple: instead of on-upload processing, we should use on-request processing. This approach is quite popular, and we have already been using it in a few of our services via Thumbor. I started to think about using Thumbor in Helium as well, but there is a big problem with that: Thumbor supports neither DICOMs nor PDFs. That isn't the only issue, though; another one is that Thumbor URLs look like this:

http://<thumbor-server>/300x200/smart/s.glbimg.com/et/bb/f/original/2011/03/24/VN0JiwzmOw0b0lg.jpg

There are a few problems with that URL:

  • size is hidden in the middle of the URL
  • if the S3 assets aren't public, you need a custom loader to fetch them

While both of these issues can be mitigated, the lack of support for PDFs and DICOMs was the biggest issue, and the project has shown no interest in supporting them.

I also reviewed a few other solutions that I found:

  • Imaginary - a similar service written in Go. While this one supports PDFs (IIRC DICOMs as well), it has problems with private S3 resources and does not provide caching. The caching problem can be solved with hosted instances, but adding an additional layer for on-premise deployments would be troublesome.
  • Imageflow - an image processing proxy server written in Rust. While this one has the most appealing API (as it is a proxy, it accesses images directly and processing is driven by query params), it also lacks support for PDFs and DICOMs.

In this situation, the only solution was to create our own image processing service that would provide all the features we needed.

Imager

Imager is an image processing proxy server that uses query params to define transforms. The great thing, which simplifies deployment enormously, is that when no transform is defined it behaves as a transparent proxy. What does that mean? If you request a PDF from the store through Imager, it will return that PDF without any changes. Why is this important? Because now the front-end can receive a URL pointing directly at the Imager instance, and all it needs to do to get a thumbnail is add query params. For example, when you receive https://imager.localhost/bucket/image.pdf, you can add ?thumbnail=190x190 to get back a thumbnail of the first page. Imager caches the result of that computation, so the image processing happens only once; the next response will be almost instantaneous.
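To make the flow above more concrete, here is a minimal Python sketch of the idea. It is not Imager's actual code (Imager is written in Elixir): the function names with_thumbnail and serve, the fetch and transform callables, and the in-memory dict used as a cache are all assumptions made for illustration. Only the URL and the thumbnail=190x190 parameter come from the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_thumbnail(url: str, size: str) -> str:
    """Return `url` with a `thumbnail=<size>` query parameter added or replaced."""
    parts = urlsplit(url)
    # Keep any existing parameters except a previous `thumbnail` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "thumbnail"
    ]
    query.append(("thumbnail", size))
    return urlunsplit(parts._replace(query=urlencode(query)))


def serve(url: str, fetch, transform, cache: dict) -> bytes:
    """Hypothetical dispatch: pass the file through untouched when no
    transform is requested, otherwise transform once and reuse the result."""
    parts = urlsplit(url)
    if not parts.query:
        # No transform defined: behave as a transparent proxy.
        return fetch(parts.path)
    key = (parts.path, parts.query)
    if key not in cache:
        # First request for this path and transform: process and remember it.
        params = dict(parse_qsl(parts.query))
        cache[key] = transform(fetch(parts.path), params)
    return cache[key]
```

With this sketch, with_thumbnail("https://imager.localhost/bucket/image.pdf", "190x190") produces the thumbnail URL from the example, and calling serve twice with that URL runs the (assumed) transform callable only once.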

Thanks to Elixir's Stream module, it doesn't need to load the whole file into memory when there is nothing to do (sending the raw file or a cached one), which means that in such cases memory usage should be almost constant.
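The same principle can be illustrated outside of Elixir. The Python sketch below is only an analogy to what the Stream module gives us, not a port of Imager's code; the names stream_chunks and relay and the 64 KiB chunk size are assumptions. The point is that only one chunk is held in memory at a time, regardless of the file size.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def stream_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Lazily yield `source` in pieces of at most `chunk_size` bytes."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the end of the source was reached.
            break
        yield chunk


def relay(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy `source` to `sink` chunk by chunk and return the bytes written."""
    total = 0
    for chunk in stream_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-ins for the stored file and the HTTP response body.
    stored = io.BytesIO(b"x" * 150_000)
    response = io.BytesIO()
    print(relay(stored, response))
```

In a real service, the source would be the object store response and the sink the client connection, so neither the raw nor the cached file would ever be fully buffered.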

Future

The current API exposes some ImageMagick commands directly to the user. In future releases we would like to introduce a higher-level API instead, which would allow us to stop relying on ImageMagick and switch to a different engine.

Summary

Feel free to check out our project and try it out. We are open to new ideas on how to improve it before hitting 1.0.0.