Yesterday I spent some time researching how to reduce the size of my Docker image. Weighing in at 841.6 MB when loaded, my Rails application image is probably about average. Docker Hub reports it as 338 MB, which is roughly its size after I have run docker save and gzipped the result. That tells me docker push and docker pull transport much lighter files than the ones I work with day to day. That's good, but smaller images mean faster deploys, so time spent researching how to lighten their load seemed worthwhile.
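If you want to check the compressed size of your own image, one way is the following (the image name here is a placeholder):

```shell
# Serialize the image and gzip it -- roughly what a registry push/pull transfers
docker save my-rails-app:latest | gzip > my-rails-app.tar.gz
ls -lh my-rails-app.tar.gz
```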
Known techniques for reducing Docker image size
There are some decent articles that discuss Docker image size in depth, so I will only summarize here. First and foremost, build from a small base image. Don't use a full-blown Ubuntu desktop image if all you need is a minimal Linux instance that can run your application. That should be obvious. Secondly, chain your RUN commands together when appropriate. Every statement in your Dockerfile produces a layer of your final image, and each layer is an image itself. Reduce the number of layers and your resulting image should be smaller. That means && should be your friend in your Dockerfile. For example, you might as well combine two statements like this:
```dockerfile
# install packages
RUN apt-get update -qq && \
    apt-get install -y build-essential libpq-dev postgresql-client

# clean up after installation
RUN apt-get clean autoclean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt /var/lib/dpkg /var/lib/cache /var/lib/log
```
into one:
```dockerfile
# install packages and clean up in one shot
RUN apt-get update -qq && \
    apt-get install -y build-essential libpq-dev postgresql-client && \
    apt-get clean autoclean && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt /var/lib/dpkg /var/lib/cache /var/lib/log
```
Doing so saved me 11.1 MB. As discussed in the comments below, this is mostly useful when you chain statements that dirty your image with statements that clean up the dirt. Left as two statements, the first creates a layer containing all the junk that the second statement cleans up. Chain them together and the junk won't be in the final image at all.
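One way to confirm that the chained RUN actually kept the junk out is to inspect the per-layer sizes (the image name is a placeholder):

```shell
# Show every layer of the image along with its size; after chaining, the
# files removed by apt-get clean/autoremove never appear in any layer
docker history my-rails-app:latest
```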
A word on dependencies
I think gems are likely a big bloat culprit in any Rails app image. With that in mind, question every gem in your Gemfile and make sure it's serving a purpose in your app. Remove the gems that aren't. The same is true on any other software platform: scrutinize your dependencies and get rid of the cruft. Not only will this reduce your Docker image size, it will help keep your application maintainable.
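On the gem front, there are also a couple of cheap wins in the Dockerfile itself. This is a sketch, assuming Bundler and a conventional Gemfile with development/test groups:

```dockerfile
# skip rdoc/ri generation for every gem installed in the image
RUN echo 'gem: --no-document' > /etc/gemrc

# install only the gems the app needs at runtime
RUN bundle install --without development test --jobs 4
```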
Known tools for reducing Docker image size
I came across two tools that claim to reduce the size of your existing Docker images: docker-slim and docker-squash. I tried them both, but didn't have success with either. docker-slim did dramatically reduce my image size from 841.6 MB to an amazingly small 31.59 MB, but afterwards my Rails app failed to boot, and my unicorn log file filled with failures like this:
```
config.ru:1:in `<main>': cannot load such file -- rack/builder (LoadError)
  from /usr/local/bundle/gems/unicorn-4.9.0/lib/unicorn.rb:48:in `eval'
  from /usr/local/bundle/gems/unicorn-4.9.0/lib/unicorn.rb:48:in `block in builder'
  from /usr/local/bundle/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:768:in `call'
  from /usr/local/bundle/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:768:in `build_app!'
  from /usr/local/bundle/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:137:in `start'
  from /usr/local/bundle/gems/unicorn-4.9.0/bin/unicorn:126:in `<top (required)>'
  from /usr/local/bundle/bin/unicorn:16:in `load'
  from /usr/local/bundle/bin/unicorn:16:in `<main>'
```
Clearly docker-slim chewed up my application and spat it out as something unusable. docker-squash, on the other hand, I gave up on simply because I was too lazy to install its hard dependency, tar version 1.27. My CI server is stuck on Ubuntu 12.04 (precise), and tar 1.27 isn't easily available until 14.04. Yes, I could compile from source or try to pull in the 14.04 binaries, but at the end of the day I just didn't want to go down those roads. In my experience, any tool that forces you in those directions just isn't ready for prime time.
Despite not having luck with docker-squash or docker-slim I applaud the authors of these tools. They are taking on a real issue in the Docker community and have to keep pace with some fast moving, cutting-edge technology. I’ll be watching these projects; I wouldn’t be surprised if one or the other matured on its own or had its ideas absorbed into Docker itself.
Got questions or feedback? I want it. Drop your thoughts in the comments below or hit me @ccstump. You can also follow me on Twitter to be aware of future articles.
Thanks for reading!
I like the tidbit on reducing image size with the &&. Nice little find there. What kind of change were you able to see from making this small tweak?
Thanks again, and keep posting!
Honestly, I did not see a significant change in image size once I chained my RUN statements together. That said, I don’t have a lot of RUN statements in my Dockerfile. Those who do might see a difference. It’s cheap to try, so worthwhile. You will see a change in the docker history of the image. I updated the post with a better example of chaining; now I’m saving 11 MB in the final image.
I believe the only change in image size from chaining RUN statements is if you are installing/compiling then removing the no-longer-needed install dependencies, cleaning up packaging caches, and such.
For example, your 800 MB Ubuntu image probably has build-essential, man pages, include files, and other possibly unnecessary packages that could be removed. Using Ubuntu 14.04, I’ve been able to get a Ruby on Rails app under 300 MB (half of that is ruby gems) without slim or squash. I’ve been meaning to try these tools out, but they feel dirty. I’d rather use Alpine Linux and build up from nothing, but I *think* I encountered issues with their long-known DNS bug…
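For the curious, building up from Alpine starts from something like the sketch below. The package names are illustrative and vary by Alpine release; note that Alpine uses musl libc rather than glibc, which is the usual source of compatibility surprises (including the DNS behavior Kevin mentions):

```dockerfile
# a minimal Alpine base for a Rails app -- package names are illustrative
FROM alpine:3.4
RUN apk add --no-cache ruby ruby-bundler ruby-dev build-base postgresql-dev
```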
That makes sense Kevin. If you install in one RUN and then clean up in another RUN the first RUN would leave all the stuff you clean up in the second RUN in the final image. Chain them together and that won’t happen. I played around with my Dockerfile and you’re right. I’ve updated the example in this post to acknowledge. Thanks for the info!
Building (essentially) from scratch is definitely going to get you the slimmest image possible. However, most folks aren’t going to do it. I see tools like docker-slim and docker-squash filling a gap similar to compiler optimizations. Compiler optimizations do stuff that developers could do on their own but don’t, so the compiler (tool) picks up the slack.
Interesting tools you found and nice writeup. I think what is missing at the moment is a tool to analyze images and make recommendations on size optimizations. Maybe one already exists but something like https://imagelayers.io with added functionality would be ideal.
Sorry to hear that docker-slim didn’t work for your app. Is this the app: https://github.com/cstump/docker_example If it isn’t I wonder if it’s close enough. Not every stack has been tested yet. Rails with unicorn is one of them 🙂 I’ll open a ticket and update you once I have good news.
Thanks for following up Kyle! The docker_example app isn’t exactly what I tried docker-slim on, but the Docker setup is the same.
Let me know if you’d like a hand with testing, I’d be happy to help.
Sounds like it’s close enough! I might need some help. It’s been a while since my Rails days 🙂 Thank you for trying out the tool and discovering the issue. I know it’s not fun for you (because you just want to get stuff done), but for me it’s an opportunity to improve docker-slim.
Thanks Chris for your article! Especially the link to the companion blog: Optimizing Docker Images. In short, images are generally larger than they need to be because Dockerfiles mix the concerns of building and running applications within the resultant image. One could dramatically reduce runtime image size and improve image security if all build tooling were removed, leaving only the necessary runtime artifacts. For example, the classic golang Docker “Hello World!” app relies on a 744 MB golang:onbuild image. Once built, moving only the runtime helloworld executable to its own image consumes a mere 2.3 MB and greatly reduces the attack surface of any running container derived from this image. This idea is akin to the docker export/import technique mentioned by the companion blog.
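(Editor’s note for later readers: newer Docker releases, 17.05 and up, support multi-stage builds, which express this build-versus-run split directly in one Dockerfile. A sketch for the golang hello world case; file and image names are illustrative:)

```dockerfile
# stage 1: build with the full Go toolchain
FROM golang:alpine AS build
WORKDIR /src
COPY helloworld.go .
RUN CGO_ENABLED=0 go build -o /helloworld helloworld.go

# stage 2: ship only the static binary -- a few MB instead of 744 MB
FROM scratch
COPY --from=build /helloworld /helloworld
ENTRYPOINT ["/helloworld"]
```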
Although this technique can help reduce image size, export doesn’t directly provide a means to filter/select what’s exported nor does it accept an image reference as an argument. To address these issues, I created a bash script that wraps several docker CLI commands to enable selective copying of individual files/directories from/to host file system, containers, and images. This script can be used to “squash/compress” images by copying only the necessary bits from an existing container/image to a new image. The dkrcp script is available on github https://github.com/WhisperingChaos/dkrcp .
If you get an opportunity, try it and let me know what you think.
Thanks for the explanation Rich, and for the link to the script. dkrcp seems like a sophisticated general purpose tool; not for the faint of heart but a viable option for those who know their images well.