Scenario

I want to transfer efficiently my images to the production machines.

Given

My Dockerfile with android sdk

ADD http://dl.google.com/android/android-sdk_r24.0.2-linux.tgz /opt/android-sdk.tgz
RUN tar xzf /opt/android-sdk.tgz -C /opt/
RUN rm -f /opt/android-sdk.tgz

built with docker build -t docker_layers .

When

I want to push it to the registry

Then

It will push 3 layers:

  • the first is the android tgz pushed into /opt
  • the second is the extracted sdk into the /opt directory
  • the last one is the deletion of the tgz

This is really not optimal, the first and third layer are useless. Worst, they use space on your filesystems, on your network when you pull/push, …

Solution

It is a good practice to merge common operations into one single docker command.

For this example it would be:

RUN cd /opt && \
    wget --output-document=android-sdk.tgz --quiet http://dl.google.com/android/android-sdk_r24.0.2-linux.tgz && \
    tar xzf android-sdk.tgz && \
    rm -f android-sdk.tgz

In that way you only have one layer commited, and it contains only the needed android sdk extracted.

Caveats

There is one drawback with this particular approach: You lose the capacities of the ADD syntax.

If you build the first Dockerfile, you will notice that each time the tgz is downloaded. Docker then verifies that the file hasn’t changed and if not, uses the cached image.

That means that if the tgz changes, there will be a new build, whereas with the one-line approach, it will not, because docker will just verify that the command hasn’t changed, not the content downloaded by the wget.

Templates

The general approach is :

  • Prepare what you want to do
  • Do it
  • Clean everything that is not necessary

The example with the android sdk works for all archives that must be extracted.

Below are templates from our experience that we think doesn’t alter readability while reducing the layers and the size of the docker image.

APT

RUN apt-get update &&  \
    apt-get install -y whatever && \
    apt-get clean && \
    apt-get autoclean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Yum

RUN yum -y update && \
    yum -y install whatever && \
    yum clean all

An other very important thing is to keep your Dockerfile readable

This example is, in my opinion, a bad one.

{% gist MichaelBitard/7bd7bc71385326ab3238 Dockerfile %}

Yes, it’s only one docker RUN command

PS

  • If you know other tips to reduce the number of layers without losing readability, feel free to post a comment, I’ll gladly add them here.