Docker has become a popular software solution for deploying applications inside isolated Linux containers. From a Python point of view, one could consider Docker containers as "virtual environments on steroids", because they encapsulate and isolate not only the application's Python prerequisites (say, a given version of the PyPDF2 package), but also any non-Python operating-system utilities that the application relies on (say, a given version of LibreOffice). The following primer shows how to use Docker for developing Python applications.
Installation
Installing docker on Debian GNU/Linux is easy:
sudo apt-get install docker.io
Docker is now up and running:
$ docker info
Containers: 0
Images: 289
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 289
Execution Driver: native-0.2
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
WARNING: No memory limit support
WARNING: No swap limit support
We could use it as is; note, however, the memory and swap limit warnings, which are worth fixing before we continue.
Enabling memory/swap limit support
On Debian GNU/Linux systems, the memory limit and swap limit features can be enabled by configuring kernel boot parameters. This is done by editing /etc/default/grub in the following way:
$ sudo vim /etc/default/grub # edit GRUB_CMDLINE_LINUX as follows
$ grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
$ sudo update-grub
$ sudo shutdown -r now
After the reboot, the two WARNING lines disappear.
Note that memory accounting of running containers can then be inspected via cgroups and friends:
$ systemd-cgtop
$ ls -l /sys/fs/cgroup/memory/system.slice
Enabling CERN DNS
One more installation-related comment, of importance to users inside CERN: networking works best when the CERN DNS IPs are specified explicitly. This can be done by passing the --dns parameter to the docker commands below, or globally by configuring DOCKER_OPTS in the following way:
$ sudo vim /etc/default/docker # edit DOCKER_OPTS as follows
$ grep DOCKER_OPTS /etc/default/docker
DOCKER_OPTS="--dns 137.138.16.5 --dns 137.138.17.5 --dns 8.8.8.8 --dns 8.8.4.4"
$ sudo /etc/init.d/docker restart
Throw-away Python containers
Now that Docker is installed, how do we use it to develop Python applications? We can start by pulling pre-existing Python images from the Docker Hub registry:
$ docker search python
$ docker pull python:2.7
$ docker pull python:3.4
This will permit us to start throw-away Python containers:
$ docker run -i -t --rm python:2.7
Python 2.7.9 (default, Jan 28 2015, 01:38:45)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 + 1
2
This creates an interactive (-i) Python container attached to the terminal (-t) that will be removed once we quit the session (--rm).
Throw-away containers are useful for quickly testing Python constructs. For example, how fast are list comprehensions in various Python versions?
$ docker run -i -t --rm python:2.7 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 82.2 usec per loop
$ docker run -i -t --rm python:3.3 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 83 usec per loop
$ docker run -i -t --rm python:3.4 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 87.7 usec per loop
The higher the version, the slower Python seems to be; but let's not digress again.
Simple application
Consider that we are developing a simple Python application, such as a web site based on the Flask framework. Here is a minimal "hello world" code example:
$ cat app.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello world'

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)
with the following requirements:
$ cat requirements.txt
Flask
The application is started as:
$ python app.py
It will run on http://0.0.0.0:5000 and simply greet its user.
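We can check that the application responds, for example by using curl from another terminal:

$ curl http://localhost:5000/
Hello world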
Dockerfile
Let us build a Docker image from which we can start a container running this application. While we could start an interactive Python container as described above, install the prerequisites by hand, and save the work for later, it is best to fully automatise the creation of Docker images by means of a Dockerfile.
For our simple application, the Dockerfile would look as follows:
$ cat Dockerfile
FROM python:2.7
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
WORKDIR /code
EXPOSE 5000
CMD ["python", "app.py"]
This means we are starting from the Python 2.7 Docker image, adding the current requirements.txt file and running pip on it to install Flask, then adding the current directory as /code inside the container and making it the working directory. The application will run on port 5000 when the container starts, by means of python app.py.
The Docker image can then be built by running:
$ docker build -t tiborsimko/helloworld .
A new container can be instantiated out of this image as follows:
$ docker run -p 5000:5000 tiborsimko/helloworld
On the host OS, we now see the web site running on port 5000, which is exposed from the container to the host system.
Another useful option is -v (for volume management), which permits mounting the current working directory under /code in the container, so that we can use our preferred editor on the host machine to edit the application and see the changes live in the container. This can be achieved with a -v option, as shown below, but there is also another way to automatise this.
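For example, the current working directory could be mounted like this (note that docker run expects an absolute host path, hence the use of $(pwd)):

$ docker run -p 5000:5000 -v "$(pwd)":/code tiborsimko/helloworld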
docker-compose
docker-compose provides useful composition services on top of Docker that permit us to automatise the building and running of containers. First install it as follows:
sudo pip install docker-compose==1.1.0-rc1
You may need to upgrade PyYAML beforehand:
sudo apt-get remove python-openssl
sudo apt-get install libyaml-dev
sudo pip install PyYAML
(Note that the above example replaces system Python packages with locally installed ones, which may be dangerous. A better technique would be to use pipsi, which installs Python programs and their dependencies into virtual environments, permitting better isolation from system Python package versions.)
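For example, with pipsi installed, docker-compose could be installed in its own virtual environment like this (note that this picks up the latest released version rather than the release candidate pinned above):

$ pipsi install docker-compose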
Here is a docker-compose configuration for our simple application example:
$ cat docker-compose.yml
web:
  build: .
  command: python app.py
  ports:
    - "5000:5000"
  volumes:
    - .:/code
The building is then done via:
$ docker-compose build
and a container can be fired up via:
$ docker-compose up
Note how the docker command-line options are now stored in a more readable YAML configuration, including exposing port 5000 and mounting the current working directory under /code. Basically, docker-compose permits us to automatise via YAML what we would otherwise have to express by hand via docker command-line options.
This advantage becomes even more apparent for complex applications that require linking several containers together, such as a Python application running inside the web container that is linked to a redis container for caching, a db container running a PostgreSQL database, and a worker container running Celery tasks.
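As an illustration only, such a setup might look roughly like this in the same docker-compose format; the redis and postgres images are the official ones, while the Celery invocation and the tasks module name are hypothetical placeholders:

web:
  build: .
  command: python app.py
  ports:
    - "5000:5000"
  volumes:
    - .:/code
  links:
    - redis
    - db
worker:
  build: .
  # hypothetical Celery entry point; "tasks" is a placeholder module name
  command: celery -A tasks worker
  links:
    - redis
    - db
redis:
  image: redis
db:
  image: postgres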
.dockerignore
If we want to share the created image with others, it is useful to define a .dockerignore file that excludes certain files or directories from being included in the built Docker image. A good example is .git: by putting it in .dockerignore, we won't expose our local unstable branches to friends, though we still retain the option of having them available for local development via volume mounting.
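For example, a minimal .dockerignore could look like this (the *.pyc entry is added here merely as a further illustration):

$ cat .dockerignore
.git
*.pyc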
Docker build cache
Why have we defined the following part in the Dockerfile?
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
The requirements.txt file is added under /code in the third line anyway, isn't it?
The reason the requirements.txt file is added explicitly before the rest of the code is the Docker build cache. If we repeat the build process several times, Docker caches prior layers (roughly speaking, prior ADD and RUN statements) and reuses them whenever possible. For example, if requirements.txt did not change and only app.py did, our application requirements won't have to be installed over and over; Docker will reuse the previously built layers.
The automated build cache is one of the very cool features of Docker. It makes building images and creating containers an easy, fast, and disposable process. It is therefore important to write the Dockerfile in such a manner that most of the prerequisite installation work is done before we add our code.
Container user
If we run a bash shell in the built container:
$ docker run -i -t --rm tiborsimko/helloworld bash
root@06436a85c124:/code# id
uid=0(root) gid=0(root) groups=0(root)
we'll see that the container process runs as root, which is not ideal from a security point of view.
It is desirable to create a new user that the application will run as. Ideally, it would be a user with the same UID as the main user of the host system, so that if we mount the current directory into the container and the build process needs to create some files (say by running Bower and friends), all files created within the container bear the same ownership as the files on the host system.
This can be achieved in Dockerfile via:
RUN adduser --uid 1000 --disabled-password --gecos '' tiborsimko && \
    chown -R tiborsimko:tiborsimko /code
USER tiborsimko
before starting the application.
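For instance, in our example Dockerfile these user-related lines could go right before the final CMD; a sketch, assuming that UID 1000 matches the main host user:

FROM python:2.7
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
WORKDIR /code
EXPOSE 5000
# create an unprivileged user matching the host UID and switch to it
RUN adduser --uid 1000 --disabled-password --gecos '' tiborsimko && \
    chown -R tiborsimko:tiborsimko /code
USER tiborsimko
CMD ["python", "app.py"]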
Wash your bowl
Consider that we've been developing Docker images and running Docker containers for some time. Crust may have accumulated while we were tweaking the last bits of the Dockerfile. How do we clean up after ourselves?
We can remove all containers by running:
$ docker rm $(docker ps -aq)
202c5f3e482e
93112fa2ad87
We can remove all "incompletely built" images by running:
$ docker images | grep none | awk '{print "docker rmi " $3;}' | sh
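Alternatively, much the same can be achieved with the dangling image filter, assuming your docker version supports it:

$ docker rmi $(docker images -q -f dangling=true)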
Conclusions
We have seen a simple example of how to start developing Python applications using Docker. More realistic examples of Docker configurations will be committed to various inveniosoftware projects in the coming days.