Docker has become a popular software solution for deploying applications inside isolated Linux containers. From a Python point of view, one could consider Docker containers as "virtual environments on steroids", because they encapsulate and isolate not only the application's Python prerequisites (say, a given version of the PyPDF2 package), but also any non-Python operating-system utilities that the application relies on (say, a given version of LibreOffice). The following primer shows how to use Docker for developing Python applications.
Installation
Installing docker on Debian GNU/Linux is easy:
sudo apt-get install docker.io
Docker is now up and running:
$ docker info
Containers: 0
Images: 289
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 289
Execution Driver: native-0.2
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
WARNING: No memory limit support
WARNING: No swap limit support
We could use it as is; note, however, the memory and swap limit warnings, which are worth fixing before we continue.
Enabling memory/swap limit support
On Debian GNU/Linux systems, the memory limit and swap limit features can be enabled by configuring kernel boot parameters. This is done by editing /etc/default/grub in the following way:
$ sudo vim /etc/default/grub # edit GRUB_CMDLINE_LINUX as follows
$ grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
$ sudo update-grub
$ sudo shutdown -r now
After the reboot, the two WARNING lines disappear.
Note that memory accounting of running containers can then be inspected via cgroups and friends:
$ systemd-cgtop
$ ls -l /sys/fs/cgroup/memory/system.slice
Enabling CERN DNS
One more installation-related comment, of importance to users inside CERN: networking works best when the CERN DNS IPs are specified explicitly. This can be done by passing the --dns parameter to the docker commands below, or globally by configuring DOCKER_OPTS in the following way:
$ sudo vim /etc/default/docker # edit DOCKER_OPTS as follows
$ grep DOCKER_OPTS /etc/default/docker
DOCKER_OPTS="--dns 137.138.16.5 --dns 137.138.17.5 --dns 8.8.8.8 --dns 8.8.4.4"
$ sudo /etc/init.d/docker restart
Throw-away Python containers
Now that Docker is installed, how do we use it to develop Python applications? We can start by pulling pre-existing Python images from the Docker Hub registry:
$ docker search python
$ docker pull python:2.7
$ docker pull python:3.4
This will permit us to start throw-away Python containers:
$ docker run -i -t --rm python:2.7
Python 2.7.9 (default, Jan 28 2015, 01:38:45)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 + 1
2
This creates an interactive (-i) Python container attached to the terminal (-t) that will be removed once we quit the session (--rm).
Throw-away containers are useful for quickly testing Python constructs. For example, how fast are list comprehensions in various Python versions?
$ docker run -i -t --rm python:2.7 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 82.2 usec per loop
$ docker run -i -t --rm python:3.3 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 83 usec per loop
$ docker run -i -t --rm python:3.4 python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 87.7 usec per loop
The higher the version, the slower Python seems to be; but let's not digress again.
Simple application
Consider that we are developing a simple Python application, such as a web site based on the Flask framework. Here is a minimal "hello world" code example:
$ cat app.py
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello world'

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)
with the following requirements:
$ cat requirements.txt
Flask
The application is started as:
$ python app.py
It will run on http://0.0.0.0:5000 and simply greet its user.
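We can check that the application responds, for example by using curl from another terminal:

$ curl http://localhost:5000/
Hello world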
Dockerfile
Let us build a Docker image from which we can start a container running this application. While we could start an interactive Python container as described above, install the prerequisites by hand, and save the work for later, it is best to fully automatise the creation of Docker images by means of a Dockerfile.
For our simple application, the Dockerfile would look as follows:
$ cat Dockerfile
FROM python:2.7
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
WORKDIR /code
EXPOSE 5000
CMD ["python", "app.py"]
This means we are starting from the Python 2.7 Docker image, adding the current requirements.txt file and running pip on it to install Flask, then adding the current directory as /code inside the container and making it the working directory. The application will run on port 5000 when the container starts, by means of python app.py.
The Docker image can then be built by running:
$ docker build -t tiborsimko/helloworld .
A new container can be instantiated out of this image as follows:
$ docker run -p 5000:5000 tiborsimko/helloworld
On the host OS, we now see the web site running on port 5000, which is exposed from the container to the host system.
Another useful option is -v (for volume management), which permits mounting the current working directory under /code in the container, so that we can use our preferred editor on the host machine to edit the application and see the changes live in the container. This can be achieved with a -v option, as shown below, but there is also another way to automatise this.
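For example, the current working directory could be mounted like this (note that docker run expects an absolute host path, hence the use of $(pwd)):

$ docker run -p 5000:5000 -v "$(pwd)":/code tiborsimko/helloworld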
docker-compose
docker-compose provides useful composition services on top of Docker that permit us to automatise the building and running of containers. First install it as follows:
sudo pip install docker-compose==1.1.0-rc1
You may need to upgrade PyYAML beforehand:
sudo apt-get remove python-openssl
sudo apt-get install libyaml-dev
sudo pip install PyYAML
(Note that the above example replaces system Python packages with locally installed ones, which may be dangerous. A better technique would be to use pipsi, which installs Python programs and their dependencies into virtual environments, permitting better isolation from system Python package versions.)
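For example, with pipsi installed, docker-compose could be installed in its own virtual environment like this (note that this picks up the latest released version rather than the release candidate pinned above):

$ pipsi install docker-compose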
Here is a docker-compose configuration for our simple application example:
$ cat docker-compose.yml
web:
  build: .
  command: python app.py
  ports:
    - "5000:5000"
  volumes:
    - .:/code
The building is then done via:
$ docker-compose build
and a container can be fired up via:
$ docker-compose up
Note how the docker command-line options are now stored in a more readable YAML configuration, including exposing port 5000 and mounting the current working directory under /code. Basically, docker-compose permits us to automatise via YAML what we would otherwise have to express by hand via docker command-line options.
This advantage becomes even more apparent for complex applications that require linking several containers together, such as a Python application running inside the web container that is linked to a redis container for caching, a db container running a PostgreSQL database, and a worker container running Celery tasks.
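As an illustration only, such a setup might look roughly like this in the same docker-compose format; the redis and postgres images are the official ones, while the Celery invocation and the tasks module name are hypothetical placeholders:

web:
  build: .
  command: python app.py
  ports:
    - "5000:5000"
  volumes:
    - .:/code
  links:
    - redis
    - db
worker:
  build: .
  # hypothetical Celery entry point; "tasks" is a placeholder module name
  command: celery -A tasks worker
  links:
    - redis
    - db
redis:
  image: redis
db:
  image: postgres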
.dockerignore
If we want to share the created image with others, it is useful to define a .dockerignore file that excludes certain files or directories from being included in the built Docker image. A good example is .git: by putting it in .dockerignore, we won't expose our local unstable branches to friends, though we still retain the option of having them available for local development via volume mounting.
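For example, a minimal .dockerignore could look like this (the *.pyc entry is added here merely as a further illustration):

$ cat .dockerignore
.git
*.pyc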
Docker build cache
Why have we defined the following part in the Dockerfile?
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
The requirements.txt file is added under /code in the third line anyway, isn't it?
The reason the requirements.txt file is added explicitly before the rest of the code is the Docker build cache. If we repeat the build process several times, Docker caches prior layers (roughly speaking, prior ADD and RUN statements) and reuses them whenever possible. For example, if requirements.txt did not change and only app.py did, our application requirements won't have to be installed over and over; Docker will reuse the previously built layers.
The automated build cache is one of the very cool features of Docker. It makes building images and creating containers an easy, fast, and disposable process. It is therefore important to write the Dockerfile in such a manner that most of the prerequisite installation work is done before we add our code.
Container user
If we run a bash shell in the built container:
$ docker run -i -t --rm tiborsimko/helloworld bash
root@06436a85c124:/code# id
uid=0(root) gid=0(root) groups=0(root)
we'll see that the container process runs as root, which is not ideal from a security point of view.
It is desirable to create a new user that the application will run as. Ideally, it would be a user with the same UID as the main user of the host system, so that if we mount the current directory into the container and the build process needs to create some files (say by running Bower and friends), all files created within the container bear the same ownership as the files on the host system.
This can be achieved in Dockerfile via:
RUN adduser --uid 1000 --disabled-password --gecos '' tiborsimko && \
    chown -R tiborsimko:tiborsimko /code
USER tiborsimko
before starting the application.
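For instance, in our example Dockerfile these user-related lines could go right before the final CMD; a sketch, assuming that UID 1000 matches the main host user:

FROM python:2.7
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
ADD . /code
WORKDIR /code
EXPOSE 5000
# create an unprivileged user matching the host UID and switch to it
RUN adduser --uid 1000 --disabled-password --gecos '' tiborsimko && \
    chown -R tiborsimko:tiborsimko /code
USER tiborsimko
CMD ["python", "app.py"]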
Wash your bowl
Consider that we've been developing Docker images and running Docker containers for some time. Crust may have accumulated while we were tweaking the last bits of the Dockerfile. How do we clean up after ourselves?
We can remove all containers by running:
$ docker rm $(docker ps -aq)
202c5f3e482e
93112fa2ad87
We can remove all "incompletely built" images by running:
$ docker images | grep none | awk '{print "docker rmi " $3;}' | sh
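Alternatively, much the same can be achieved with the dangling image filter, assuming your docker version supports it:

$ docker rmi $(docker images -q -f dangling=true)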
Conclusions
We have seen a simple example of how to start developing Python applications using Docker. More realistic examples of Docker configurations will be committed to various inveniosoftware projects in the coming days.