April 23, 2019

Using Files in One-off Docker Images

I have been using Docker for a lot of things. For example, docker-compose is extremely useful for quickly setting up a service-based development environment.

Recently, I had the need to run a command once in a Docker image and do something with the file output. For instance, this is quite useful when encapsulating a compilation step or when processing data from one file to another using a docker image, so a user only needs docker as a dependency and the image does the rest.

In this short post, we will explore using files in one-off docker commands with and without defining a custom Dockerfile.

What we want to achieve

To simulate compilation or data processing from one file to another, we are going to create a Python 3 script that reads input from a file, reverses it, and outputs it to another file. Although we are using some Python code here, the principles apply to any step involving file input and output.

Create a file main.py, and write the following code in it:

import os

currentDir = os.path.dirname(os.path.abspath(__file__))
inputPath = f"{currentDir}/input/input.txt"
outputPath = f"{currentDir}/output/output.txt"

with open(outputPath, "w") as outputFile:
    with open(inputPath) as inputFile:
        content = inputFile.read()
        outputFile.write(content[::-1])

This code reads input from input/input.txt, reverses its contents, and writes the result to output/output.txt. As you can see, both paths are resolved relative to the script itself.

If you have Python 3 installed, you can test this by creating the directories input and output next to main.py and then creating a file input/input.txt. Write something in this file, for example: “Hello World”.

To test this, run python main.py and check out the newly created output/output.txt. As expected, it contains “dlroW olleH”.
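The same check can also be scripted. The following is a minimal sketch, not part of main.py: the helper names reverse_file and demo are hypothetical, and the files live in a temporary directory so nothing in the project is touched.

```python
import tempfile
from pathlib import Path


def reverse_file(input_path: Path, output_path: Path) -> None:
    """Write the reversed contents of input_path to output_path."""
    output_path.write_text(input_path.read_text()[::-1])


def demo() -> str:
    # Work in a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "input.txt"
        output_path = Path(tmp) / "output.txt"
        input_path.write_text("Hello World")  # no trailing newline
        reverse_file(input_path, output_path)
        return output_path.read_text()


if __name__ == "__main__":
    print(demo())
```

Note that if your editor appends a trailing newline to input.txt, the reversed output will start with that newline.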

Remove output/output.txt before moving on.

Using Docker

What if we don’t have Python 3 installed and we don’t want to install it? This is where Docker can help.

In short, Docker is a means of containerizing an application and bundling it with all its dependencies. It is somewhat similar to virtualization in that almost no assumptions are made about the host machine that is running it, but it is more lightweight because it does not require packaging an entire operating system.

Install Docker if you need to and let’s see how we can run our code once using Docker.

Method 1: without custom Docker image

In order to run a command in a new docker container, we can use the following command:

docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

The image we want to use is a small Python 3 image. We can find one in the container registry on Docker Hub. At the time of writing, python:3.7-slim seems like a good contender. It is good practice to keep containers as small as possible. Because we do not need any other dependencies, the slim image will suffice.

Let’s try it out on our main.py. Make sure you go to the directory main.py is located in and run the following:

docker run python:3.7-slim python main.py

python: can't open file 'main.py': [Errno 2] No such file or directory

The problem here is that the python executable inside the container has no knowledge of the main.py file outside of the container.

In order to solve this, we have to mount the file into the container. We can do this using the -v or --volume option. The host path should be absolute; to get the current directory, we can use $PWD, which works in both bash and PowerShell. (Bash variable names are case-sensitive, so the lowercase $pwd only works in PowerShell.)

Let’s mount our file as /var/main.py in the container. Be sure to change the command to python /var/main.py as well:

docker run -v $PWD/main.py:/var/main.py python:3.7-slim python /var/main.py

Traceback (most recent call last):
  File "/var/main.py", line 7, in <module>
    with open(outputPath, "w") as outputFile:
FileNotFoundError: [Errno 2] No such file or directory: '/var/output/output.txt'

As you can see, we have succeeded in running our script, but we still get an error: /var/output/output.txt cannot be found. Mounting is not restricted to files; we can mount directories as well. Let’s mount our output and input directories:

docker run -v $PWD/main.py:/var/main.py -v $PWD/output:/var/output -v $PWD/input:/var/input python:3.7-slim python /var/main.py

Our script is quiet! That’s good news, isn’t it? Let’s check out our output/output.txt file on our host machine:

cat output/output.txt

dlroW olleH
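With three mounts, the command line gets long. As an illustration, the -v arguments used above could be assembled in Python. This is only a sketch: docker_run_args is a hypothetical helper, and it prints the command rather than running it.

```python
import shlex
from pathlib import Path
from typing import Dict, List


def docker_run_args(
    image: str,
    mounts: Dict[str, str],
    command: List[str],
    base_dir: Path,
) -> List[str]:
    """Build a `docker run` argument list with one -v option per mount.

    `mounts` maps a path relative to `base_dir` on the host to an
    absolute path inside the container.
    """
    args = ["docker", "run"]
    for host_relative, container_path in mounts.items():
        host_path = base_dir / host_relative
        args += ["-v", f"{host_path}:{container_path}"]
    return args + [image] + command


if __name__ == "__main__":
    args = docker_run_args(
        image="python:3.7-slim",
        mounts={
            "main.py": "/var/main.py",
            "output": "/var/output",
            "input": "/var/input",
        },
        command=["python", "/var/main.py"],
        base_dir=Path.cwd(),
    )
    # Print the command instead of running it; to execute it, pass
    # `args` to subprocess.run(args, check=True).
    print(" ".join(shlex.quote(arg) for arg in args))
```

Building the arguments as a list, rather than one long string, avoids quoting problems when the project path contains spaces.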

Using a bind-mount

The --mount option gives a bit more control over the type of mount that should be used. In our case we want to read from a certain location and write to another location.

Method 2: with custom Docker image

You may want to create your own reusable Docker image if you have more complex dependencies for your command. Although overkill for our tiny command, a custom image is interesting when performing some kind of multi-step process involving several programs.

We can define a custom image by writing our own Dockerfile. Let’s base it on the same image as we used before.

FROM python:3.7-slim

CMD ["python", "/var/main.py"]

We can then build our image using the build command:

docker build [OPTIONS] PATH | URL | -

For our project, we will tag our image as py-reverse and use the current directory (.):

docker build -t py-reverse .

To see the images currently installed on our system, you can use the images command:

docker images

REPOSITORY    TAG      IMAGE ID         CREATED             SIZE
py-reverse    latest   a930c64cb9df     15 minutes ago      143MB

Our image can now be instantiated as a container by invoking the run command again:

docker run -v $PWD/main.py:/var/main.py -v $PWD/output:/var/output -v $PWD/input:/var/input py-reverse

You can add extra dependencies to the docker image by running the installation commands when building. This is done by adding RUN commands to the Dockerfile. For instance, if we needed g++ for some reason, we could add it:

FROM python:3.7-slim

RUN apt-get update && apt-get install -y g++

CMD ["python", "/var/main.py"]

Keep in mind that RUN is only executed when building an image, not when running it as a container. Do not forget to rebuild your images when updating your Dockerfile.

Limiting mount access

In our current mounting strategy, the container is allowed to write to every bound volume. This might be a bit too much as we only want to write to the output directory. The other directories should be marked as read-only.

One way to do this (and provide other details) is to use the --mount option. If we don’t need a custom image, it looks like this:

docker run --mount type=bind,readonly=true,src=$PWD/main.py,dst=/var/main.py --mount type=bind,readonly=true,src=$PWD/input,dst=/var/input --mount type=bind,src=$PWD/output,dst=/var/output python:3.7-slim python /var/main.py

For our custom image, after we build and tag our image, we can use the container with a specified mount as follows:

docker run --mount type=bind,readonly=true,src=$PWD/main.py,dst=/var/main.py --mount type=bind,readonly=true,src=$PWD/input,dst=/var/input --mount type=bind,src=$PWD/output,dst=/var/output py-reverse

Although more verbose, it gives us more options than -v.
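Because each --mount value is a comma-separated list of key=value pairs, it is easy to generate. The sketch below uses two hypothetical helpers, bind_mount and mount_args, and assumes the project lives at /project. It emits the bare readonly flag, which Docker documents as the way to mark a mount read-only.

```python
from typing import List


def bind_mount(src: str, dst: str, readonly: bool = False) -> str:
    """Return the value for a single --mount option describing a bind mount."""
    parts = ["type=bind", f"src={src}", f"dst={dst}"]
    if readonly:
        # A bare `readonly` key marks the mount as read-only.
        parts.append("readonly")
    return ",".join(parts)


def mount_args(project_dir: str) -> List[str]:
    """Read-only script and input, writable output, as in the post."""
    specs = [
        bind_mount(f"{project_dir}/main.py", "/var/main.py", readonly=True),
        bind_mount(f"{project_dir}/input", "/var/input", readonly=True),
        bind_mount(f"{project_dir}/output", "/var/output"),
    ]
    args: List[str] = []
    for spec in specs:
        args += ["--mount", spec]
    return args


if __name__ == "__main__":
    # /project is an assumed location; substitute your own absolute path.
    print(" ".join(["docker", "run"] + mount_args("/project") + ["py-reverse"]))
```

With the script and input mounted read-only, an attempt by the container to write to /var/input fails, while writes to /var/output still reach the host.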

Cleaning up

That’s it! We may want to remove the images we have downloaded in order to clean up. Docker has the rmi command for this purpose. You can remove images by referring to their image ID, or a unique part of it, which we can find using the docker images command.

You may get a notice that an image cannot be deleted because a container still uses it. If so, you can either force the deletion by adding the -f option or remove the container first using the rm command. To list all running Docker containers, you can use the docker ps command (also available as docker container ls). In order to view all containers (even those that have been stopped), add the -a option.

Remove as many containers and images as you like and you’re done!
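The order of operations matters here: containers go first, then the images they were created from. The following sketch only builds the corresponding commands and does not run them. cleanup_commands is a hypothetical helper, the container ID is made up, and the image ID is the one from the docker images listing earlier.

```python
import shlex
from typing import Iterable, List


def cleanup_commands(
    container_ids: Iterable[str],
    image_ids: Iterable[str],
    force: bool = False,
) -> List[List[str]]:
    """Return docker commands that remove containers first, then images."""
    commands = [["docker", "rm", container_id] for container_id in container_ids]
    for image_id in image_ids:
        command = ["docker", "rmi"]
        if force:
            command.append("-f")  # force removal of the image
        commands.append(command + [image_id])
    return commands


if __name__ == "__main__":
    # "3f2a1b4c5d6e" is a made-up container ID; take real ones from `docker ps -a`.
    for command in cleanup_commands(["3f2a1b4c5d6e"], ["a930c64cb9df"]):
        print(" ".join(shlex.quote(part) for part in command))
```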

Conclusion

In this post we have seen how to run file-dependent one-off commands in Docker using volume mounts. Along the way, we created a Dockerfile and came across the docker build, docker images, docker run, docker rmi, docker rm, and docker ps commands. We have also seen that we can find images on a container registry like Docker Hub.

A. Rothuis