I have been using Docker for a lot of things. For example, docker-compose is extremely useful for quickly setting up a service-based development environment.
Recently, I had the need to run a command once in a Docker image and do something with the file output. For instance, this is quite useful when encapsulating a compilation step or when processing data from one file to another using a docker image, so a user only needs docker as a dependency and the image does the rest.
In this short post, we will explore using files in one-off docker commands with and without defining a custom Dockerfile.
What we want to achieve
To simulate compilation or data processing from one file to another, we are going to create a Python 3 script that reads input from a file, reverses it, and outputs it to another file. Although we are using some Python code here, the principles apply to any step involving file input and output.
Create a file main.py and write the following code in it:
import os

currentDir = os.path.dirname(os.path.abspath(__file__))  # absolute, so "python main.py" works before Python 3.9 too
inputPath = f"{currentDir}/input/input.txt"
outputPath = f"{currentDir}/output/output.txt"

with open(outputPath, "w") as outputFile:
    with open(inputPath) as inputFile:
        content = inputFile.read()
    outputFile.write(content[::-1])  # [::-1] reverses the string
This code reads input from input/input.txt, reverses its contents and writes the result to output/output.txt.
As you can see, the data is expected to be located relative to the script itself.
If you have Python 3 installed, you can test this by creating the directories input and output next to main.py and then creating a file input/input.txt.
Write something in this file, for example: “Hello World”.
To test this, run python main.py and check out the newly created output/output.txt.
As expected, it contains “dlroW olleH”.
Remove output/output.txt before moving on.
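The manual check above can also be scripted. The following is a minimal sketch, not part of the original script: it wraps the same reversal logic in a hypothetical helper called reverse_file and exercises it in a temporary directory, so nothing is left behind on disk.

```python
import tempfile
from pathlib import Path


def reverse_file(input_path: Path, output_path: Path) -> None:
    """Read input_path, reverse its text, and write it to output_path."""
    content = input_path.read_text()
    output_path.write_text(content[::-1])


def run_check() -> str:
    """Recreate the input/ and output/ layout in a temp dir and run the step."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "input").mkdir()
        (root / "output").mkdir()
        (root / "input" / "input.txt").write_text("Hello World")
        reverse_file(root / "input" / "input.txt", root / "output" / "output.txt")
        return (root / "output" / "output.txt").read_text()


if __name__ == "__main__":
    print(run_check())  # prints: dlroW olleH
```

The temporary directory is removed when the with block exits, which plays the same role as deleting output/output.txt by hand.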
Using Docker
What if we don’t have Python 3 installed and we don’t want to install it? This is where Docker can help.
In short, Docker is a means of containerizing an application and bundling it with all its dependencies. It is somewhat similar to virtualization in that almost no assumption is being made about the host machine that is running it, but it is more lightweight because it does not require packaging an entire operating system.
Install Docker if you need to and let’s see how we can run our code once using Docker.
Method 1: without custom Docker image
In order to run a command in a new docker container, we can use the following command:
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
The image we want to use is a small Python 3 image. We can find one in the container registry on Docker Hub. At the time of writing, python:3.7-slim seems like a good contender.
It is good practice to keep containers as small as possible. Because we do not need any other dependencies, the slim image will suffice.
Let’s try it out on our main.py. Make sure you are in the directory where main.py is located and run the following:
docker run python:3.7-slim python main.py
python: can't open file 'main.py': [Errno 2] No such file or directory
The problem here is that the python executable inside the container has no knowledge of the main.py file outside of the container.
In order to solve this, we have to mount a volume inside the container. We can do this using the -v or --volume option.
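To get the current directory we can use $PWD in bash. PowerShell accepts it as well, because its variable names are case-insensitive. Lowercase $pwd only works in PowerShell, since variable names in bash are case-sensitive.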
Let’s mount our file as /var/main.py in the container. Be sure to change the command to python /var/main.py as well:
docker run -v $PWD/main.py:/var/main.py python:3.7-slim python /var/main.py
Traceback (most recent call last):
  File "/var/main.py", line 7, in <module>
    with open(outputPath, "w") as outputFile:
FileNotFoundError: [Errno 2] No such file or directory: '/var/output/output.txt'
As you can see, we have succeeded in running our script, but we still get an error: /var/output/output.txt cannot be found.
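The error comes from how open behaves: in write mode it creates the file if needed, but never the directories leading up to it. Because only main.py was mounted, /var/output does not exist inside the container. Here is a minimal sketch that reproduces this outside Docker, using a temporary directory as a stand-in for /var (the helper names are made up for illustration):

```python
import tempfile
from pathlib import Path


def try_write(path: Path) -> bool:
    """Return True if the file could be opened for writing, False otherwise."""
    try:
        with open(path, "w") as f:
            f.write("test")
        return True
    except FileNotFoundError:
        # Raised when a parent directory of the target is missing.
        return False


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "output" / "output.txt"
        before = try_write(target)  # the "output" directory is missing
        target.parent.mkdir()       # stand-in for making the directory available
        after = try_write(target)   # the directory now exists
        return before, after


if __name__ == "__main__":
    print(demo())  # prints: (False, True)
```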
Mounting is not restricted to files; we can mount directories as well.
Let’s mount our output and input directories:
docker run -v $PWD/main.py:/var/main.py -v $PWD/output:/var/output -v $PWD/input:/var/input python:3.7-slim python /var/main.py
Our script is quiet! That’s good news, isn’t it?
Let’s check out our output/output.txt file on our host machine:
cat output/output.txt
dlroW olleH
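If the command is going to be launched from a script rather than typed by hand, the argument list can be assembled programmatically. The sketch below is one possible way to do that in Python. The helper build_run_command is hypothetical, and it assumes the same file layout and container paths as the command above.

```python
import shlex
from pathlib import Path

IMAGE = "python:3.7-slim"  # image used in this post


def build_run_command(project_dir, image=IMAGE):
    """Build the `docker run` argument list with three -v bind mounts."""
    root = Path(project_dir).absolute()  # use absolute host paths for -v
    command = ["docker", "run"]
    for name in ("main.py", "output", "input"):
        # HOST_PATH:CONTAINER_PATH, same layout as in the command above
        command += ["-v", f"{root / name}:/var/{name}"]
    command += [image, "python", "/var/main.py"]
    return command


if __name__ == "__main__":
    # Print instead of executing, so the sketch runs without Docker installed.
    print(shlex.join(build_run_command(".")))
```

To actually start the container, the returned list could be passed to subprocess.run(command, check=True). Passing a list instead of a single string avoids shell quoting problems when the project path contains spaces.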
Using a bind-mount
The --mount option gives a bit more control over the type of mount that should be used. In our case we want to read from a certain location and write to another location.
Method 2: with custom Docker image
You may want to create your own reusable Docker image if you have more complex dependencies for your command. Although overkill for our tiny command, a custom image is interesting when performing some kind of multi-step process involving several programs.
We can define a custom image by writing our own Dockerfile.
Let’s base it on the same image as we used before.
FROM python:3.7-slim
CMD ["python", "/var/main.py"]
We can then build our image using the build command:
docker build [OPTIONS] PATH | URL | -
For our project, we will tag our image as py-reverse and use the current directory (.):
docker build -t py-reverse .
To see the images currently installed on our system, you can use the images command:
docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
py-reverse   latest   a930c64cb9df   15 minutes ago   143MB
Our image can now be instantiated as a container by invoking the run command again:
docker run -v $PWD/main.py:/var/main.py -v $PWD/output:/var/output -v $PWD/input:/var/input py-reverse
You can add extra dependencies to the Docker image by running the installation commands when building. This is done by adding RUN commands to the Dockerfile.
For instance, if we needed g++ for some reason, we could add it:
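FROM python:3.7-slim
# Refresh the package index first; -y answers the confirmation prompt during the build
RUN apt-get update && apt-get install -y g++
CMD ["python", "/var/main.py"]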
Keep in mind that RUN is only executed when building an image, not when running it as a container.
Do not forget to rebuild your images when updating your Dockerfile.
Limiting mount access
In our current mounting strategy, the container is allowed to write to every mounted path. This might be a bit too much, as we only want to write to the output directory. The other mounts should be marked as read-only.
One way to do this (and provide other details) is to use the --mount option.
If we don’t need a custom image, it looks like this:
docker run --mount type=bind,readonly=true,src=$PWD/main.py,dst=/var/main.py --mount type=bind,readonly=true,src=$PWD/input,dst=/var/input --mount type=bind,src=$PWD/output,dst=/var/output python:3.7-slim python /var/main.py
For our custom image, after we build and tag our image, we can use the container with a specified mount as follows:
docker run --mount type=bind,readonly=true,src=$PWD/main.py,dst=/var/main.py --mount type=bind,readonly=true,src=$PWD/input,dst=/var/input --mount type=bind,src=$PWD/output,dst=/var/output py-reverse
Although more verbose, it gives us more options than -v.
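Because each --mount value is just a comma-separated list of key=value fields, it is easy to generate. The following is a rough sketch under a few assumptions: bind_mount and build_run_command are hypothetical helper names, the image is the py-reverse image built earlier, and the read-only marker is written as the bare readonly flag.

```python
import shlex
from pathlib import Path


def bind_mount(src, dst, readonly=False):
    """Format one value for docker's --mount option (bind type)."""
    fields = ["type=bind", f"src={Path(src).absolute()}", f"dst={dst}"]
    if readonly:
        fields.append("readonly")  # bare flag form of the read-only marker
    return ",".join(fields)


def build_run_command(project_dir, image="py-reverse"):
    """Mount main.py and input/ read-only; only output/ stays writable."""
    root = Path(project_dir).absolute()
    mounts = [
        bind_mount(root / "main.py", "/var/main.py", readonly=True),
        bind_mount(root / "input", "/var/input", readonly=True),
        bind_mount(root / "output", "/var/output"),
    ]
    command = ["docker", "run"]
    for mount in mounts:
        command += ["--mount", mount]
    command.append(image)
    return command


if __name__ == "__main__":
    # Print instead of executing, so the sketch runs without Docker installed.
    print(shlex.join(build_run_command(".")))
```

One limitation of this sketch: since the fields are separated by commas, a host path that itself contains a comma is not handled.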
Cleaning up
We may want to remove the images we have downloaded in order to clean up. Docker has the rmi command for this purpose. It removes images by their image ID, or a unique prefix of it, which we can look up with the docker images command.
You may get a notice that an image cannot be deleted, typically because a container (possibly a stopped one) still refers to it.
If so, you can either force-delete the image by adding the -f option or remove the container first by using the rm command.
To list all running Docker containers, you can use the docker ps command (docker container ls is equivalent). In order to view all containers (even those that have been stopped), add the -a option.
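The clean-up steps can be scripted in the same way as the run commands. The sketch below only builds the argument lists. The helper names are made up for illustration, and the container ID in the usage part is a placeholder rather than real output.

```python
import shlex


def list_containers_command(image, include_stopped=True):
    """Command that lists the IDs of containers created from `image`."""
    command = ["docker", "ps", "--quiet", "--filter", f"ancestor={image}"]
    if include_stopped:
        command.append("--all")  # long form of -a
    return command


def remove_containers_command(container_ids):
    """Command that removes the given containers."""
    return ["docker", "rm", *container_ids]


def remove_image_command(image, force=False):
    """Command that removes an image, optionally forcing the removal."""
    command = ["docker", "rmi"]
    if force:
        command.append("--force")  # long form of -f
    command.append(image)
    return command


if __name__ == "__main__":
    # Print instead of executing, so the sketch runs without Docker installed.
    print(shlex.join(list_containers_command("py-reverse")))
    print(shlex.join(remove_containers_command(["0123456789ab"])))  # placeholder ID
    print(shlex.join(remove_image_command("py-reverse")))
```

Removing the containers before the image avoids the need for the force option.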
Remove as many containers and images as you like and you’re done!
Conclusion
In this post we have seen how to run file-dependent one-off commands in Docker using volume mounts. Along the way, we created a Dockerfile and came across the docker build, docker images, docker run, docker rmi, docker rm, and docker ps commands. We have also seen that we can find images on a container registry like Docker Hub.