Sometimes, operations need to be run upon booting a Docker container, for instance when creating, configuring or pre-seeding a database.
In this post, we will see how to run CQL when starting a Scylla Docker container. These solutions could also be applied to Cassandra containers, for which Scylla is a drop-in replacement, or, more generally, containers for other services.
This solution consists of creating a Dockerfile that runs a custom entrypoint script wrapping the original entrypoint script. The new script can be configured to run .cql files upon boot or to run specific queries from within the script.
Cassandra and Scylla
Cassandra is a linearly scalable and highly available distributed column-oriented database. It originated at Facebook around 2008, is written in Java, and is based on Amazon’s Dynamo paper. Cassandra uses its own SQL-like language called CQL (Cassandra Query Language) for data manipulation and retrieval.
Even though Cassandra is already pretty performant, Scylla aims to be an API-compatible replacement for Cassandra claiming higher throughput and lower latency.
The Dockerfile
Create a new Dockerfile based on the Scylla image of choice.
For example’s sake, we will use scylladb/scylla:2.1.3. You may want to pin a newer version.
Its contents are as follows:
FROM scylladb/scylla:2.1.3
COPY wrapper.sh /wrapper.sh
ENTRYPOINT ["/wrapper.sh"]
Nothing too wild. 👍
The wrapper script
Add a wrapper.sh file to the same directory as the Dockerfile (or somewhere else, but remember to change the COPY statement in the Dockerfile).
This bash script is where the magic happens. There are two routes:
- Run raw CQL in the script using cqlsh -e
- Allow cql-files to be run on start-up using cqlsh -f
The first approach suffices if you only need a small change, like guaranteeing that a keyspace exists upon instantiation.
The second approach is more extensible, as multiple scripts can be run by writing raw CQL in cql-files and COPY’ing them as needed.
Let’s have a look at both.
The script-based solution
Based on a similar approach for Cassandra, this solution runs the CQL in wrapper.sh.
It looks like this:
#!/bin/bash
CQL="CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};"
echo "Executing: $CQL"
# Retry in the background until Scylla accepts the query
until cqlsh -e "$CQL"; do
  echo "Unavailable: sleeping"
  sleep 10
done &
# Hand control to the original entrypoint
exec /docker-entrypoint.py "$@"
If the database is not reachable (or some other error occurs!), the script sleeps for 10 seconds and tries again. Because the retry loop runs in the background (note the trailing &), control is passed at the same time to the entrypoint specified by the original container.
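The retry loop in the script keeps going forever, also when the error is not a matter of availability. As a minimal sketch, assuming you would rather give up after a fixed number of attempts, the loop could be wrapped in a small helper. The function name retry and its arguments are hypothetical and not part of the Scylla image:

```shell
#!/bin/bash
# retry: run a command until it succeeds or the attempt budget is used up.
# Hypothetical helper. Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Unavailable (attempt $attempt/$max_attempts): sleeping ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

In wrapper.sh, this would replace the until loop with something like retry 30 10 cqlsh -e "$CQL" &, still running in the background so the original entrypoint can start.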
The CQL creates a keyspace within a basic single-node cluster using the simple strategy. Note the use of CREATE KEYSPACE IF NOT EXISTS. This prevents errors on subsequent start-ups.
If you want some configurability, consider making the keyspace name configurable through an environment variable, say DB_CREATE_KEYSPACE. No keyspace is created if the variable is not defined.
This can be achieved as follows:
#!/bin/bash
# Only create a keyspace when DB_CREATE_KEYSPACE is set and non-empty
if [ -n "$DB_CREATE_KEYSPACE" ]; then
  CQL="CREATE KEYSPACE IF NOT EXISTS $DB_CREATE_KEYSPACE WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};"
  echo "Executing: $CQL"
  until cqlsh -e "$CQL"; do
    echo "Unavailable: sleeping"
    sleep 10
  done &
fi
exec /docker-entrypoint.py "$@"
The file-based solution
This solution mirrors the official MySQL and MariaDB containers, which run the initialization files they find in /docker-entrypoint-initdb.d.
The wrapper.sh implementation is as follows:
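#!/bin/bash
# Absolute path, matching the COPY destination in the Dockerfile
for f in /docker-entrypoint-initdb.d/*; do
  case "$f" in
    *.cql)
      echo "$0: running $f"
      # Retry in the background until Scylla accepts the file
      until cqlsh -f "$f"; do
        >&2 echo "Unavailable: sleeping"
        sleep 10
      done &
      ;;
  esac
  echo
done
exec /docker-entrypoint.py "$@"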
The script loops over a dedicated directory and tries to run the files ending in .cql.
For each file, the actions are comparable to the script-based solution: the CQL is executed, and retried after a pause if an error occurs.
The entrypoint of the original container is called while these operations run in the background.
A more complex implementation, also accounting for files other than .cql, can be found in an answer on Stack Overflow.
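One thing to be aware of: the loop starts a separate background job per file, so several files may run concurrently and in no guaranteed order. If one file depends on another (say, tables that need a keyspace), a sequential variant might be preferable. The following is a minimal sketch under that assumption; run_cql_files is a hypothetical helper name, and the runner command is passed in as arguments:

```shell
#!/bin/bash
# run_cql_files: apply every *.cql file in a directory, one after another.
# Hypothetical helper. Usage: run_cql_files DIRECTORY RUNNER [RUNNER_ARGS...]
# The runner is invoked as: RUNNER [RUNNER_ARGS...] FILE
run_cql_files() {
  local dir="$1"
  shift
  local f
  # Glob results are sorted, so 01-*.cql runs before 02-*.cql.
  for f in "$dir"/*.cql; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    echo "$0: running $f"
    # Retry the current file until it succeeds before moving on.
    until "$@" "$f"; do
      echo "Unavailable: sleeping" >&2
      sleep 10
    done
  done
}
```

In wrapper.sh, the whole function would be sent to the background once, for example run_cql_files /docker-entrypoint-initdb.d cqlsh -f &, followed by the usual exec of the original entrypoint. Prefixing file names with numbers then controls the order.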
Now, create a cql-file called create-keyspace.cql.
In it, write the same CQL query (DDL) as before:
CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
To add this file to our container, amend the Dockerfile
so the cql-file gets placed in the correct directory:
FROM scylladb/scylla:2.1.3
# Add CQL queries to be run upon boot
COPY create-keyspace.cql /docker-entrypoint-initdb.d/create-keyspace.cql
COPY wrapper.sh /wrapper.sh
ENTRYPOINT ["/wrapper.sh"]
Checking that the image works
You can test your Dockerfile by executing the following on the command line:
docker build -t my-scylla .
# For the script-based solution (add the environment variable):
docker run -it -e DB_CREATE_KEYSPACE=my_awesome_project my-scylla
# For the file-based solution:
docker run -it my-scylla
It takes a while before Scylla is fully booted, but after that we can see the keyspace has been created.
In another terminal window, you can run the interactive CQL shell inside the container:
docker exec -it [CONTAINER_ID] cqlsh
Find out your container ID by running docker ps.
In cqlsh, check if the keyspace is present:
-- general: shows all keyspaces
DESCRIBE keyspaces;
-- specific: checks and describes the specified keyspace
-- (use my_keyspace instead for the file-based solution)
DESCRIBE keyspace my_awesome_project;
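If you would rather check this from a script than from an interactive shell, the DESCRIBE output can be piped through a small filter. This is only a sketch: keyspace_listed is a hypothetical helper, and it assumes the keyspace names appear as whitespace-separated words in the output.

```shell
#!/bin/bash
# keyspace_listed: succeed if NAME appears as a whole word on stdin.
# Hypothetical helper. Usage: some_command | keyspace_listed NAME
keyspace_listed() {
  # -w matches whole words only, -F treats NAME as a literal string.
  grep -qwF -- "$1"
}
```

A possible use, with CONTAINER_ID taken from docker ps: docker exec "$CONTAINER_ID" cqlsh -e "DESCRIBE KEYSPACES" | keyspace_listed my_awesome_project && echo "keyspace present".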
Common issues
I was developing on a Windows machine and I encountered the following error:
standard_init_linux.go:190: exec user process caused "no such file or directory"
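This was due to the use of Windows line endings (CRLF).
With CRLF, the shebang line ends in a carriage return, so the interpreter it names cannot be found.
Scripts run in the container need Linux line endings (LF), so be sure to convert your .sh files accordingly.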
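Most editors can convert line endings, but it can also be done from a shell (for instance Git Bash or WSL) before building the image. A minimal sketch, with has_crlf and strip_crlf as hypothetical helper names:

```shell
#!/bin/bash
# has_crlf: succeed if FILE contains at least one carriage return.
has_crlf() {
  grep -q $'\r' "$1"
}

# strip_crlf: rewrite FILE in place with all carriage returns removed.
# Writing back with cat keeps the file's existing permissions.
strip_crlf() {
  local tmp
  tmp="$(mktemp)"
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: has_crlf wrapper.sh && strip_crlf wrapper.sh. Note that this removes every carriage return in the file, not only those at line ends, which is fine for a typical shell script.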
Another issue you might come across is CQL-related:
<stdin>:1:SyntaxException: line 1:32 : syntax error...
Unavailable: sleeping
This error pops up when you have a syntax error in your CQL code. In my case, it was the use of dashes (-) in my keyspace name, which is not allowed.
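Because a syntax error makes the wrapper retry forever, it may be worth rejecting a bad keyspace name before it ever reaches cqlsh. The sketch below assumes the rule for unquoted CQL identifiers (a letter followed by letters, digits or underscores); valid_keyspace_name is a hypothetical helper, not something the image provides:

```shell
#!/bin/bash
# valid_keyspace_name: succeed if NAME looks like an unquoted CQL identifier.
# Assumed pattern: a letter, then letters, digits or underscores.
valid_keyspace_name() {
  [[ "$1" =~ ^[A-Za-z][A-Za-z0-9_]*$ ]]
}
```

In the script-based wrapper, the check could guard the CREATE KEYSPACE block, for example: valid_keyspace_name "$DB_CREATE_KEYSPACE" || echo "Invalid keyspace name: $DB_CREATE_KEYSPACE" >&2.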
Final remarks
We discussed two ways of running CQL upon booting a Scylla container, both based on introducing an entrypoint wrapper script. It took me some time to figure this out and piece together the solutions. Hopefully this benefits someone. Keep in mind that the Scylla team will probably make all this easier to accomplish in future Scylla containers.