May 20, 2018

Running CQL when Booting a Scylla Container

Sometimes, operations need to be run upon booting a Docker container, for instance when creating, configuring or pre-seeding a database.

In this post, we will see how to run CQL when starting a Scylla Docker container. These solutions could also be applied to Cassandra containers, for which Scylla is a drop-in replacement, or, more generally, containers for other services.

The solution consists of creating a Dockerfile whose entrypoint is a custom script that wraps the image's original entrypoint script. The new script can either run .cql files upon boot or run specific queries defined in the script itself.

Cassandra and Scylla

Cassandra is a linearly scalable and highly available distributed column-oriented database. It originated at Facebook around 2008 and is written in Java and based on Amazon’s Dynamo paper. Cassandra uses its own SQL-like language called CQL (Cassandra Query Language) for data manipulation and retrieval.

Cassandra already performs well, but Scylla aims to be an API-compatible replacement that claims higher throughput and lower latency.

The Dockerfile

Create a new Dockerfile based on the Scylla image of choice. For example’s sake, we will use scylladb/scylla:2.1.3. You may want to pin down a newer version.

Its contents are as follows:

FROM scylladb/scylla:2.1.3

COPY wrapper.sh /wrapper.sh

ENTRYPOINT ["/wrapper.sh"]

Nothing too wild. 👍

The wrapper script

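Add a wrapper.sh file to the same directory as the Dockerfile (or somewhere else, but remember to change the COPY statement in the Dockerfile). Make sure the file is executable, for example with chmod +x wrapper.sh, because Docker runs it directly as the entrypoint.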

This bash script is where the magic happens. There are two routes we can take:

Run raw CQL in the script using cqlsh -e
Allow cql-files to be run on start-up using cqlsh -f

The first approach suffices if you only need a small change, like guaranteeing that a keyspace exists upon instantiation. The second approach is more extensible, as multiple scripts can be added by writing raw CQL in cql-files and COPYing them as needed.

Let’s have a look at both.

The script-based solution

Based on a similar approach for Cassandra, this approach runs the CQL in wrapper.sh.

It looks like this:

#!/bin/bash

CQL="CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};"
echo "Executing: $CQL"

until cqlsh -e "$CQL"; do
    echo "Unavailable: sleeping"
    sleep 10
done &

exec /docker-entrypoint.py "$@"

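If the database is not reachable (or some other error occurs!), the script sleeps for 10 seconds and tries again. Because the retry loop is started in the background (note the trailing &), the script does not wait for it: it immediately uses exec to pass control to the entrypoint of the original image, which starts Scylla.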

The CQL creates a keyspace for a basic single-node cluster, using SimpleStrategy with a replication factor of 1. Note the use of CREATE KEYSPACE IF NOT EXISTS. This prevents errors on subsequent start-ups.
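One caveat of the loop above is that it never gives up: if the CQL itself is wrong, it retries forever. A bounded variant could look like the sketch below. The retry function, its argument order and the limits in the usage comment are hypothetical and chosen for illustration; they are not part of the Scylla image.

```shell
#!/bin/bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, giving up after
# MAX_ATTEMPTS tries. Returns 0 on success and 1 once the limit is reached.
retry() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            >&2 echo "Giving up after $attempt attempts"
            return 1
        fi
        >&2 echo "Unavailable (attempt $attempt of $max_attempts): sleeping"
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage inside wrapper.sh, before the exec line
# (assumed limits: 30 tries, 10 seconds apart):
#   retry 30 10 cqlsh -e "$CQL" &
```

Passing the command as arguments keeps the helper independent of cqlsh, so the same function works for cqlsh -e and cqlsh -f.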

For some configurability, consider making the keyspace name configurable through an environment variable, say DB_CREATE_KEYSPACE. If the variable is not defined, no keyspace is created. This can be achieved as follows:

#!/bin/bash

if [ -n "$DB_CREATE_KEYSPACE" ]; then
    CQL="CREATE KEYSPACE IF NOT EXISTS $DB_CREATE_KEYSPACE WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};"
    echo "Executing: $CQL"

    until cqlsh -e "$CQL"; do
        echo "Unavailable: sleeping"
        sleep 10
    done &
fi

exec /docker-entrypoint.py "$@"

The file-based solution

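The official MySQL and MariaDB images use the same idea: files placed in a docker-entrypoint-initdb.d directory are run when the database is first initialized.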

The wrapper.sh implementation is as follows:

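#!/bin/bash

# Run every .cql file in the init directory (absolute path, matching the
# COPY destination in the Dockerfile), retrying until Scylla accepts it.
for f in /docker-entrypoint-initdb.d/*; do
    case "$f" in
        *.cql)
            echo "$0: running $f"
            # Each file gets its own background retry loop.
            until cqlsh -f "$f"; do
                >&2 echo "Unavailable: sleeping"
                sleep 10
            done &
            ;;
    esac
    echo
done

# Hand control to the original entrypoint, which starts Scylla.
exec /docker-entrypoint.py "$@"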

The script loops over a dedicated directory and runs every file ending in .cql. For each file, the behaviour is comparable to the script-based solution: the CQL is executed, and retried after a pause if an error occurs. These retry loops run in the background while the entrypoint of the original container starts Scylla.
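Note that the loop above starts one background job per file, so the files may complete in any order. If one file depends on another (say, a table that needs its keyspace), running the files one after another inside a single background job may be safer. The sketch below assumes a hypothetical run_cql_files function and a hypothetical RETRY_DELAY variable; the runner command is passed in as arguments, so the function does not hard-code cqlsh.

```shell
#!/bin/bash

# run_cql_files DIR RUNNER [RUNNER_ARGS...]
# Hypothetical helper: runs RUNNER once per *.cql file in DIR, in the
# sorted order the shell glob produces, retrying each file until it succeeds.
# RETRY_DELAY (seconds, default 10) is an assumed knob for the pause.
run_cql_files() {
    local dir="$1" f
    shift
    for f in "$dir"/*.cql; do
        # With no matches the glob stays literal, so skip anything
        # that is not a regular file.
        [ -f "$f" ] || continue
        echo "running $f"
        until "$@" "$f"; do
            >&2 echo "Unavailable: sleeping"
            sleep "${RETRY_DELAY:-10}"
        done
    done
}

# Usage inside wrapper.sh, before the exec line:
#   run_cql_files /docker-entrypoint-initdb.d cqlsh -f &
```

With this variant, prefixing file names with numbers (01-keyspace.cql, 02-tables.cql) gives a predictable execution order.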

A more complex implementation, also accounting for other files than .cql, can be found in an answer on Stack Overflow.

Now, create a cql-file called create-keyspace.cql. In it, write the same CQL-query (DDL) as before:

CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

To add this file to our container, amend the Dockerfile so the cql-file gets placed in the correct directory:

FROM scylladb/scylla:2.1.3

# Add CQL queries to be run upon boot
COPY create-keyspace.cql /docker-entrypoint-initdb.d/create-keyspace.cql

COPY wrapper.sh /wrapper.sh

ENTRYPOINT ["/wrapper.sh"]

Checking that the image works

You can test your Dockerfile by running the following on the command line:

docker build -t my-scylla .

# For the script-based solution (add the environment variable):
docker run -it -e DB_CREATE_KEYSPACE=my_awesome_project my-scylla

# For the file-based solution:
docker run -it my-scylla

It takes a while before Scylla is fully booted, but after that we can see the keyspace has been created.

In another terminal window, you can run the interactive CQL shell inside the container:

docker exec -it [CONTAINER_ID] cqlsh

Find out your container id by running docker ps. In cqlsh, check if the keyspace is present:

-- general: shows all keyspaces
DESCRIBE keyspaces;

-- specific: checks and describes the specified keyspace
-- (use my_keyspace instead for the file-based solution)
DESCRIBE keyspace my_awesome_project;
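The same check can be scripted, which may be handy in automated tests. The sketch below assumes a hypothetical keyspace_listed helper that reads the output of DESCRIBE keyspaces on standard input and looks for the name as a whole word.

```shell
#!/bin/bash

# keyspace_listed NAME
# Hypothetical helper: succeeds if NAME appears as a whole word in the
# text read from standard input (intended: output of DESCRIBE keyspaces).
keyspace_listed() {
    grep -qwF -- "$1"
}

# Usage, with the container id taken from docker ps:
#   docker exec [CONTAINER_ID] cqlsh -e "DESCRIBE keyspaces" \
#       | keyspace_listed my_awesome_project && echo "keyspace exists"
```

Matching whole words avoids a false positive when one keyspace name is a prefix of another.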

Common issues

I was developing on a Windows machine and I encountered the following error:

standard_init_linux.go:190: exec user process caused "no such file or directory"

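This was due to the use of Windows line endings (CRLF). With CRLF, the first line of the script ends in a carriage return, so the kernel looks for an interpreter called /bin/bash followed by that carriage return, which does not exist. Be sure to convert your .sh files to Linux line endings (LF).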
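If your editor cannot convert line endings, a few lines of shell can detect and fix the problem before building the image. This is a minimal sketch: has_crlf and strip_crlf are hypothetical helper names, and tools such as dos2unix do the same job.

```shell
#!/bin/bash

# has_crlf FILE: succeeds if FILE contains a carriage return character.
has_crlf() {
    local cr
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_crlf FILE: removes every carriage return from FILE in place.
# Writing back with cat (instead of mv) keeps the file's permissions,
# such as the executable bit.
strip_crlf() {
    local tmp status
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage, from the directory containing the Dockerfile:
#   if has_crlf wrapper.sh; then strip_crlf wrapper.sh; fi
```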

Another issue you might come across is CQL-related:

<stdin>:1:SyntaxException: line 1:32 : syntax error...

Unavailable: sleeping

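This error pops up when you have a syntax error in your CQL code. In my case, it was the use of dashes (-) in my keyspace name, which is not allowed. Because the wrapper retries on any failure, it keeps printing "Unavailable: sleeping" even though the database itself is up.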
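Since the configurable script takes the keyspace name straight from an environment variable, a cheap check before running any CQL can catch this early. The sketch below uses a hypothetical is_valid_keyspace_name helper and assumes the rule for unquoted names: a letter followed by letters, digits or underscores.

```shell
#!/bin/bash

# is_valid_keyspace_name NAME
# Hypothetical helper: succeeds if NAME starts with a letter and contains
# only letters, digits and underscores (assumed rule for unquoted names).
is_valid_keyspace_name() {
    case "$1" in
        '' | [!A-Za-z]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage inside wrapper.sh, before building the CQL:
#   if ! is_valid_keyspace_name "$DB_CREATE_KEYSPACE"; then
#       >&2 echo "Invalid keyspace name: $DB_CREATE_KEYSPACE"
#       exit 1
#   fi
```

Exiting before the exec line stops the container from starting at all, which makes the mistake visible immediately instead of hiding it in a retry loop.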

Final remarks

We discussed two ways of running CQL upon booting a Scylla container, both based on introducing an entrypoint wrapper script. It took me some time to figure this out and piece the solutions together. Hopefully this benefits someone. Keep in mind that the Scylla team will probably make all this easier to accomplish in future Scylla containers.

A. Rothuis