Recently, while working on the Parkanizer project, our team faced the problem of backing up a MongoDB instance located within a Docker container. As we are .NET guys, we use Azure as our cloud infrastructure provider and we already store our backups in Azure Blob storage. This time we wanted to do the same with the Mongo backups.

To make a long story short, the goal is to back up a MongoDB hosted within a Docker container and store the result safely in Azure Blob storage.

Mongo backuper

On the Linux side of our project we use Docker containers to keep our modules under control. Docker allows us to simplify our deployment process, make it more secure and, more importantly, make it reproducible. We try to avoid - as much as we can - installing components directly on Linux machines. Each additional installation complicates deployment and, worse, the installation can fail.

Unfortunately, to back up Mongo and to send the backups from Linux to Azure Blob storage, we need some components installed. We decided to simplify the whole process with a tool we already had at our fingertips - Docker.

To send files from Linux to Azure there is blobxfer - an AzCopy-like tool. To use blobxfer we need Python, and to back up Mongo we need Mongo itself with its mongodump and mongorestore utilities. Let's prepare a Docker image containing all of them and call it "mongo_backuper":

FROM mongo

# Install blobxfer and the build dependencies it needs
RUN     apt-get update && apt-get install -y \
            python-dev libxml2-dev libxslt1-dev zlib1g-dev \
            python3-dev libffi-dev libssl-dev python-setuptools
RUN     easy_install pip
RUN     pip install blobxfer

As we can see, the prepared image inherits from the official mongo image and additionally installs everything necessary to run blobxfer. Now we only have to build the image and use it locally (or we can tag it and push it to our Docker registry - if we have one - and use it everywhere).

docker build -t mongo_backuper ./
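
Once the image is built, we can sanity-check it and, optionally, push it to a registry - the registry address below is just a placeholder:

# quick smoke test: both tools we rely on are available inside the image
docker run --rm mongo_backuper bash -c 'mongodump --version && blobxfer --help > /dev/null && echo OK'

# tag and push to a private registry
docker tag mongo_backuper registry.example.com/mongo_backuper:latest
docker push registry.example.com/mongo_backuper:latest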

Backup script

The tool is ready now - so let's use it to back up and store our priceless data.

#!/bin/bash

rm -rf /tmp/mongodump && mkdir /tmp/mongodump && echo "/tmp/mongodump recreated"
rm -rf /tmp/backups && mkdir /tmp/backups && echo "/tmp/backups recreated"

docker run --rm --link mongo_container:mongo \
    -e "STORAGE_ACCOUNT_NAME=#{BackupStorageAccountName}" \
    -e "CONTAINER_NAME=#{BackupStorageContainer}" \
    -e "STORAGE_ACCOUNT_KEY=#{BackupStorageKey}" \
    -v /tmp/mongodump:/tmp \
    -v /tmp/backups:/backups \
    mongo_backuper bash -c \
       'mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_1" --out=/tmp && \
        mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_2" --out=/tmp && \
        mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_3" --out=/tmp && \
        BACKUP_NAME=backup.$(date "+%y_%m_%d_%H_%M").tar.gz && \
        tar zcvf "/backups/$BACKUP_NAME" /tmp && \
        blobxfer $STORAGE_ACCOUNT_NAME $CONTAINER_NAME /backups/$BACKUP_NAME --storageaccountkey $STORAGE_ACCOUNT_KEY --upload'

Let's go through the script line by line to explain it a little. The first two lines create temporary directories which will be used to store the mongodump result files as well as the packed and compressed backups. Each time the script is executed, the temporary folders are removed and recreated to avoid accumulating artifacts from previous runs.

rm -rf /tmp/mongodump && mkdir /tmp/mongodump && echo "/tmp/mongodump recreated"
rm -rf /tmp/backups && mkdir /tmp/backups && echo "/tmp/backups recreated"

The next line is quite straightforward, though there are two things worth mentioning.

docker run --rm --link mongo_container:mongo \

The docker run command is used with the --rm and --link switches. The first one removes the container's file system after the container has done its job. The second one creates an internal Docker connection to the selected container under the specified alias. So the final result of the command is a temporary container connected to the one with the Mongo instance we want to back up.
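
To see what --link gives us, we can resolve the alias from a throwaway container - a quick sanity check, not a part of the backup flow:

docker run --rm --link mongo_container:mongo mongo_backuper bash -c 'getent hosts mongo'
# prints the internal IP of mongo_container next to the "mongo" alias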

The next three lines define environment variables holding the credentials for our Azure Blob storage. The #{...} placeholders are substitution tokens that get replaced with real values before the script is run.

-e "STORAGE_ACCOUNT_NAME=#{BackupStorageAccountName}" \
-e "CONTAINER_NAME=#{BackupStorageContainer}" \
-e "STORAGE_ACCOUNT_KEY=#{BackupStorageKey}" \

The following two lines mount the temporary directories we previously created as volumes inside the Docker container. The host /tmp/mongodump directory is visible under the /tmp path in the container. Files inside the volumes will persist even after the mongo_backuper container is removed.

-v /tmp/mongodump:/tmp \
-v /tmp/backups:/backups \
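
Thanks to these mounts, the backup archive stays on the host after the temporary container is gone. A quick check (with a hypothetical file name):

ls -lh /tmp/backups
# backup.17_06_01_00_00.tar.gz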

The final line before the backup commands themselves selects the image we want to use for our container and instructs it to execute a bash script whose body is defined by the -c switch.

mongo_backuper bash -c \

And finally, after all the 'infrastructure' for backing up is prepared, we can get the job done. Let's examine the procedure:

Firstly, we want to dump our Mongo collections to the /tmp volume.

'mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_1" --out=/tmp && \
 mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_2" --out=/tmp && \
 mongodump -v --host mongo:27017 --db "foos_db" --collection "foos_collection_3" --out=/tmp && \

Then, we select a name for the backup archive. We use the current date and time to uniquely describe the backup.

BACKUP_NAME=backup.$(date "+%y_%m_%d_%H_%M").tar.gz && \
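
For illustration, on a hypothetical run at midnight on 1 June 2017 the name would expand as follows:

date "+%y_%m_%d_%H_%M"   # prints 17_06_01_00_00
# => BACKUP_NAME=backup.17_06_01_00_00.tar.gz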

After that, we prepare a compressed tarball file and put it inside the /backups volume (tar strips the leading '/' from member names, so the archive contains paths like tmp/foos_db/... - which explains the paths used later by the restore script)...

 tar zcvf "/backups/$BACKUP_NAME" /tmp && \

...and in the end send it to the selected Azure Blob storage container with the blobxfer tool.

 blobxfer $STORAGE_ACCOUNT_NAME $CONTAINER_NAME /backups/$BACKUP_NAME --storageaccountkey $STORAGE_ACCOUNT_KEY --upload'

From that moment onwards, consider your backup safe.
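
Assuming the script is saved as backup_mongo.sh (the name the cron section below relies on), it can first be run manually to verify the whole pipeline:

sudo ./backup_mongo.sh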

Restore script

To restore a backup from a file located in Azure Blob storage, one can use the following script:

#!/bin/bash

if [ -n "$1" ]; then
    rm -rf /tmp/restores && mkdir /tmp/restores && echo "/tmp/restores recreated"
    rm -rf /tmp/mongorestore && mkdir /tmp/mongorestore && echo "/tmp/mongorestore recreated"

    docker run --rm \
        --link mongo_container:mongo \
        -e "STORAGE_ACCOUNT_NAME=#{BackupStorageAccountName}" \
        -e "CONTAINER_NAME=#{BackupStorageContainer}" \
        -e "STORAGE_ACCOUNT_KEY=#{BackupStorageKey}" \
        -e "BACKUP_NAME=$1" \
        -v /tmp/restores:/restores \
        -v /tmp/mongorestore:/mongorestore \
        mongo_backuper bash -c \
         'blobxfer $STORAGE_ACCOUNT_NAME $CONTAINER_NAME /restores --download --remoteresource $BACKUP_NAME --storageaccountkey $STORAGE_ACCOUNT_KEY && \
          tar zxvf "/restores/$BACKUP_NAME" --directory /mongorestore && \
          mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_1" "/mongorestore/tmp/foos_db/foos_collection_1.bson" && \
          mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_2" "/mongorestore/tmp/foos_db/foos_collection_2.bson" && \
          mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_3" "/mongorestore/tmp/foos_db/foos_collection_3.bson"'
else
    echo "Usage: sudo ./restore_mongo.sh DESIRED_BACKUP_NAME.tar.gz" 
fi

Basically, the restore process is a reversed version of the backup one, but there are some nuances I will try to cover.

First of all, a parameter has been introduced to make the script more reusable. We just have to specify the backup name and execute the procedure - and if we forget to do so, we will be reminded:

if [ -n "$1" ]; then    
    ...    
else
    echo "Usage: sudo ./restore_mongo.sh DESIRED_BACKUP_NAME.tar.gz" 
fi
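
A typical invocation - with a hypothetical backup name - looks like this:

sudo ./restore_mongo.sh backup.17_06_01_00_00.tar.gz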

Blobxfer is used to download the selected tarball from Azure Blob storage. The next step is to unpack the backup files into the prepared Docker volume. The mongorestore utility then brings the backed up data back to life. As one can notice, the --drop switch is used. The default behavior of mongorestore is to merge the existing collection's documents with the backed up ones - so if a document with the same id exists in both the current collection and the backup, it won't be restored. The --drop switch causes each collection to be dropped before being restored.

mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_1" "/mongorestore/tmp/foos_db/foos_collection_1.bson" && \
mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_2" "/mongorestore/tmp/foos_db/foos_collection_2.bson" && \
mongorestore --host mongo:27017 --db "foos_db" --drop --collection "foos_collection_3" "/mongorestore/tmp/foos_db/foos_collection_3.bson"'

Automation

Now that the scripts are ready, we would like to automate the whole backup process. This can be achieved easily with cron. Simply run the command:

sudo crontab -e

And add the following line to the opened file:

0 0 * * * /path/to/your/backup_mongo.sh >> /var/log/mongo_backups.log 2>&1

This line instructs cron to execute the backup_mongo.sh script each day at midnight. Moreover, both standard output and standard error will be logged to the mongo_backups.log file. More details on how to configure cron can be found in the crontab(5) man page.
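
The five fields of the schedule are minute, hour, day of month, month and day of week. As a hypothetical variation, a weekly backup on Sundays at 3 AM would look like this:

0 3 * * 0 /path/to/your/backup_mongo.sh >> /var/log/mongo_backups.log 2>&1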

About the author:

Damian Krychowski

Damian is a software developer at BT Skyrise. He specializes in .NET technology, yet he likes challenges and willingly discovers other languages and platforms. Damian is a passionate engineer able to work on different levels of abstraction - ranging from embedded systems prototyping to cloud computing and architecture. Open source enthusiast.

