Mustafa Can Yücel

Backing Up Your Data with Rclone

Importance of Backups

I believe it is unnecessary to explain the importance of backing up your data in 2024, so we will jump straight into the juicy stuff. There are many ways to back up your data, but one of the best is Rclone, a command-line program that lets you sync files and directories to and from various cloud storage providers.

Installing RClone

For Debian-based systems, you can either install it via apt or download the binary from the official website. At the time of this post, the apt repo has version 1.60.1+dfsg-2+b5, while the latest release is 1.66.0. It is common for the apt repo to be a bit behind the latest version; Debian does not update packages before testing them thoroughly, which is both a curse and a blessing. It is good because you can be sure that the version in the repo is stable, but it is bad because you may miss out on some new features. This was a tremendous boon in the case of the recent XZ vulnerability: Debian stable was not affected because it shipped an older version of XZ. If you have not heard about this attack that almost broke the whole internet, I strongly suggest you read about it (Ars Technica is a good starting point).

If you want to use the latest version, follow the instructions on the official page for the most up-to-date information. I prefer the apt version because it gets updated regularly along with the rest of the system. If you are okay with the version in the repo, you can install it with the following command:

sudo apt install rclone
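
If you would rather run the latest release, the rclone project also documents a one-line install script; a sketch, assuming the script URL on rclone.org is still current:

sudo -v ; curl https://rclone.org/install.sh | sudo bash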

Configuring RClone Backend

RClone supports ~70 different cloud storage providers. You can see the full list and how to configure them here. I will show you how to configure RClone with Google Drive (not Google Cloud Storage).

Google Drive Configuration (not Google Cloud Storage)

The most current information can be found on the official page. I will not reiterate the same content here; running rclone config walks you through the configuration process very well. Note that this process requires you to create a new Google Cloud app, plus a few additional steps that are explained in the terminal. The only difference from the official page is that it will ask "Use web browser to automatically authenticate rclone with remote?"; since we are working on a headless server, you should answer no to this question. It will give you a link to authenticate your account and a code to enter. After you enter the code on your own computer, it will ask you to name the remote. You can name it whatever you want, but I suggest choosing something you can remember easily.

In the end, your configuration file should look like this:

[remoteName]
type = drive
client_id = <someid>.apps.googleusercontent.com
client_secret = <somesecret>
scope = drive.file
token = {"access_token":"<token>","token_type":"Bearer","refresh_token":"1//<data>","expiry":"<date>"}
team_drive =
To find your configuration file, you can run the following command:
rclone config file
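
Once the remote is configured, a quick sanity check is to list its top-level directories (a sketch, assuming you named the remote remoteName as in the configuration above):

rclone lsd remoteName:
If this prints your Drive folders without errors, the token and scope are working.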

Once this configuration is completed, we can start creating the backup scripts.

Creating Backup scripts

The approach I like to follow is to create an individual script file for each backup job. Then we will create a master script that runs all the individual scripts in a given directory. This way, you can easily add or remove backup jobs without affecting the other jobs and changing the master script.

The master script will execute all the jobs (i.e. the individual scripts) in a given directory and collect their output along with any errors encountered. In the end, it will send a Slack notification summarizing each script's status and, if there are any errors, save an error log.

Directory Structure

We will create a directory structure like this:

.
    ├── jobs
    │   ├── job1.sh
    │   ├── job2.sh
    │   ├── job3.sh
    ├── log
    │   ├── errors.2024-03-15_10:41:21
    │   └── errors.2024-04-19_00:02:18
    └── master.sh
All the individual scripts will be in the jobs directory, and the error logs will be in the log directory. The master script will be in the root directory. The root directory can be anywhere you want; I prefer to put it in the home directory, under backup.
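As a minimal sketch (assuming the root directory lives at ~/backup, as in the rest of this post), the structure can be created with:

mkdir -p ~/backup/jobs ~/backup/log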

The Master Script

The master script will look like this:

#!/bin/bash

# Slack webhook URL
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T06XXXXXXXXX/B06XXXXXX/XXXXXXXXXXXXXX"

# Initialize the message variable
MESSAGE="🚀 Today's backup operation results arrived!\n"
ERRORS=""

# Function to execute a job (backup script) and append the result to the MESSAGE
execute_backup_script() {
    local script_path="$1"

    # Execute the script and capture the output and errors
    output=$(bash "$script_path" 2>&1)
    exit_status=$?

    # Check exit status
    if [ $exit_status -eq 0 ]; then
        MESSAGE="$MESSAGE\n✅ Success: $(basename "$script_path") backup completed."
    else
        MESSAGE="$MESSAGE\n❌ Failure: $(basename "$script_path") backup failed."
        ERRORS="$ERRORS\nFailure: $(basename "$script_path") backup failed. Output:\n$output"
    fi
}

# Loop through all the scripts in the 'jobs' subfolder and execute them
for script in /home/user/backup/jobs/*.sh; do
    execute_backup_script "$script"
done

# Write errors to log file, if any
if [ -n "$ERRORS" ]; then
    echo -e "$ERRORS" > "/home/user/backup/log/errors.$(date +"%Y-%m-%d_%H:%M:%S")"
fi

# wrap it up
MESSAGE="$MESSAGE\n\nHave a nice day!🎉"

# Send the accumulated message to Slack
curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$MESSAGE\"}" "$SLACK_WEBHOOK_URL"
Let's explain this script in a little more detail step by step:
  1. Shebang line: #!/bin/bash indicates that this script should be run using the Bash shell.
  2. Setting Slack webhook URL: The Slack webhook URL is set to a specific value. This URL is used to send messages to a specific Slack channel or user.
  3. Initializing variables:
    • MESSAGE: This variable holds the main message that will be sent to Slack. It starts with a rocket emoji and a header indicating that it's about backup operation results.
    • ERRORS: This variable will accumulate any errors encountered during the backup operation.
  4. execute_backup_script function: This function takes a script path as an argument, executes the script, captures its output and errors, and updates the MESSAGE and ERRORS variables accordingly.
  5. Loop through backup scripts: It loops through all the .sh files in the /home/user/backup/jobs/ directory and executes each of them using the execute_backup_script function.
  6. Write errors to log file: If there are any errors encountered during the execution of backup scripts, they are written to a log file in the /home/user/backup/log/ directory. The log file's name includes the current date and time.
  7. Appending final message: A closing message "Have a nice day!🎉" is appended to the MESSAGE variable.
  8. Sending message to Slack: Using curl, a POST request is sent to the Slack webhook URL with the MESSAGE as the payload. The message is formatted as JSON with a text field containing the message.
Once the script is ready, you can save it as master.sh in the root directory. Don't forget to make it executable:
chmod +x master.sh

Slack Integration

The above script uses a Slack Webhook URL to send messages to a Slack channel or user. You can create a new Slack app and set up an incoming webhook to get the URL. You can follow the official documentation to create a new app and set up a webhook. Once you have the webhook URL, you can replace the SLACK_WEBHOOK_URL variable in the script with your webhook URL.
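
Before wiring the webhook into the master script, it is worth sending a test message by hand; a minimal sketch (substitute your own webhook URL for the placeholder):

curl -X POST -H 'Content-type: application/json' --data '{"text":"Backup webhook test"}' "https://hooks.slack.com/services/T06XXXXXXXXX/B06XXXXXX/XXXXXXXXXXXXXX"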

Individual Backup Scripts

The content of a job script will depend on what kind of source data it is backing up:

  • File/Folder backups
  • SQL Database backups (other than SQLite)
  • SQLite backups
  • Docker Volumes
  • etc.
If you are backing up the data of your Docker containers, you should inspect the compose file or the official documentation to find out what kind of persistent storage is used. We will give basic examples for the most common storage types.

One important point is that this guide assumes the files to be backed up will not be in use during the backup process. For this reason, we will run the backup scripts at a time when the files are not expected to be modified. This works for services with a limited number of users who are mostly in similar time zones. If you need to back up files that are constantly in use, or you have a large user base spread across many time zones, you should consider snapshotting or file locking (which we will cover in another post). For databases, we will use the database's built-in backup/dump tools.

All the scripts will be in the jobs directory and their names should be unique and descriptive of the backup job they perform since the master script will use the script names in the Slack message.

File/Folder Backups

As an example, let's consider the excellent note-taking app Silverbullet, which stores your notes as Markdown files in the directory you specified in your container configuration. Let's say our Silverbullet container is using the directory /var/lib/silverbullet to store the notes. We can create a backup script like this:

#!/bin/bash

# Set the paths
LOCAL_BACKUP_DIR="/var/lib/silverbullet"
REMOTE_DRIVE_NAME="gdrive"  # This should match the name you gave when configuring rclone

# Sync the backup directory to Google Drive
rclone sync "$LOCAL_BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Silverbullet"
The above script will always synchronize the /var/lib/silverbullet directory with the Backups/Silverbullet directory on Google Drive.
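
Because rclone sync makes the destination match the source (including deleting remote files that no longer exist locally), it is a good idea to preview a new job before scheduling it; a sketch using rclone's --dry-run flag:

rclone sync "/var/lib/silverbullet" "gdrive:Backups/Silverbullet" --dry-run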

Excluding Files/Folders

If you want to exclude certain files or folders from the backup, you can use the --exclude flag with rclone sync. A common use case is excluding log files or temporary files that don't need to be backed up. These files and folders usually start with a dot (.) or have a specific extension. For example, to exclude all files and folders starting with a dot, you can use the following script:

#!/bin/bash
# Set the paths
LOCAL_BACKUP_DIR="/home/user/containers/radicale"
REMOTE_DRIVE_NAME="gdrive"  # This should match the name you gave when configuring rclone

# Sync the backup directory to Google Drive
rclone sync "$LOCAL_BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Radicale" --exclude '.*{/**,}'
For more fine-grained control over what to exclude, you can use multiple --exclude flags with different patterns. It is best to refer to the official documentation for more information on filtering and excluding files and folders.
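
As a sketch of that fine-grained control (the patterns here are purely illustrative), multiple --exclude flags can be combined in one command:

rclone sync "$LOCAL_BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Radicale" \
    --exclude '*.log' \
    --exclude '*.tmp' \
    --exclude '.cache/**'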

PostgreSQL Database Backups

PostgreSQL has a built-in tool called pg_dump that can be used to create backups of databases. As an example, let's say we want to back up our WikiJS container data, which has a database container named wikijs-db-1. We can create a backup script like this:

#!/bin/bash
# Set the date format for the backup file
DATE_FORMAT=$(date +"%Y%m%d%H%M%S")

# Dump PostgreSQL database
docker exec wikijs-db-1 pg_dump -U wikijs -F c wiki > "wikibackup_$DATE_FORMAT.dump"

# Sync dump file to Google Drive using rclone
rclone copy "wikibackup_$DATE_FORMAT.dump" "gdrive:Backups/WikiJS"

# Optional: Remove the local dump file if you want to save space
rm "wikibackup_$DATE_FORMAT.dump"
This creates a timestamped dump file of the database and then copies it to the Backups/WikiJS directory on Google Drive. You can remove the local dump file after the backup is complete to save space. If you are not going to remove it, keep the dumps in a dedicated directory so they can be easily identified and cleaned up later.
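
Timestamped dumps will accumulate on the remote over time. One way to prune them is rclone's age filter; a sketch, assuming you are comfortable deleting anything older than 30 days:

rclone delete "gdrive:Backups/WikiJS" --min-age 30d
Running it with --dry-run first shows what would be removed.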

Docker Volumes

There are different approaches to backing up Docker volumes; the one we will use is creating a temporary BusyBox container and copying the contents of the volume to a tarball. As an example, let's say we have a Hoarder container (which is a bookmark storage service) that uses a volume named hoarder_data for persistent storage. We can create a backup script like this:

#!/bin/bash
# Set the paths and variables
DOCKER_VOLUME="hoarder_data"
BACKUP_DIR="/home/user/backup/hoarder"
BACKUP_FILENAME="data_volume_backup.tar.gz"

# Create a temporary container to mount the volume and create a backup
docker run --rm -v "$DOCKER_VOLUME":/volume -v "$BACKUP_DIR":/backup busybox tar -czf "/backup/$BACKUP_FILENAME" -C /volume .

REMOTE_DRIVE_NAME="gdrive"  # This should match the name you gave when configuring rclone
rclone sync "$BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Hoarder"

Let's break down each part of the code:

  1. Setting paths and variables:
    • DOCKER_VOLUME: Specifies the name of the Docker volume whose data will be backed up.
    • BACKUP_DIR: Specifies the directory where the backup will be stored on the host machine.
    • BACKUP_FILENAME: Specifies the name of the backup file.
  2. Creating a temporary container to mount the volume and create a backup:
    • docker run: Command to run a Docker container.
    • --rm: Flag to automatically remove the container after it exits.
    • -v "$DOCKER_VOLUME":/volume: Mounts the Docker volume specified by DOCKER_VOLUME to the /volume directory inside the container.
    • -v "$BACKUP_DIR":/backup: Mounts the backup directory on the host machine specified by BACKUP_DIR to the /backup directory inside the container.
    • busybox: Specifies the Docker image to use for the container. In this case, it's a minimalistic BusyBox image.
    • tar -czf "/backup/$BACKUP_FILENAME" -C /volume .: Runs the tar command inside the container to create a gzipped tarball of the contents of the mounted volume (/volume) and saves it to the specified backup directory with the specified filename.
  3. Syncing the backup to a remote location using rclone:
    • REMOTE_DRIVE_NAME: Specifies the name of the remote drive configured in rclone.
    • rclone sync "$BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Hoarder": Uses rclone to synchronize the contents of the local backup directory (BACKUP_DIR) with the specified remote drive (REMOTE_DRIVE_NAME). The backups are synced to the directory Backups/Hoarder on the remote drive.

SQLite Backups

The simpler but less safe way to back up an SQLite database is to copy the database file to a backup location. As an example, let's say we have a container running a SQLite database for a simple blog application. We can create a backup script like this:

#!/bin/bash
# Set the paths and variables
SQLITE_DB_FILE="/var/lib/blog/blog.db"
BACKUP_DIR="/home/user/backup/blog"
BACKUP_FILENAME="blog.db"

# Copy the SQLite database file to the backup directory
cp "$SQLITE_DB_FILE" "$BACKUP_DIR/$BACKUP_FILENAME"

REMOTE_DRIVE_NAME="gdrive"  # This should match the name you gave when configuring rclone
rclone sync "$BACKUP_DIR" "$REMOTE_DRIVE_NAME:Backups/Blog"
Remember that in order to use this approach, the database file should not be in use during the backup process. If the database is constantly being written to, you should consider using SQLite's built-in backup functionality or a more sophisticated backup strategy.

A more robust approach is to use SQLite's backup functionality to create a backup file and then sync it to the cloud. SQLite provides a mechanism called the Online Backup API, which Python's sqlite3 module exposes as Connection.backup(). Below is an example Python script that uses this API to create a backup file:

import sqlite3

def backup_database(source_db, dest_db):
    source_conn = sqlite3.connect(source_db)
    dest_conn = sqlite3.connect(dest_db)

    # Perform the online backup (copies the whole database in one pass)
    with dest_conn:
        source_conn.backup(dest_conn)

    # Close connections
    source_conn.close()
    dest_conn.close()

# Example usage
source_db = 'source.db'
dest_db = 'backup.db'
backup_database(source_db, dest_db)
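
If you prefer to stay in shell scripts, the sqlite3 command-line tool exposes the same online-backup mechanism through its .backup command; a minimal sketch, reusing the blog paths from the example above:

sqlite3 /var/lib/blog/blog.db ".backup '/home/user/backup/blog/blog.db'"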

Scheduling Automated Backups

Once the job scripts are also completed, we should test the master script to see if it works as expected. If it does, we can schedule it to run automatically at a specific time using crontab. For testing, we simply run the master script (remember that all the scripts should be executable with chmod +x script_name.sh):

./master.sh
If no error occurs, you can add the script to crontab to run it automatically at a specific time. You can open the crontab file with the following command:
crontab -e
Then add the following line to run the master script every day at 3 AM:
0 3 * * * /home/user/backup/master.sh
This will run the master script every day at 3 AM. You can adjust the timing according to your needs. You can also add more jobs to the master script and schedule them to run at different times.
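
Note that cron discards script output unless you capture it; a sketch of the same entry with output appended to a log file (path assumed to match the directory structure above):

0 3 * * * /home/user/backup/master.sh >> /home/user/backup/log/cron.log 2>&1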

That's it! Once the master script is scheduled to run automatically, you can sit back and relax knowing that your data is being backed up regularly. You can also monitor the Slack channel for any notifications about the backup status. Remember to periodically check the backup logs and verify that the backups are being created successfully.

Restoring Backups

Restoring backups is as important as creating them. You should periodically test your backup and restore process to ensure that your data can be recovered in case of a disaster. The restore process will depend on the type of data you are backing up and the backup strategy you are using. For example, if you are backing up files/folders, you can simply copy the files back to their original location. If you are backing up databases, you can use the appropriate restore command for the database system you are using.

Restoring File/Folder Backups

To restore file/folder backups, copy the backed-up files/folders back to their original location. Since the backups live on the remote, you can pull them back down with rclone; if you also keep a local copy, cp or rsync works just as well. Make sure to overwrite any existing files/folders with the backed-up data.
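
As a sketch for the Silverbullet example above (destination path assumed to be the original data directory), pulling the files back from Google Drive looks like this:

rclone copy "gdrive:Backups/Silverbullet" /var/lib/silverbullet
Using copy rather than sync here avoids deleting anything that already exists locally.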

Restoring SQLite Database Backups

Simple File Backups

If you are using the simple file backup approach for SQLite databases, you can restore the database by copying the backup file back to the original location. You can use the cp command to copy the backup file to the original database file location. Make sure to overwrite the original database file with the backup file.

Using SQLite's Online Backup API

If you are using SQLite's Online Backup API to create backups, restoring is simply the same operation run in the opposite direction: copy the backup file back into the original database file. Below is an example Python script that uses the Online Backup API to restore a database (it is the same script as above with the source and destination swapped):

import sqlite3

def restore_database(source_db, dest_db):
    source_conn = sqlite3.connect(source_db)
    dest_conn = sqlite3.connect(dest_db)

    # Perform the restore (copy the backup into the original database)
    with dest_conn:
        source_conn.backup(dest_conn)

    # Close connections
    source_conn.close()
    dest_conn.close()

# Example usage
source_db = 'backup.db'  # Backup file
dest_db = 'original.db'  # Original database file
restore_database(source_db, dest_db)

Restoring PostgreSQL Database Backups

To restore a PostgreSQL database backup created with pg_dump in custom format, use the pg_restore command. Point -d at the database to restore into; with -C, pg_restore first recreates the database named in the dump and -d only specifies the database to connect to initially:

pg_restore -U username -d dbname -C backup_file.dump
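
For the dockerized WikiJS example above, a sketch of the same restore run inside the database container (container, user, and database names taken from the backup script; the /tmp path and the wikibackup.dump filename are assumptions):

# Copy the dump into the container, then restore it into the wiki database
docker cp wikibackup.dump wikijs-db-1:/tmp/wikibackup.dump
docker exec wikijs-db-1 pg_restore -U wikijs -d wiki -c /tmp/wikibackup.dump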

Restoring Docker Volumes

To restore a Docker volume backup created using the script above, we need to reverse the process: extract the contents of the tarball into the volume. We can create a restore script like this:

#!/bin/bash
# Set the paths and variables
DOCKER_VOLUME="hoarder_data"
BACKUP_DIR="/home/user/backup/hoarder"
BACKUP_FILENAME="data_volume_backup.tar.gz"

# Extract the backup file to the Docker volume
docker run --rm -v "$DOCKER_VOLUME":/volume -v "$BACKUP_DIR":/backup busybox tar -xzf "/backup/$BACKUP_FILENAME" -C /volume
This script will create a temporary Docker container, mount the Docker volume specified by $DOCKER_VOLUME, and extract the contents of the backup file ($BACKUP_FILENAME) into that volume. After the restore process is complete, the temporary container is automatically removed (the --rm flag). Replace the variables DOCKER_VOLUME, BACKUP_DIR, and BACKUP_FILENAME with the appropriate values for your setup.

Make sure that the backup file (data_volume_backup.tar.gz) contains the correct data and structure for the Docker volume you're restoring to. Additionally, ensure that the Docker volume (hoarder_data in this case) exists before running the restore script. If it doesn't exist, you may need to create it beforehand using docker volume create.
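
Before running the restore script, a sketch of the preliminary steps (remote path and volume name taken from the backup example above) might look like this:

# Pull the tarball back from Google Drive
rclone copy "gdrive:Backups/Hoarder/data_volume_backup.tar.gz" /home/user/backup/hoarder

# Create the volume if it does not exist yet
docker volume create hoarder_data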