Mustafa Can Yücel
blog-post-24

Backing Up Your Data Securely with Restic

Restic

Restic is a fast, secure, and efficient backup program that supports a variety of backends, including local storage, SFTP, S3, and more. It is designed to be easy to use, reliable, and secure. Restic uses strong encryption to protect your data and ensures that only you have access to your backups. In this post, we will cover how to install and configure Restic, create backup scripts, and schedule automated backups.

Installing Restic

On Debian, there’s a package called restic which can be installed from the official repos, e.g. with apt:

sudo apt install restic

Configuring Backends, i.e. 'Repositories'

The place where your backups will be saved is called a "repository". This is simply a directory containing a set of subdirectories and files created by restic to store your backups, some corresponding metadata and encryption keys. Restic supports a variety of backends, including local storage, SFTP, S3, and more:

  • Local storage: A directory on your local filesystem.
  • SFTP: A remote server accessible via SSH.
  • S3: Amazon S3-compatible object storage services.
  • Google Cloud Storage: Google Cloud Storage service (not Google Drive).
  • Backblaze B2: Backblaze B2 cloud storage service.
  • and many more; see official documentation for a complete list

In our previous post, we used Google Drive (not to be confused with Google Cloud Storage). Restic does not support Google Drive natively, although it can reach it through its rclone backend. Nonetheless, we want our system to be as simple as possible, as there is power in simplicity. For this reason, we will use Amazon S3 as the backend for our backups. Amazon S3 is a highly durable and scalable object storage service that is widely used for storing backups and other data. It is also quite affordable, especially for small to medium-sized backups. If you have never registered an AWS account, you get a free tier for the first year, which is more than enough for our purposes.

AWS S3 Configuration

If you do not have an AWS account, you can create one here. Once you have an account, you can create an S3 bucket to store your backups. You can follow the official documentation to create an S3 bucket.

Configuring Access

There are different ways to authenticate Restic with AWS. An access key and secret key pair is the most common. However, creating these keys on the root user (i.e. the user that has unlimited privileges) is very bad security practice. Instead, we will create an IAM user with a single permission: reading and writing to the S3 bucket we created for the backups. This way, even if the access key and secret key are compromised, an attacker cannot do anything beyond reading and writing that one bucket. To create an IAM user, follow the official documentation. Do not assign any permissions during creation. Once you have created the user, attach the following policy to it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::your-backup-bucket-name",
                "arn:aws:s3:::your-backup-bucket-name/*"
            ]
        }
    ]
}
Attaching this policy will allow the user to read, write, and delete objects in the S3 bucket. Replace your-backup-bucket-name with the name of the bucket you have created.

Once you have created the user, open it in the IAM console and create an access key and secret key: click on the user, then the Security credentials tab, then the Create access key button. Save the access key and secret key in a secure place, as you will not be able to see the secret key again. This approach gives us the following advantages:

  • Principle of least privilege: The user will only have access to what is necessary for the backup task.
  • Easier to manage and rotate credentials: We can easily update or revoke access without affecting other parts of our AWS setup.
  • Better auditing: We can track actions performed specifically by this user.
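If you prefer the command line over the console, the same user and policy can be created with the AWS CLI. This is only a sketch: the user name backup-user, the policy name, and the policy file path are placeholders, and the JSON file is assumed to contain the policy shown above:

```shell
# Create the dedicated backup user (name is an example)
aws iam create-user --user-name backup-user

# Attach the bucket-only policy shown above, saved locally as restic-s3-policy.json
aws iam put-user-policy \
    --user-name backup-user \
    --policy-name restic-s3-access \
    --policy-document file://restic-s3-policy.json

# Generate the access key / secret key pair for this user
aws iam create-access-key --user-name backup-user
```

The last command prints the access key ID and the secret key once; store them immediately, as the secret cannot be retrieved again.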

Creating a Repo in Restic

Now we can create a repository in Restic. Remember that a "repository" is simply a directory containing a set of subdirectories and files created by restic to store your backups, some corresponding metadata and encryption keys. We will create a repository in the S3 bucket we have created, but we need to pass the credentials to Restic. We can do this by setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. We do not want these to persist in our shell profile (we will explain why soon), so we will export them only in the current shell session:

export AWS_ACCESS_KEY_ID=MY_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY
Now we can initialize a repository in the S3 bucket:
restic -r s3:s3.amazonaws.com/your-backup-bucket-name init
It will ask for a password to encrypt the repository. Choose a strong password and keep it in a secure place; you will need it to restore your backups. Remember: if you lose this password, you will not be able to restore your backups.
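If you would rather not export the credentials at all, you can scope them to a single invocation by prefixing the command with the variable assignments; they then exist only for that one process and never enter your shell's environment:

```shell
# Credentials live only for this one restic invocation
AWS_ACCESS_KEY_ID=MY_ACCESS_KEY \
AWS_SECRET_ACCESS_KEY=MY_SECRET_ACCESS_KEY \
restic -r s3:s3.amazonaws.com/your-backup-bucket-name init
```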

A Few Words About Repos

Restic uses a repository to store backups. A repository is a directory containing a set of subdirectories and files created by restic to store your backups, some corresponding metadata and encryption keys. The repository is encrypted with the password you provide when you initialize the repository. This means that even if someone gains access to your repository, they will not be able to read the contents without the password. This is why it is crucial to choose a strong password and keep it in a secure place.

Restic uses a content-addressable storage model. This means that each file is stored only once in the repository, and identical files are deduplicated. This can save a lot of space, especially if you have many similar files. Restic also supports snapshots (the contents of a directory at a specific point in time), which allow you to restore your data as it was at that time. Snapshots are read-only: they cannot be modified after creation, although they can be removed deliberately with the forget command. This ensures that your backups stay safe and tamper-free.

When restic encounters a file that has already been backed up, whether in the current backup or a previous one, it makes sure the file’s content is only stored once in the repository. To do so, it normally has to scan the entire content of the file. Because this can be very expensive, restic also uses a change detection rule based on file metadata to determine whether a file is likely unchanged since a previous backup. If it is, the file is not scanned again.

Snapshots can have one or more tags, which can be used to group snapshots together. This can be useful if you want to keep track of different versions of your data, or if you want to organize your snapshots in a specific way. For example, you could tag all snapshots of your home directory with the tag "home", and all snapshots of your work directory with the tag "work". This makes it easy to find and restore specific snapshots when you need them.
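For example (the paths and tag names here are illustrative), tags are attached at backup time with the --tag option and can be used later to filter the snapshot list:

```shell
# Tag backups by purpose at creation time
restic -r s3:s3.amazonaws.com/your-backup-bucket-name backup --tag home /home/user
restic -r s3:s3.amazonaws.com/your-backup-bucket-name backup --tag work /home/user/work

# List only the snapshots carrying a given tag
restic -r s3:s3.amazonaws.com/your-backup-bucket-name snapshots --tag home
```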

A repository can be thought of as a vault storing your directories and files. You can put many directories and/or files into a single repository, and take them out whenever you need them. You need multiple repositories if you want to store directories and/or files in different vaults. This way, you can have multiple copies of the same directory and/or file in different vaults, adhering to the famous (infamous?) 3-2-1 backup rule. You can list all the snapshots in a repository with the following command:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name snapshots
enter password for repository:
ID        Date                 Host    Tags   Directory        Size
-------------------------------------------------------------------------
40dc1520  2015-05-08 21:38:30  kasimir        /home/user/work  20.643GiB
79766175  2015-05-08 21:40:19  kasimir        /home/user/work  20.645GiB
bdbd3439  2015-05-08 21:45:17  luigi          /home/art        3.141GiB
590c8fc8  2015-05-08 21:47:38  kazik          /srv             580.200MiB
9f0bc19e  2015-05-08 21:46:11  luigi          /srv             572.180MiB
For a variety of things you can do with snapshots, see the official documentation.
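Two of the more useful snapshot operations are filtering the list and comparing two snapshots; the IDs and names below are taken from the sample listing above:

```shell
# Show only snapshots taken on a given host for a given path
restic -r s3:s3.amazonaws.com/your-backup-bucket-name snapshots --host kasimir --path /home/user/work

# Show what changed between two snapshots of the same directory
restic -r s3:s3.amazonaws.com/your-backup-bucket-name diff 40dc1520 79766175
```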

Manual Backups

If you want to perform a manual backup, you simply use the `backup` command. Remember that this command requires the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be set as environment variables:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name backup /path/to/your/data
open repository
enter password for repository:
repository a14e5863 opened (version 2, compression level auto)
load index files
start scan on [/home/user/work]
start backup on [/home/user/work]
scan finished in 1.837s: 5307 files, 1.720 GiB

Files:        5307 new,     0 changed,     0 unmodified
Dirs:         1867 new,     0 changed,     0 unmodified
Added to the repository: 1.200 GiB (1.103 GiB stored)

processed 5307 files, 1.720 GiB in 0:12
snapshot 40dc1520 saved
The output shows that restic successfully created a backup of the directory in a short time. Each backup snapshot is given a unique hexadecimal identifier, in this case "40dc1520". Restic processed 1.720 GiB of data from the local directory, but only 1.200 GiB was added to the repository, indicating that restic efficiently handled duplicate data. Compression further reduced the stored data to 1.103 GiB. Without the --verbose option, restic provides less detailed output but still displays a real-time status. It's important to note that this live status shows the number of processed files, not the amount of data transferred; the actual transferred volume may differ due to factors like de-duplication, potentially being lower or higher than the processed amount.

If you run the backup command again, restic will create another snapshot of your data, but this time it will be even faster and almost no new data will be added to the repository, since the data is already there. This is because restic uses a content-addressable storage model: identical content is only stored once in the repository, which can save a lot of space, especially if you have many similar files.

You can add multiple directories to the backup command, and restic will back them up in a single snapshot. For example, to back up both the /home/user/work and /srv directories, you would run:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name backup /home/user/work /srv
If you wish, you can also create different snapshots for each directory by running the backup command separately for each directory. This can be useful if you want to restore only a specific directory without affecting the others.

Automatic Backups

All is good up to now, but nobody got time for manual backups; we want them automated. This is also why we don't permanently add our keys as environment variables: we rarely need them, so there is no reason to expose them in every shell we open. However, restic does not support scheduled backups natively, so we will use an environment variables file together with a script, and then schedule the script with systemd.

Keys and Secrets

First, we create a directory for our scripts and environment variables file. We will create the /etc/restic directory, and create a file named restic-env (or any other name you like) in this directory. This file will contain the following necessary environment variables:

export RESTIC_REPOSITORY='s3:s3.xxxregionxxx.amazonaws.com/xxxx'
export RESTIC_PASSWORD='xxxxxxxxx'
export AWS_ACCESS_KEY_ID='xxxxxxxxx'
export AWS_SECRET_ACCESS_KEY='xxxxxxxx'
export SLACK_WEBHOOK_URL='https://hooks.slack.com/services/xxx'
The SLACK_WEBHOOK_URL is optional; it is used to send a message to a Slack channel after the backup is completed. You can create a webhook for your Slack channel here.

Important Note: In this simplified approach, we are storing the AWS access key and secret key in plain text. This is OK in our case for two reasons:

  • Our server is secure and only accessible by us.
  • Even if someone gains access to the server, they will not be able to do anything other than reading and writing to the S3 bucket, as we are using an IAM account that only has access to a single bucket.
Still, we will have some basic security measures in place; we will make the file readable only by the root user:
sudo chown root:root /etc/restic/restic-env
sudo chmod 600 /etc/restic/restic-env

Backup Script

The major difficulty in creating a backup script is that many containers do not use simple files; most of them have databases (e.g. MySQL, PostgreSQL, MongoDB) and other services running. Some applications require specific backup scripts; you cannot just dump a database or a volume. For this reason, it is imperative to check the documentation of the applications that will be backed up. In my case, the following applications need to be backed up, with the corresponding files/folders:

  • Caddy: Project folder (includes Caddyfile).
  • Memos: Project folder.
  • Calibre-web: We will backup the project folder, which also includes the library. It will take time only on the first backup, but the following snapshots will be faster thanks to deduplication.
  • Freshrss: Project folder
  • Paperless: Project folder. We set it up to use SQLite, so the directory-mounted volumes will include the database files as well.
  • Radicale: Project folder. This will include the DAV files within the directory-mounted volumes.
  • Searxng: Project folder.
  • Silverbullet: Project folder. All the notes are stored within the directory-mounted volumes as markdown files.
  • Vaultwarden: Project folder. All the encrypted vault files are stored within the directory-mounted volumes.
  • Firefly-III: Firefly requires a special backup script to be run. We will run this script, and it will create a single compressed file. We will back up this file.

As seen, we have many different types of backup implementations. For this, we will create a single extensible script that can handle all possible scenarios. We will create a script named `restic-backup.sh` in the `/usr/local/bin` directory. This script will contain the following:

#!/bin/bash

set -e

# Load environment variables (repository, password, AWS keys, Slack webhook)
source /etc/restic/restic-env

# Initialize the message variables
MESSAGE="🚀 Restic Backup Operation Results:\n"
ERRORS=""

# Function to execute backup and handle messaging
execute_backup() {
    local dir="$1"
    local preprocess_cmd="$2"
    echo "Backing up $dir"

    # Run preprocessing command if provided; the if-guard keeps a failure
    # from tripping `set -e`, so we can record it and move on
    if [ -n "$preprocess_cmd" ]; then
        echo "Running preprocessing command: $preprocess_cmd"
        if ! eval "$preprocess_cmd"; then
            MESSAGE="$MESSAGE\n❌ Failure: Preprocessing for $dir failed."
            ERRORS="$ERRORS\nFailure: Preprocessing for $dir failed. Command: $preprocess_cmd"
            return
        fi
    fi

    # Same pattern for the backup itself: test the command directly so a
    # failed backup is recorded instead of aborting the whole script
    if output=$(restic backup "$dir" 2>&1); then
        MESSAGE="$MESSAGE\n✅ Success: Backup of $dir completed."
        echo "✅ Success: Backup of $dir completed."
    else
        MESSAGE="$MESSAGE\n❌ Failure: Backup of $dir failed."
        ERRORS="$ERRORS\nFailure: Backup of $dir failed. Output:\n$output"
        echo -e "Failure: Backup of $dir failed. Output:\n$output"
    fi
}

# List of directories to backup with optional preprocessing commands
declare -A BACKUP_DIRS
BACKUP_DIRS["/home/user/containers/caddy"]=""
BACKUP_DIRS["/home/user/containers/memos"]=""
BACKUP_DIRS["/home/user/containers/calibreweb"]=""
BACKUP_DIRS["/home/user/containers/freshrss"]=""
BACKUP_DIRS["/home/user/containers/paperless"]=""
BACKUP_DIRS["/home/user/containers/radicale"]=""
BACKUP_DIRS["/home/user/containers/searxng"]=""
BACKUP_DIRS["/home/user/containers/silverbullet"]=""
BACKUP_DIRS["/home/user/containers/vaultwarden"]=""
BACKUP_DIRS["/home/user/containers/firefly/backup"]="
    BACKUP_DIR=/home/user/containers/firefly/backup
    mkdir -p \$BACKUP_DIR
    BACKUP_FILE=\$BACKUP_DIR/firefly_backup_\$(date +%Y%m%d_%H%M%S).sql.gz
    sudo docker exec firefly_iii_db sh -c 'MYSQL_PWD=\$MYSQL_PASSWORD mariadb-dump -u\$MYSQL_USER \$MYSQL_DATABASE' | gzip > \"\$BACKUP_FILE\"
    cd \"\$BACKUP_DIR\" || exit
    # Keep only the three newest dump files
    ls -1t firefly_backup_*.sql.gz | tail -n +4 | xargs -r rm -f
"

# Perform backups
for dir in "${!BACKUP_DIRS[@]}"; do
    execute_backup "$dir" "${BACKUP_DIRS[$dir]}"
done

# Run restic forget to maintain retention policy
# (add --prune if you also want to reclaim the space of removed snapshots)
echo "Running forget command to maintain retention policy"
if forget_output=$(restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 2>&1); then
    MESSAGE="$MESSAGE\n✅ Success: Retention policy applied."
else
    MESSAGE="$MESSAGE\n❌ Failure: Error applying retention policy."
    ERRORS="$ERRORS\nFailure: Error applying retention policy. Output:\n$forget_output"
fi

# Write errors to log file, if any
if [ -n "$ERRORS" ]; then
    echo -e "$ERRORS" > "/home/user/backup/log/errors.$(date +"%Y-%m-%d_%H:%M:%S")"
    MESSAGE="$MESSAGE\n\n⚠️ Errors occurred during backup. Check the log file for details."
fi

# Wrap it up
MESSAGE="$MESSAGE\n\nBackup operation completed. Have a nice day! 🎉"

# Send the accumulated message to Slack
if [ -n "$SLACK_WEBHOOK_URL" ]; then
    curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"$MESSAGE\"}" "$SLACK_WEBHOOK_URL"
else
    echo "SLACK_WEBHOOK_URL is not set. Skipping Slack notification."
fi

echo "Backup process completed"

Let's review this script step-by-step:

  1. The script starts with set -e, so any unexpected error causes it to exit immediately; failures of the individual backup steps themselves are caught and reported instead of aborting the run.
  2. It then sources the environment variables file to load the necessary variables.
  3. It initializes two main variables:
    • MESSAGE: This variable will accumulate the results of the backup operation and will be sent to Slack at the end.
    • ERRORS: This variable will accumulate any errors that occur during the backup operation.
  4. The heart of the script is the execute_backup function. It takes two parameters; the directory to back up, and an optional preprocessing command. Here is what it does:
    • It prints a message indicating that it is backing up the directory.
    • If a preprocessing command is provided, it runs this command. If the command fails, it adds an error message to the MESSAGE and ERRORS variables.
    • It runs the restic backup command on the directory. If the command succeeds, it adds a success message to the MESSAGE variable. If it fails, it adds an error message to the MESSAGE and ERRORS variables.
  5. Next, the script defines an associative array named BACKUP_DIRS. This array contains the directories to back up as keys and optional preprocessing commands as values.
  6. The script then iterates through the BACKUP_DIRS array, calling execute_backup for each directory.
  7. After backing up all directories, the script runs the restic forget command to maintain the retention policy:
    • The forget command removes old snapshots according to the retention policy. In this case, it keeps 7 daily, 4 weekly, and 6 monthly snapshots.
    • If the forget command succeeds, it adds a success message to the MESSAGE variable. If it fails, it adds an error message to the MESSAGE and ERRORS variables.
  8. If any errors occurred during the backup operation, the script writes them to a log file, with a timestamp in the file name.
  9. The script then sends the accumulated message to a Slack channel using a webhook URL. If the SLACK_WEBHOOK_URL variable is not set, it skips the Slack notification.
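The trickiest line in the Firefly preprocessing command is the rotation pipeline. A quick local demonstration in a throwaway directory, with fake empty dump files, shows how `ls -1t | tail -n +4 | xargs rm -f` keeps only the three newest files:

```shell
# Work in a throwaway directory with five fake dump files of increasing age
cd "$(mktemp -d)"
for i in 1 2 3 4 5; do
    touch -d "2024-01-0$i" "firefly_backup_0$i.sql.gz"   # 05 is the newest
done

# Newest-first listing; skip the first three lines; delete the rest
ls -1t firefly_backup_*.sql.gz | tail -n +4 | xargs -r rm -f

ls -1 firefly_backup_*.sql.gz   # only the three newest files remain
```

The `-r` flag on xargs (a GNU extension) simply skips the rm when there is nothing to delete, which the original pipeline tolerates anyway since rm -f with no operands succeeds.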

Systemd Service File

Now that we have our backup script, we need to create a systemd service file to run it on a schedule. We will create a file named restic-backup.service in the /etc/systemd/system directory. This file will contain the following:

[Unit]
Description=Restic Backup Service
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/restic-backup.sh
User=root
Group=root

[Install]
WantedBy=multi-user.target
Note that the service runs as root via the User= and Group= directives; there is no need to invoke sudo inside a unit file. Running as root is necessary because the script must read the environment variables file, which is owned by root, and back up directories owned by other users (mainly docker).

Timer File

Finally, we need to create a systemd timer file to schedule the backup service. We will create a file named restic-backup.timer in the /etc/systemd/system directory. This file will contain the following:

[Unit]
Description=Run Restic Backup nightly at 03:30

[Timer]
OnCalendar=*-*-* 03:30:00
Persistent=true

[Install]
WantedBy=timers.target
This timer file will run the backup service every night at 03:30. The Persistent=true option ensures that the timer will catch up on missed runs if the system was down at the scheduled time.
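If you want to double-check an OnCalendar expression before relying on it, systemd can parse it for you and print the next time it would elapse:

```shell
# Verify the schedule and see when it would fire next
systemd-analyze calendar "*-*-* 03:30:00"
```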

Enabling and Starting the Timer

Once you have created the timer file, you need to enable and start the timer:

sudo systemctl daemon-reload
sudo systemctl enable restic-backup.timer
sudo systemctl start restic-backup.timer
You can list the existing timers and verify that yours is running with the following command:
sudo systemctl list-timers --all

Restoring Backups

Restoring backups is as easy as creating them. To restore a backup, you need to know the snapshot ID of the backup you want to restore. You can list all snapshots in the repository with the following command:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name snapshots

Once you have the snapshot ID, you can restore the backup with the following command:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name restore snapshot-id --target /path/to/restore

Restic will restore the backup into the specified directory, recreating the paths recorded in the snapshot underneath it. To put the files back in their original locations, use --target /.
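A common convenience is the special snapshot ID latest, optionally narrowed by host or path, so you do not have to look up the hexadecimal ID at all (the path below is illustrative):

```shell
# Restore the most recent snapshot of a given path into a scratch directory
restic -r s3:s3.amazonaws.com/your-backup-bucket-name restore latest \
    --path /home/user/work --target /tmp/restore-work
```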

Restic also supports restoring individual files or directories from a snapshot. You can use the following command to list the contents of a snapshot:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name ls snapshot-id

Once you have the snapshot ID and the path of the file or directory you want to restore, you can use the --include option to restore only that path:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name restore snapshot-id --target /path/to/restore --include /path/to/file-or-directory

Restic will restore only the matching files or directories from the snapshot into the target directory; use --target / to put them back in their original locations.

Restic also supports mounting the repository as a FUSE filesystem. This allows you to browse the contents of all snapshots without restoring them. You can use the following command to mount the repository:

restic -r s3:s3.amazonaws.com/your-backup-bucket-name mount /path/to/mount

Restic will mount the repository at the specified directory, with the individual snapshots browsable under its snapshots/ subdirectory as if they were regular filesystems. When you are done, you can unmount with the following command:

fusermount -u /path/to/mount

Conclusions

Restic is a powerful and versatile backup tool that makes it easy to back up your data securely. By following the steps outlined in this post, you can set up automated backups to an Amazon S3 bucket, ensuring that your data is safe and secure. Restic's deduplication and encryption features make it an excellent choice for backing up your data, and its support for a variety of backends makes it easy to integrate with your existing infrastructure. Whether you are a home user looking to back up your personal files or a business looking to protect your critical data, Restic has you covered.