MikroTik Solutions

Yet Another Backup Script

There are many solutions for backing up RouterOS devices. This one is mine.

The script uses SSH to create and pull the backup files, which it then stores in Fossil, the version control system backing this very web site. That means you can get a web view of your RouterOS backups using the Files interface above, browse the history of your backups, get graphical diffs between versions of your backups, download Zip archives of your latest backups, and more.

Requirements

1. POSIX

The script was written and tested under Bash, though it should be portable to any other POSIX shell, needing only a change to the shebang line. It relies on other POSIX user environment stuff: cut, grep, tail, etc. It was written for and tested on a macOS system, but it should run just as well on Linux. It can be made to run on modern Windows systems via WSL, or on legacy Windows systems via Cygwin.

2. Command-Line SSH

The script was written for systems with the OpenSSH commands in mind. OpenSSH is installed by default on virtually every Linux and macOS system, and it's a first-party add-on from Microsoft for modern Windows versions. There may be third-party SSH command line implementations that use the same command formats this script relies on, but no attempt has been made to verify this.

3. Fossil

The Fossil version control system is designed with the same sensibility as RouterOS: self-contained, featureful, compact, and coherent. If you're a fan of another VCS, I encourage you to give the script a try as-is before hacking it to use your VCS of choice. You might find that you enjoy Fossil for the same reasons you enjoy RouterOS.

4. The Script

You can download the latest version of the script here.

Alternatively, you may wish to maintain a local fork of this repository so you can manage your local modifications without having to reapply them each time this script changes.

Preparation

There are a few one-time steps to use this:

  1. Set up SSH on each RouterOS box you want backed up, with public key authentication so the script can log in without a password.

  2. Adjust the user-configurable variables at the top of the script:

    rdir is the Fossil repository directory. The default is "~/museum", being a place where one stores precious Fossils, but you are free to use anything else you like.

    repo is the name of the actual Fossil repository under $rdir where you want your backups stored. They're separate because you may have other Fossil repos under that directory; this script deals with just that one repo.

    bdir is where your local check-out of the backup repository lives. This can be anywhere else on the system. Wherever you want the current versions of your backups on your local machine, that's where you should point this script. Contrast it with the repository, which stores all historical versions; the local check-out is a changeable view into the repository.

  3. Run the script once by hand, passing the hostnames or IPs of the RouterOS boxes you want it to back up.

The final step will do the initial backup on those boxes — storing them in $bdir — then atomically commit that first set of backups durably to $repo.
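Step 1 might look something like the following sketch. It assumes OpenSSH on the client and a key pair you have already generated (for example with "ssh-keygen -t rsa -b 4096"; check which key types your RouterOS version accepts). The function name install_router_key is hypothetical; the RouterOS side is the "/user ssh-keys import" command.

```shell
#!/bin/sh
# Hypothetical helper: install an existing public key for a RouterOS user so
# that later ssh/scp calls can log in without a password prompt.
# Arguments: router host, RouterOS user name, path to the public key file.
install_router_key() {
    host=$1 user=$2 pub=$3
    key=$(basename "$pub")

    # Copy the public half into the router's Files area. This first login
    # still asks for the user's password.
    scp "$pub" "$user@$host:$key"

    # Attach the uploaded key to the RouterOS user.
    ssh "$user@$host" "/user ssh-keys import public-key-file=$key user=$user"
}
```

For instance, "install_router_key gw1 admin ~/.ssh/id_rsa.pub", where gw1 is a made-up router name. If you use a non-default key file name, point ssh at it with an IdentityFile entry in ~/.ssh/config so the script's unattended logins pick it up.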

Usage

Once the backup environment is set up, you can simply run this script without arguments each time you want another backup taken. It iterates over the per-host subdirectory names it finds in $bdir, updates them with fresh backups, and commits any substantial changes it finds to $repo.

For each backup past the initial one, it asks you for a commit comment, which lets you document the reason for the backup. This eases historical exploration, as when you're trying to find out why a given setting changed.

I find it easiest to compose that comment inside my favorite text editor. Tell Fossil about that by setting the EDITOR environment variable appropriately. (e.g. export EDITOR=vim or export EDITOR=code)

This step does mean the script is unsuited to automated backups. If that bothers you, change the "fossil ci" line near the end, passing a canned or computed message via the -m flag.
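A computed message could come from a small helper like this sketch. The helper name auto_commit_msg is hypothetical; only "fossil ci -m" comes from Fossil itself. For fully unattended runs you would presumably also want to bypass the graphical diff step described under Diffs below.

```shell
#!/bin/sh
# Hypothetical helper: build a canned commit message for unattended runs.
# Arguments: the host names backed up on this run (none means "all hosts").
auto_commit_msg() {
    if [ $# -eq 0 ]; then
        what="all hosts"
    else
        what="$*"
    fi
    printf 'Automated backup of %s at %s\n' "$what" "$(date '+%Y-%m-%d %H:%M')"
}

# The script's "fossil ci" line would then become something like:
#   fossil ci -m "$(auto_commit_msg "$@")"
```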

You can pass one or more hostnames or IPs after the initial setup, in which case what you get is a limited backup of only the named hosts. I use this feature when carefully rolling out a new RouterOS version, since it gives me a diff between the old configuration and the upgraded one. I don't need complete backups taken of all the other RouterOS boxes; they're still current, so why burn the network time or clutter up their local file storage?
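The host-selection rule can be sketched as follows, assuming a hypothetical function hosts_to_back_up that takes the check-out directory ($bdir) followed by any host names given on the command line. The config.bin test mirrors the rule described under "Supplementing the Text Backup"; this is an illustration, not the script's actual code.

```shell
#!/bin/sh
# Hypothetical sketch of host selection: the named hosts if any were given,
# else every top-level subdirectory of the check-out ($1) holding a
# config.bin binary backup. Other top-level entries are ignored.
hosts_to_back_up() {
    co=$1
    shift
    if [ $# -gt 0 ]; then
        printf '%s\n' "$@"      # limited backup of just the named hosts
        return
    fi
    for d in "$co"/*/; do
        [ -f "${d}config.bin" ] || continue   # skip certs/, docs, etc.
        basename "$d"
    done
}
```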

Compression

Fossil has a two-level compression scheme that keeps the repository size small. For uncompressed data like these backups, it computes a binary delta between versions and stores only the changes. It then applies data compression to the diff. You can expect your initial repository size to be about half the size of the check-out directory contents, then to grow slowly from there, not surpassing the check-out directory size until many backups have been taken.

(As of this writing, my main repo size is only 57% bigger than the checkout directory despite having taken 133 distinct backups, with an overall compression ratio over 6:1.)
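To see how your own repository compares, a rough check is to compare on-disk sizes with du, as in this hypothetical helper. du counts allocated blocks rather than bytes, and the check-out figure includes Fossil's own check-out database file (.fslckout), so treat the result as approximate. For Fossil's own view of the repository, "fossil dbstat" run inside the check-out reports repository statistics.

```shell
#!/bin/sh
# Hypothetical helper: report the repository file's size as a percentage of
# the check-out directory's size, using du's 1 KiB block counts.
# Arguments: path to the repository file, path to the check-out directory.
repo_vs_checkout() {
    repo_kib=$(du -k "$1" | cut -f1)
    co_kib=$(du -sk "$2" | cut -f1)
    if [ "$co_kib" -eq 0 ]; then
        echo "check-out is empty" >&2
        return 1
    fi
    echo "repo ${repo_kib} KiB, check-out ${co_kib} KiB ($((100 * repo_kib / co_kib))% of check-out)"
}
```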

Atop this, the script looks at the text backup diff to determine when a router's configuration hasn't changed since the last backup. When this happens, it removes all of the files created during the backup attempt — local and remote — to avoid ballooning your backup repo and router flash storage with redundant backups. This also eases diffs between versions: any change to the textual /export file (*.rsc) between versions will be as substantial as the change made.

(Beware the change detection risk described under Known Weaknesses below.)
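The comparison could be as simple as this sketch, which assumes that the only part of an export that changes on every run is its leading "#" comment header, where RouterOS records the export's timestamp. The function name export_changed is hypothetical, and the script itself may do this differently.

```shell
#!/bin/sh
# Hypothetical sketch of text-backup change detection: succeed (exit 0) when
# two /export files differ in anything other than '#' comment lines, so a
# timestamp-only difference does not count as a change.
# Arguments: previous export file, new export file.
export_changed() {
    old=$(grep -v '^#' "$1" || :)
    new=$(grep -v '^#' "$2" || :)
    [ "$old" != "$new" ]
}
```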

Diffs

The script produces a graphical diff as its last step before committing a backup to the repository, allowing you to abort a commit if you see changes you don't like. Common reasons are:

  1. You made a temporary change and forgot to revert it before committing.
  2. You made a change to one router and forgot to roll it out through the rest.
  3. Someone fat-fingered a change, and this feature caught the problem for you, allowing you to fix it.

Because there is no standard graphical diff program, the script's default behavior is to produce an HTML diff and open it in your web browser. Due to the nature of web browsers, Fossil doesn't block and wait for the browser to close before continuing, so after giving the browser time to open the temporary HTML output file, the script moves on to the "commit" step. Since that step does block, it means the script waits on the user to provide a commit message, per above.

The script will instead use another graphical diff tool if one is configured. Assuming the tool is in your command path, configuring it is usually as simple as this, using p4merge as the example:

$ fossil set gdiff-command p4merge

If you give that command from within this script's check-out directory — $bdir above — it affects only that one Fossil repository. If you want the setting to affect all Fossil repos on the machine, you can run it from anywhere, giving the "-g" flag to make it affect the global Fossil configuration.

It is because of this step that the tool produces "terse" format /export output: to avoid noisy semantic-free whitespace deltas in the diff output. The common case is that you added something to the configuration for an item that normally shows as multiple lines in "/export" output, but because it's emitted at the start of the first line, it causes that line to wrap, which causes the next line to wrap, and so on, giving the false impression in the diff that the change had a broader impact than it actually did. We want to see just the lone addition, not all of the cascading whitespace changes that the lone addition subsequently caused.

Restoring from Backup

The simplest option is to upload the binary backup from $bdir/HOSTNAME/config.bin to the router, then use the documented procedure to load it.

BEWARE: This is only safe to do on the same box the backup was taken on. If you restore a binary backup made on one box to another, at the very least you will end up duplicating MAC addresses; a binary backup restores everything that is configurable! If you do this on the same type of device but of a different generation — e.g. two RB4011s manufactured years apart — restoring a binary backup may cause further errors due to differences in the underlying hardware. If the two boxes aren't even of the same type (e.g. RB4011 to RB5009) you're virtually guaranteed to have problems with the resulting configuration.

For these and many other reasons, it is often better to upload the export.rsc text backup file instead, then try the "/system reset-configuration run-after-reset" feature. If that works, apply any supplemental backup elements.

If that also doesn't work, then you're stuck with the manual restoration method:

  1. If the router is functioning, do a full reset so you're starting from a clean slate. (BEWARE: If you take this step, the RouterOS box needs to be accessible via WinBox, MAC-Telnet, RoMON, or similar. Don't do this if you're trying to restore from a remote machine!)

  2. Connect to the router via its MAC address using either WinBox or MAC-Telnet.

  3. If you were using the default "admin" user account, set its password from wherever you have it saved.

    If instead you took my advice below, create the “full” capability login user with the credentials you previously saved. Don't delete the default admin user just yet; we'll keep it as a secondary access path until we're sure everything's working right. If you have a router-specific SSH key, upload the public half to Files and import it for that user. Also load the saved SSH host keys and import them with "/ip ssh import-host-key". Try SSHing in now.

  4. Upload the certificates you backed up per the instructions below. Run "/cert import" to bring the keys in, unlocking them with the PEM passphrase you used. Attach them to the necessary services: TLS certs on the www-ssl and api-ssl services, IPsec certs on the tunnels, etc.

  5. Since we aren't using the "run-after-reset" feature, the export.rsc file likely will not apply directly due to configuration conflicts. If nothing else, the default configuration for many RouterOS boxes comes with a bridge already set up, which will conflict with the one in the export.rsc file. Edit it to suit, then upload it to the router's filesystem root, open a CLI window, and say "/import" to run it. You don't need any arguments to the command: RouterOS finds the file by extension and runs it.

  6. If the RSC file imported without error, reboot to be sure it all works as it did at the time of the last backup.

  7. If your full-capability login user is not the stock "admin" user, double-check that the account works properly, then delete the admin user.

Why Both Binary & Text Backups?

The script uses both the RouterOS /system/backup feature and the /export command, yielding binary and text backups, respectively.

RouterOS binary backups include the device's full configuration, but for various reasons they don't always restore properly. I've seen a number of different cases:

  1. You're trying to restore onto a different hardware platform.

    That's a hard truth to learn, since it means someone who needs to upgrade a RouterOS device or replace a dead one can't just buy the best available replacement and restore the old one's binary config to it.

  2. The RouterOS version the backup was made on is too different from the one you're restoring on.

    It shouldn't surprise you that RouterOS 7.x won't restore a RouterOS 6.x binary backup, but it can happen even between closer-spaced versions. Beta versions tend to be worst, to the point that I've seen binary restoration fail between two back-to-back RouterOS betas.

  3. The backup was in some way bad. A successful restoration of a bad backup results in the same bad configuration. It's all-or-nothing.

Text backups solve these problems. The /export format is fairly compatible between versions, and where there are incompatible differences, you can edit the commands to make them work under the new OS version. The same is true among different hardware: if the old box said something about "sfp1" but it's now called "sfpplus1" on the new box, you can edit the commands involved to make it work. Even if you have to rebuild the configuration piecemeal, with a text backup, you at least have a guide to what you did on the prior device and can work through it step-by-step.

Text backups are also version-control friendly. They allow for diffs from one backup to the next, helping the administrator remember what they did since the last backup, understand the changes, and make an informed decision about choosing to commit them as the new version. If a given change turns out to be bad, the version history gives you the option to go back and see what changed since the last time it was known to work, and even roll back to that version if necessary.

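Unfortunately, text backups made by RouterOS don't include everything; the items they omit are covered under "Supplementing the Text Backup" below. Hence the script takes both binary and text backups.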
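Creating and pulling both kinds of backup over SSH might look like this sketch. The function name and the remote file names config.backup and export.rsc are assumptions (RouterOS appends .backup and .rsc to the names you give). Backup encryption options are left at their defaults here, and on some models the files may land under a flash/ directory rather than the file-system root.

```shell
#!/bin/sh
# Hypothetical sketch: create both backup types on a router over SSH, then
# pull them into that host's subdirectory of the check-out.
# Arguments: router host name, check-out directory.
fetch_backups() {
    host=$1 co=$2
    mkdir -p "$co/$host"

    # Binary backup; RouterOS saves it as config.backup.
    ssh "$host" '/system backup save name=config'
    # Text backup in "terse" form; RouterOS saves it as export.rsc.
    ssh "$host" '/export terse file=export'

    # Pull both files down under the names the check-out uses.
    scp "$host:config.backup" "$co/$host/config.bin"
    scp "$host:export.rsc" "$co/$host/export.rsc"
}
```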

Supplementing the Text Backup

Because both of RouterOS's backup methods have problems, I recommend that you rely on the binary backup only in cases where you need to recover the device you took the backup on to a prior known-good configuration. For all other cases, you should supplement the text backup with the elements that are normally only backed up via the binary method, stored separately. Then if it comes time to do a restore, you can try the binary method, and if that doesn't work, you have the option to rebuild the configuration with a combination of the text backup and the individual elements you had the foresight to store separately.

Before we get down into those details, we come to the question, "Where should I keep these supplementary backup elements?" Why, in the Fossil repository managed by this script, of course! The script only treats the top-level subdirectory names it finds in the check-out directory as RouterOS box hostnames if they contain a config.bin file, being the RouterOS binary configuration backup. Every other file or directory at the top level of the check-out — $bdir above — it ignores. That gives you the freedom to store any type of file you want in the repository this script manages without interfering with its operation.

In addition to storing files with version control, Fossil includes a wiki, where you might document things like backup and restoration procedures. For instance, you might have a preferred method for creating certificates, which you document in a wiki article. Then since it's stored in the same Fossil repo as your backups, you could add instructions for loading certificate backups.

This combination of features allows the repo to serve as an all-round disaster recovery kit.

Things to consider including in the kit:

X.509 Certificates

Export a copy of all X.509 certificates via the RouterOS export function, then download them somewhere safe. I suggest the "certs/" subdirectory of the backup repository.

NOTE: It is absolutely necessary to give the "export-passphrase" parameter to this command if you want the resulting PEM file to include the private half of the key. Without a passphrase, RouterOS gives you only the public half.
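A minimal sketch of that export and download, assuming PEM output and a hypothetical helper named backup_cert. The export-passphrase, type, and file-name parameters belong to "/certificate export-certificate"; the .crt/.key output names are an assumption worth checking with "/file print" on your router.

```shell
#!/bin/sh
# Hypothetical sketch: export one certificate, private key included, and pull
# it into certs/<host>/ in the backup check-out.
# Arguments: router host, certificate name, PEM passphrase, check-out dir.
# ASSUMPTION: with type=pem and file-name=NAME, RouterOS writes NAME.crt and
# NAME.key; check "/file print" if the names differ on your version.
backup_cert() {
    host=$1 cert=$2 pass=$3 co=$4
    mkdir -p "$co/certs/$host"
    # Without export-passphrase, RouterOS exports only the public half.
    ssh "$host" "/certificate export-certificate \"$cert\" type=pem export-passphrase=\"$pass\" file-name=\"$cert\""
    scp "$host:$cert.crt" "$co/certs/$host/$cert.crt"
    scp "$host:$cert.key" "$co/certs/$host/$cert.key"
}
```

Passing the passphrase on the command line exposes it in your shell history and process list, so you may prefer to prompt for it. The exported .key file also stays on the router until you delete it. From within $bdir, "fossil add certs" then puts the downloaded files under version control.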

SSH Keys

For user keys, you normally upload these keys from some other location (e.g. ~/.ssh/id_*.pub) which should be backed up by other means. However, storing them in this script's backup repo isn't a terrible idea.

More important than this are the SSH host keys, without which your SSH clients will begin complaining that someone's trying to spoof the connection after you restore from a text backup. For instance, you may have run "/ip ssh set strong-crypto=yes" (as I recommend) and then run the subsequently necessary "/ip ssh regenerate-host-keys" command. You need to export those regenerated SSH host keys via "/ip ssh export-host-key" and download them somewhere safe. Storing them in this script's backup repo is a perfectly sensible plan.
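Exporting and downloading those host keys might look like this sketch, which assumes a hypothetical helper save_host_keys and a hypothetical ssh-host-keys/ directory in the check-out. The RouterOS command is "/ip ssh export-host-key key-file-prefix=..."; which files it writes depends on your RouterOS version, so the file-name suffixes are passed in rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical sketch: export a router's SSH host keys and pull them into
# ssh-host-keys/<host>/ in the backup check-out.
# Arguments: router host, check-out directory, then the suffixes of the files
# RouterOS writes for key-file-prefix=<host>. The suffixes are an ASSUMPTION;
# run "/file print" after the export to see the real names.
save_host_keys() {
    host=$1 co=$2
    shift 2
    mkdir -p "$co/ssh-host-keys/$host"
    ssh "$host" "/ip ssh export-host-key key-file-prefix=$host"
    for suffix in "$@"; do
        scp "$host:$host$suffix" "$co/ssh-host-keys/$host/$host$suffix"
    done
}
```

For instance, if "/file print" shows gw1_rsa and gw1_rsa.pub after the export, you would pass _rsa and _rsa.pub as the suffixes. The private halves are sensitive, so the plaintext caveats under "Backing up the Backup" apply to them.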

Users

I normally have only one "real" user on the router, a "full" capability user named something other than "admin" to frustrate bots and script kiddies. I delete the default "admin" user as soon as I have this one set up.

You might choose to store this user name and password in a wiki document in the backup repo. Personally, I keep it in a password manager instead.

Backing up the Backup

Fossil repositories are designed for long-term readability, not as an encrypted data store. That means anyone who can read the repo file this script manages can pull the sensitive information out of your backups. To the extent that you go beyond what this script provides, all of that supplemental information will be in plaintext as well.

While there are ways to get Fossil to encrypt a live Fossil repo file, I think it's better to just make sure it's stored somewhere safe, such as under the encrypted home directory of the one user that needs to maintain the backups.

If you want belt-and-suspenders protection, I recommend one of two methods:

SSH

Set up autosync with another machine over SSH. Using the variable names from the backup script:

$ scp $repo otherbox:museum/
$ cd $bdir
$ fossil sync ssh://otherbox/museum/routeros-backups.fossil

The first time you do that, the only thing that will happen is that the sync link will be set up, but from that point on, each time the script commits a change to the local repository, Fossil will autosync the change to otherbox over SSH for you.

Repository Backup

You could instead back up the entire RouterOS backup repository with some other method. I highly recommend studying the relevant Fossil documentation page on this. There's a lot of subtlety you might neglect to consider otherwise.

Down at the bottom of that doc is a method for encrypted off-site backups, which pairs well with this topic.

Using Another VCS

This script relies on no irreplaceable Fossil-specific features. The script makes several calls to Fossil, but each one is readily replaced with a call to any other version control system you prefer.

Git is the most popular VCS at the moment, but if you are one of its fans, do realize that Fossil provides everything Git does that matters for this application without bringing along all of its unnecessary complexities. You are unlikely to have a better justification than simple inertia for using it; none of Git's unique capabilities matter for this application. For instance, no sensible network engineer will publish sensitive router backups on GitHub, and even if one were so foolish as to do so, there is all but zero chance that anyone would fork that repository, make improvements, and push a PR for the repo. Git can be used on a private LAN instead, without involving the public Git hosting services, but what does that buy you over Fossil?

A better alternative would be Subversion, being small and coherent like Fossil, though not nearly so self-contained, featureful, or easy to administer.

Another alternative that would work here, which is more in line with the simplicity of Fossil without all of its features, is Mercurial.

For day-to-day use, Fossil is as easy to use as CVS while being nearly as powerful as Git.

Known Weaknesses

Periodic Cleanup

The current version of this script doesn't purge old binary backups under Files on each RouterOS box. It's possible to run this script enough times that you fill the flash, preventing operations like RouterOS version upgrades from working.

Modifying the script to automatically purge old backups wouldn't be difficult. I simply haven't bothered since I run this tool only at need, not periodically, so they don't build up to that point very fast. Even on a small RouterOS box like a hEX, with its piddlin' 16 MiB of flash, you can do 10 or so backups before you start causing this problem.

Automatic cleanup would matter if you ran this script periodically. Patches thoughtfully considered.
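One possible shape for such a patch, as a hedged sketch: it assumes backup file names sort chronologically (the naming scheme in the comments is invented) and that you supply the router's list of backup file names on standard input. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical cleanup sketch. ASSUMPTION: backup names sort chronologically,
# e.g. an invented scheme like gw1-20240115-1023.backup.

# Print all but the newest $1 names read from stdin.
stale_backups() {
    sort | awk -v keep="$1" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Remove stale backup files from router $1, keeping the newest $2.
purge_stale() {
    stale_backups "$2" | while IFS= read -r f; do
        # -n keeps ssh from swallowing the rest of the file list on stdin
        ssh -n "$1" "/file remove [find name=\"$f\"]"
    done
}
```

The "ssh -n" matters: without it, the first ssh call inside the loop would read the remaining file names from standard input, and only one file would be removed.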

Change Detection

The script attempts to detect whether there are meaningful changes to the router configuration before doing a commit in order to avoid ballooning the repo with unnecessary backups. It does this by looking at diffs of the text backup file, which means it can be fooled if you make a change that affects the binary backup file alone. The list of supplemental data is a fair guide to the sort of changes you can make to a RouterOS box that will show up only in the binary diff. If you change something on that list and nothing else, this script is likely to decide you haven't made any substantial changes to the configuration and skip the commit.

This problem doesn't always occur. For instance, adding a certificate is likely to require a change to the text configuration as well, since you will need to reference that new certificate from somewhere; else why create it? Possible answer: you're using the RouterOS box as a CA, and you've added the certificate only to copy it to another box, in which case you're back in the soup.

While the script can certainly detect when the binary backup file changes, RouterOS seems to include pointless tiny changes each time, so we cannot distinguish "it changed" from "it needs to be backed up."

I see multiple possible solutions:

  1. Rip the change detection out and commit a backup each time. Then either rely on the user to know when a backup needs doing, or accept that the repo will grow on each backup even when the versions differ in no substantial way. A mitigating factor is Fossil's strong compression, described above, which keeps that growth in check.

  2. Add a "--force" flag to make the script skip the change detection for a particular run, given when the user knows it needs to be done without regard to the script's default criteria for skipping it.

  3. Make it a practice to change something in the configuration that shows up in the /export output each time you make a change that shows up only in the binary backup. For instance, if you update a certificate for one of your IPsec clients, you could document the change in the comment= on that client's /interface entry, causing this script to consider the change substantial and thus allow a commit.
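Option 2 could be sketched as follows. The --force handling, the force variable, and the needs_commit function are all hypothetical, and the change test rests on the same assumption as before: only the export's "#" header lines vary between otherwise identical runs.

```shell
#!/bin/sh
# Hypothetical "--force" option; only a leading --force is recognized here.
force=no
if [ "${1:-}" = "--force" ]; then
    force=yes
    shift
fi

# Decide whether a host's new backup should be committed.
# Arguments: "yes" to force, previous export file, new export file.
needs_commit() {
    [ "$1" = yes ] && return 0
    # Otherwise fall back to text-backup change detection, ignoring '#'
    # comment lines (assumed to be the only volatile part of an export).
    old=$(grep -v '^#' "$2" || :)
    new=$(grep -v '^#' "$3" || :)
    [ "$old" != "$new" ]
}
```

The script would then call something like needs_commit "$force" with the old and new export.rsc paths for each host, committing only when it succeeds.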

License

This work is © 2022-2024 by Warren Young and is licensed under CC BY-NC-SA 4.0