D 2022-06-30T10:46:03.980
L Using\sfail2ban\swith\sRemote\ssyslog
N text/x-markdown
P 30122939e4d3820523b31bc5f31bdc9de965ab91dfccff87379ae81cbbf99603
U tangent
W 19067
## Motivation

The popular [`fail2ban`](https://www.fail2ban.org/) tool monitors log messages for suspicious activity, then issues firewall commands to temporarily ban the offending hosts. After a configurable period of time, it automatically un-bans them.

RouterOS has logs and a suitable command interface to its firewall, but [it's impractical to run `fail2ban` directly on the router](#docker). This guide shows what I believe to be a superior method: set up a machine on the network capable of running `fail2ban` in a way that lets it monitor the logs on the router's behalf and manage its firewall from afar.

In this article, we're going to set `fail2ban` up to react to failed SSH connections to the router. You can use this mechanism to react to anything RouterOS can produce a sensible log message about, so long as you can produce a regex that will match the necessary elements of the message. For instance, it would be easy to extend the examples below to make it ban IPs attempting to break in via WinBox, WebFig, or your VPN technology of choice.


## Preliminaries

I developed and tested this with a macOS server running [Homebrew](https://brew.sh/), so to keep things simple I will use its commands and paths. The basic idea will work on most any Linux or BSD box, but the installation commands and configuration file locations will differ. You're expected to be able to map these trivial details to match your local system's quirks. The ideas are the hard part, and they transfer nicely as long as you can hold up your end.

The examples use the [TEST-NET-1](https://datatracker.ietf.org/doc/html/rfc5737) IP address scheme:

| IP         | Usage                                   |
|------------|-----------------------------------------|
| 192.0.2.1  | RouterOS box                            |
| 192.0.2.99 | Server running `rsyslog` and `fail2ban` |

Adjust the IPs as necessary.


## Set Up

### 1. Install & Start `rsyslog`

macOS doesn't have a `syslog` server built-in, but [it's easy to add one](/wiki?name=Remote%20Log%20Server). Follow those instructions to make your router send log messages to the host that will run `fail2ban`.

The default configuration file for the Homebrew `rsyslog` package contains:

```
# minimal config file for receiving logs over UDP port 10514
$ModLoad imudp
$UDPServerRun 10514
*.* /usr/local/var/log/rsyslog-remote.log
```

That will work perfectly well for us as-is. I show it only because we'll refer back to these configurable values below.
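
If you'd like to confirm the listener is up before involving the router, here's a minimal Python sketch that hand-builds one BSD-style syslog datagram and sends it to the UDP port from the configuration above. The `testhost.lan` name, the priority value, and the message text are all made up for illustration, and the text is chosen so that it won't match the filter we write later.

```python
import socket
from datetime import datetime


def build_syslog_line(hostname: str, text: str, pri: int = 131) -> bytes:
    """Build a BSD-syslog-style datagram: <PRI>TIMESTAMP HOSTNAME TEXT.

    pri=131 is facility local0 (16 * 8) plus severity "error" (3).
    That value is an assumption for testing, not what RouterOS sends.
    """
    now = datetime.now()
    # Traditional syslog timestamps space-pad the day of the month.
    stamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{stamp} {hostname} {text}".encode("ascii")


def send_test_message(server: str, port: int, payload: bytes) -> None:
    """Send one datagram. UDP is fire-and-forget, so there is no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

Calling `send_test_message("192.0.2.99", 10514, build_syslog_line("testhost.lan", "system,info test message"))` should make a new line show up in `/usr/local/var/log/rsyslog-remote.log`. If it doesn't, sort that out before moving on, since nothing below will work without it.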


### 2. Set Up the Router

Now that the `fail2ban` host is receiving log messages from the router(s), you also need to add a firewall filter chain to each monitored router for use in step 3 below:

``` shell-session
> /ip/firewall/filter
> add action=jump chain=input jump-target=fail2ban
> add action=return chain=fail2ban
```

We put this in the "input" chain because we're reacting to SSH connections to the router in this example.

Move the "jump" action to the position in the firewall's input chain where you want `fail2ban` ban actions to occur. The correct position depends on your local firewall configuration, which I can't predict. The purpose of this chain is to give `fail2ban` a named position for the ban actions to be inserted. Think of it like a bookmark: wherever you put it, that's where the automated ban/unban actions will occur. Without this, we'd have to hard-code some brittle rule in the ban action like "place-before=42".


### 3. Install, Configure & Start `fail2ban`

Say:

``` shell-session
% brew install fail2ban
```

Before we can start it, we have to tell `fail2ban` what files to look at and what to do when it finds a suspicious log line.

The following paths are relative to your `fail2ban` configuration directory, which is `/usr/local/etc/fail2ban` under Homebrew, but may be elsewhere on other OSes, such as `/etc/fail2ban`.


#### The Filter

Put the following into `filter.d/routeros-rsyslog-sshd.conf`:

``` ini
[Definition]

_router     = (<F-ROUTER>[a-zA-Z0-9.-]+</F-ROUTER>)
failregex   = ^\s?<_router> .* login failure for user .* from <HOST> via ssh
```

This matches `rsyslog` messages such as:

```
2022-04-12T02:51:22.573113-06:00 officeswitch.lan system,error,critical login failure for user admin from 192.0.2.88 via ssh
```

`fail2ban` automatically removes the timestamp at the beginning, but it doesn't remove the following space, which is why our `failregex` allows for a single optional whitespace character at the beginning of the pattern. Immediately following that, we match either an IPv4 address or an ASCII host name and store it in the `F-ROUTER` variable for later use. This part of the regex won't match IPv6 addresses, domain names with odd characters, etc.

The regex then skips over the message's tag list and finds the actual failure message, including the host that caused it.

It's possible to extend the `failregex` to multiple possible matches, but that's a topic for [the `fail2ban` documentation](https://www.fail2ban.org/wiki/index.php/MANUAL_0_8).
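
To see what the pattern captures, here's a rough Python approximation of it. This is a sketch, not what `fail2ban` actually compiles: I've swapped the `<F-ROUTER>` tag for a plain named group and replaced `<HOST>` with a simplified IPv4-only pattern, whereas the real `<HOST>` also accepts IPv6 addresses and host names.

```python
import re

# Approximation of the failregex above. The group names and the IPv4-only
# stand-in for <HOST> are assumptions made for this illustration.
FAILREGEX = re.compile(
    r"^\s?(?P<router>[a-zA-Z0-9.-]+) .* login failure for user .* "
    r"from (?P<host>\d{1,3}(?:\.\d{1,3}){3}) via ssh"
)


def parse(line: str):
    """Return (router, offending host) for a matching line, else None."""
    m = FAILREGEX.search(line)
    return (m.group("router"), m.group("host")) if m else None


# The sample line from above, with the timestamp already stripped off but
# the space that followed it left in place.
sample = (" officeswitch.lan system,error,critical "
          "login failure for user admin from 192.0.2.88 via ssh")
print(parse(sample))
```

For checking the real filter against a real log file, `fail2ban` ships a `fail2ban-regex` tool, which is the better test once everything is installed.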


#### The Action

Now we have to teach `fail2ban` how to apply the firewall rule changes to RouterOS. Put the following into `action.d/routeros.conf`:

``` ini
[Init]

# SSH credentials to use to log into the router
user   = admin
port   = 22
pubkey = /var/root/.ssh/id_routeros

# SSH connection command
ssh = /usr/bin/ssh
cmd = <ssh> -i <pubkey> -p <port> <user>@<F-ROUTER>

# What to do on ban.
action = drop
chain  = fail2ban

# Command-shortening aliases
iff     = /ip/firewall/filter
what    = src-address="<ip>" chain="<chain>"
addwhat = <what> dst-port="<port>" proto="tcp" action="<action>"


[Definition]

actionban   = <cmd> '<iff> add <addwhat> place-before=0'
actionunban = <cmd> '<iff> remove numbers=[find <what>]'
```

**Notes:**

1. Change the "`user`" value if your router's full-capability admin user is called something other than "admin".

1. Change the "`port`" value if you've got the RouterOS SSH service listening on a nonstandard port. I recommend doing so on externally-facing routers since it reduces script kiddie and bot noise in your logs considerably. Note that this value is used both for logging into the router to send the RouterOS ban/unban commands and also in the firewall filter rules themselves.

1. The proper name and location of the SSH key file to use for the login depend on how `fail2ban-server` runs. (Despite the "`pubkey`" variable name, `ssh -i` wants the private half of the key pair; the public half goes on the router.) The scheme in the example is typical of the Homebrew macOS build, but on a Linux box, you'll likely need something like `/root/.ssh/id_routeros` instead.

    Remember that the OpenSSH client is very picky about which key files it will use: if you arrange for `fail2ban` to run as user `bob` instead of as `root`, it will refuse to use anything other than Bob's SSH keys. A nice side benefit of this necessity is that it lets you generate an SSH key specifically for your RouterOS boxes, rather than reuse your OS's default SSH key.

1. You may wish to change the "action" to "reject" or "tarpit". See [the RouterOS filter docs](https://help.mikrotik.com/docs/display/ROS/Filter) for details.

The "chain" value matches the filter chain we set up above. If you change one, change the other to match.

You shouldn't have to change anything else.
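
If the nested `<tag>` aliases in that file are hard to follow, this Python sketch mimics the substitution so you can see the command line that results. It is only an emulation of the idea, not `fail2ban`'s own interpolation code, and the `expand` helper is a hypothetical name. The `<ip>` and `<F-ROUTER>` values are the ones `fail2ban` would supply at ban time.

```python
import re

# Values copied from the action file above.
TAGS = {
    "user": "admin",
    "port": "22",
    "pubkey": "/var/root/.ssh/id_routeros",
    "ssh": "/usr/bin/ssh",
    "cmd": "<ssh> -i <pubkey> -p <port> <user>@<F-ROUTER>",
    "action": "drop",
    "chain": "fail2ban",
    "iff": "/ip/firewall/filter",
    "what": 'src-address="<ip>" chain="<chain>"',
    "addwhat": '<what> dst-port="<port>" proto="tcp" action="<action>"',
    "actionban": "<cmd> '<iff> add <addwhat> place-before=0'",
    "actionunban": "<cmd> '<iff> remove numbers=[find <what>]'",
}

TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9_-]*)>")


def expand(template: str, tags: dict, max_depth: int = 10) -> str:
    """Replace <name> tags repeatedly until nothing changes.

    Unknown tags are left as-is. max_depth guards against circular aliases.
    """
    for _ in range(max_depth):
        expanded = TAG_RE.sub(lambda m: tags.get(m.group(1), m.group(0)), template)
        if expanded == template:
            break
        template = expanded
    return template


# Ban-time values: the offending IP and the router captured by the filter.
ban_time = {**TAGS, "ip": "1.2.3.4", "F-ROUTER": "officeswitch.lan"}
print(expand(TAGS["actionban"], ban_time))
print(expand(TAGS["actionunban"], ban_time))
```

The point to notice is that the router name captured by the filter ends up as the SSH destination, which is how a single jail can serve several routers.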


#### The Jail

Now we can tie the two elements above together. Put the following into the `jail.local` file at the top of the `fail2ban` configuration directory. It may or may not already exist. You know you're in the right location if there's a `jail.conf` file, which you **do not** modify.

``` ini
[routeros-rsyslog-sshd]

action     = routeros
backend    = polling
enabled    = true
ignoreself = true
logpath    = /usr/local/var/log/rsyslog-remote.log
```

Change the `logpath` variable if your `syslog` server puts the log information somewhere else.

The `ignoreself` setting is **critical**. It causes `fail2ban` to ignore failures originating from the host running `fail2ban-server`. Without this, `fail2ban` could never unban its host IP, since that requires SSH access. Worse, if you'd disabled all other management interfaces on the router (e.g. WinBox) you could end up locking yourself out of the router entirely.

You may have configured your RouterOS firewall to accept all packets from the `fail2ban` host ahead of this chain to prevent this very sort of lockout problem, but if so, the ban is useless, so you might as well set `ignoreself` in that case as well. There's no value in slowing the router down by giving it useless firewall filters.


#### Final Steps

With all of this in place, you can start it running:

``` shell-session
% sudo brew services start fail2ban
% sudo fail2ban-client status
```

The ban/unban actions likely won't work to start with. Although you may have been managing your routers over SSH before now, you probably haven't been doing so as the local `root` user, as `fail2ban` does. You will likely have to issue a command like the following against each router you're monitoring:

``` shell-session
% sudo ssh -i /var/root/.ssh/id_routeros admin@officeswitch.lan
```

Until you do that and say "yes" to the resulting question, the router won't be in the local `root` user's known-hosts file, so `fail2ban` won't be able to automatically log in and send RouterOS commands.

Do this step even if you don't think it's necessary: it tests that the SSH login from the local `root` account works as you think it should.

That accomplished, you can test that the ban/unban actions actually send the proper RouterOS commands:

``` shell-session
% sudo fail2ban-client set routeros-rsyslog-sshd banip 1.2.3.4
% sudo fail2ban-client set routeros-rsyslog-sshd unbanip 1.2.3.4
```

After the first command, you should find a "drop" rule for IP 1.2.3.4 in the `fail2ban` chain of your router, which the second should remove. If you don't see that happen, check `fail2ban.log` to see whether it's seeing the router's complaint in the log, and if so, where it's getting hung up trying to apply the ban.
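
To look at the chain from the `fail2ban` host rather than from a router console, something like the following Python sketch would do. It assumes the same key, user, and port as the action file, and the helper names are hypothetical; all it does is run a RouterOS `print where` over SSH.

```python
import subprocess


def build_print_command(router, user="admin", port=22,
                        key="/var/root/.ssh/id_routeros", chain="fail2ban"):
    """Build the ssh argv that lists the rules in the given filter chain."""
    return [
        "/usr/bin/ssh", "-i", key, "-p", str(port), f"{user}@{router}",
        f'/ip/firewall/filter print where chain="{chain}"',
    ]


def list_bans(router, **kwargs):
    """Run the command and return whatever the router printed."""
    result = subprocess.run(build_print_command(router, **kwargs),
                            capture_output=True, text=True, timeout=30)
    return result.stdout
```

Run `list_bans("officeswitch.lan")` as the same user `fail2ban-server` runs as, since that's the account that can read the key file. Doing so between the `banip` and `unbanip` commands should show the temporary rule.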

You may be wondering why we're doing all of this through `sudo` even though that's normally frowned-on in the Homebrew world. The reason is that we want `fail2ban` to run on system startup, not as a user service. The package will let you run the commands without `sudo`, but you'll end up with a `fail2ban-server` process running as your normal user, which won't start until you log into your Mac, and it might not have permission to do what it needs to. Worse, you may end up with two `fail2ban` servers, one running as `root` and the other as your normal user, in conflict with one another.



## <a id="selinux"></a>Integrating with SELinux

Although I said above I'm focusing on the macOS + Homebrew case, I tested the instructions on a CentOS box as well. For the most part, the translation is trivial, but I did run into a serious problem with getting `fail2ban` to run SSH commands, since modern Red Hat type systems put a lot of security restrictions on background services. For entirely sensible reasons, the default configuration of the stock `fail2ban` package won't allow it to launch `/usr/bin/ssh`, and if you allow that, you then run into problems with it opening the SSH key files in the root user's home directory.

The following SELinux module will grant the necessary permissions:

```
module f2b-ros-ssh 1.0;

require {
    type fail2ban_t;
    type ssh_exec_t;
    type ssh_port_t;
    type ssh_home_t;
    type admin_home_t;
    class file { execute execute_no_trans getattr map open read };
    class dir { getattr search };
    class tcp_socket name_connect;
}

allow fail2ban_t admin_home_t:file read;
allow fail2ban_t ssh_exec_t:file { execute execute_no_trans getattr map open read };
allow fail2ban_t ssh_home_t:dir search;
allow fail2ban_t ssh_home_t:dir getattr;
allow fail2ban_t ssh_home_t:file open;
allow fail2ban_t ssh_home_t:file { getattr read };
allow fail2ban_t ssh_port_t:tcp_socket name_connect;
```

Place that into a file called `f2b-ros-ssh.te` somewhere, then run these commands to compile and load it:

``` shell-session
$ checkmodule -Mmo f2b-ros-ssh.mod f2b-ros-ssh.te
$ semodule_package -o f2b-ros-ssh.pp -m f2b-ros-ssh.mod
$ sudo semodule -i f2b-ros-ssh.pp
```

That having been done, you should no longer be getting action execution errors in `/var/log/fail2ban.log`.


## <a id="docker"></a>Musings on Docker

[MikroTik added Docker support to RouterOS in 7.4beta4](https://help.mikrotik.com/docs/display/ROS/Container), which in principle allows you to run `fail2ban` directly on the router, instead of remotely monitoring syslog output. I haven't bothered trying this, because I cannot see that it's practical. For some routers, there's a single fatal limitation that takes them out of the running, and for others, an unhappy concatenation of weak workarounds yields the same end result:

1. The [Docker cross-compilation tool chain](https://docs.docker.com/buildx/working-with-buildx/) doesn't support any MIPS CPU type, ruling out a large fraction of the RouterOS devices from the outset. Docker also doesn't support obsolete platforms like 32-bit PowerPC and the TILE CPU architecture, ruling out even more MikroTik products. For our purposes, only routers with ARM or x86 CPUs are in the running from the start.

2. If you're able to get past problem #1, you then have to build a Docker container (or find one pre-built) that has juuuust the right Python run-time environment to support `fail2ban`. Historically speaking, `fail2ban` is a fairly portable tool, but any given version runs on only a subset of Python versions; roughly, those contemporaneous with the release.

    The practical path out of this trap is to start with an existing Linux distribution that has a version of `fail2ban` ported to it, which brings us to the next problem.

3. Python is not terribly resource efficient. A Linux distro and all of the dependencies needed to run `fail2ban` may exceed the persistent storage space and/or free RAM available on your router. For instance, [this image](https://hub.docker.com/r/crazymax/fail2ban) will take about half the space on the broad class of MikroTik routers with 128 MiB of storage space. This includes all current members of [the CCR2004 line](https://mikrotik.com/products/group/ethernet-routers?filter&s=c&search=ccr2004) and the [RB3011](https://mikrotik.com/product/RB3011UiAS-RM). It even includes otherwise high-end products like the [CCR2116](https://mikrotik.com/product/ccr2116_12g_4splus) and [CCR2216](https://mikrotik.com/product/ccr2216_1g_12xs_2xq). If we step across into MikroTik's SOHO WiFi router range, we find that most of them don't have even 128 MiB of storage space. The only two that do are the [hAP ac³](https://mikrotik.com/product/hap_ac3) and the [Audience](https://mikrotik.com/product/audience).

    With the `fail2ban` container taking up about half your storage space, you might not have enough left over to update the host system's RouterOS without running out of room!

    There are only a few RouterOS models with more than 128 MiB of storage space. In MikroTik's current router product line, only the [RB4011](https://mikrotik.com/product/rb4011igs_rm), [its wireless cousin](https://mikrotik.com/product/rb4011igs_5hacq2hnd_in), and the [RB5009](https://mikrotik.com/product/rb5009ug_s_in) are big enough to comfortably host such a container on the SoC's built-in flash. Add to that the "Dude edition" of the [RB1100](https://mikrotik.com/product/rb1100ahx4), which comes with an m.2 SSD; that's a hefty premium to pay just to run `fail2ban`.

4. You need even more space on-device for the logs and a syslog server, since you still have to redirect log data into the container, else `fail2ban` will have no input to crawl through. That requirement is likely to exclude all of the 128 MiB routers all by itself. For the routers left in the running after that bottle-necking requirement, you're then back to problem #2 at the top of [the `rsyslog` article](/wiki?name=Remote%20Log%20Server): storing logs on flash storage is likely to materially shorten the service life of the device. Only the swappable m.2 SSD in the RB1100 is immune from this.

5. If you can fix all of the above, you're left with the fact that Docker purposefully isolates the container from its host. In order for the `fail2ban` container to reach out and issue "`/ip/firewall/filter`" commands on the RouterOS host, you have to burn even more space in the container by installing an SSH client and configuring it to connect out to the host. It works, but it calls into question the logic behind the initial wish to run `fail2ban` on the router itself: surely you were hoping to avoid all this `rsyslog` and SSH stuff? Sorry, it can't be avoided.

With the lone exception of the RB1100Dx4 and its substantial m.2 SSD, I think all of the above forces you to the x86 version of RouterOS or to CHR. Sites that can justify the hardware expense for that likely have an underutilized Linux box sitting around, or at least a VM hypervisor or Docker host that can much more easily run a small `fail2ban` instance than the router can.

`fail2ban` was written with the assumption of a big server-class CPU, plenty of RAM, and ludicrous amounts of storage to host logs with. Don't fight the design.


## <a id="windows"></a>Can I Run This on Windows?

While I see no reason it would be impossible to run `rsyslog` on Windows, there are [many alternatives](https://duckduckgo.com/?q=syslog+windows) that will work just as well for this. You just need one that logs to a plain text file. (As opposed to the Windows event log, a database, etc.)

Getting `fail2ban` running on Windows is trickier.

Superficially, the easiest method on modern versions of Windows is via [WSL](https://docs.microsoft.com/windows/wsl/), since the most common Linux distro for WSL is Ubuntu, and it has a `fail2ban` package.

The main problem with this is that even with the vast improvement that WSL2 is over WSL1, it's still primarily an interactive user environment. Getting background services to run under WSL remains a massive PITA. [This method](https://superuser.com/a/1506722/14927) looks like the simplest way to solve the problem, but [this one](https://github.com/arkane-systems/genie) might be more robust.

I suspect it's easier to use [Cygwin](https://cygwin.com/) for this purpose instead since it has the `cygrunsrv` facility to run background services via the normal Windows mechanism. The main problem with this option is that there are no ready-made Cygwin packages for `fail2ban`, so you'd have to install it from source. Since `fail2ban` is based on Python, this shouldn't be especially difficult.

Setting up a proper Linux distro under Hyper-V might be easier in the end.

Or, press a Raspberry Pi into service as a remote logging host.
Z aca418a0da6ded64bac3d13ff304a56c