MikroTik Solutions: Update of "Using fail2ban with Remote syslog"

Overview

Artifact ID:	8f8722907351edf02dee7ee6dc4bc02d591d5471ee0b10d7ded009f3329cd6d9
Page Name:	Using fail2ban with Remote syslog
Date:	2022-06-30 10:46:03
Original User:	tangent
Mimetype:	text/x-markdown
Parent:	30122939e4d3820523b31bc5f31bdc9de965ab91dfccff87379ae81cbbf99603 (diff)
Next	b63ae177e446b2a150640b6392d78b728773351948b1c33d55b70c4ecb25e903

Content

Motivation

The popular fail2ban tool monitors log messages for suspicious activity, then issues firewall commands to temporarily ban the offending hosts. After a configurable period of time, it automatically un-bans them.

RouterOS has logs and a suitable command interface to its firewall, but it's impractical to run fail2ban directly on the router. This guide shows what I believe to be a superior method: set up a machine on the network capable of running fail2ban in a way that lets it monitor the logs on the router's behalf and manage its firewall from afar.

In this article, we're going to set fail2ban up to react to failed SSH connections to the router. You can use this mechanism to react to anything RouterOS can produce a sensible log message about, so long as you can produce a regex that will match the necessary elements of the message. For instance, it would be easy to extend the examples below to make it ban IPs attempting to break in via WinBox, WebFig, or your VPN technology of choice.

Preliminaries

I developed and tested this with a macOS server running Homebrew, so to keep things simple I will use its commands and paths. The basic idea will work on most any Linux or BSD box, but the installation commands and configuration file locations will differ. You're expected to be able to map these trivial details to match your local system's quirks. The ideas are the hard part, and they transfer nicely as long as you can hold up your end.

The examples use the TEST-NET-1 IP address scheme:

IP	Usage
192.0.2.1	RouterOS box
192.0.2.99	Server running `rsyslog` and `fail2ban`

Adjust the IPs as necessary.

Set Up

1. Install & Start `rsyslog`

macOS doesn't have a syslog server built in, but it's easy to add one; the rsyslog article walks through it. Follow those instructions to make your router send log messages to the host that will run fail2ban.

The default configuration file for the Homebrew rsyslog package contains:

# minimal config file for receiving logs over UDP port 10514
$ModLoad imudp
$UDPServerRun 10514
*.* /usr/local/var/log/rsyslog-remote.log

That will work perfectly well for us as-is. I show it only because we'll refer back to these configurable values below.
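
Before pointing a router at the server, it can help to confirm that rsyslog is actually listening. The following is a minimal sketch, assuming Python 3 is available on some other machine on the network. The function name `send_test_message` and the logger name are made up for illustration. It uses the standard library's `SysLogHandler` to send a single UDP datagram to the server address and port used in this article.

```python
import logging
import logging.handlers


def send_test_message(host: str, port: int, text: str) -> None:
    """Send one syslog datagram over UDP, roughly as the router would."""
    # SysLogHandler defaults to UDP when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger = logging.getLogger("rsyslog-smoke-test")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so repeated calls don't send duplicates.
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    # Address and port taken from the table and rsyslog config above.
    send_test_message("192.0.2.99", 10514, "test message for the fail2ban host")
```

If the text then shows up in /usr/local/var/log/rsyslog-remote.log, the receiving side is working and any remaining problem is on the router's end.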

2. Set Up the Router

Now that the fail2ban host is receiving log messages from the router(s), you also need to add a firewall filter chain to each monitored router for use in step 3 below:

> /ip/firewall/filter
> add action=jump chain=input jump-target=fail2ban
> add action=return chain=fail2ban

We put this in the "input" chain because we're reacting to SSH connections to the router in this example.

Move the "jump" action to the position in the firewall's input chain where you want fail2ban ban actions to occur. The correct position depends on your local firewall configuration, which I can't predict. The purpose of this chain is to give fail2ban a named position for the ban actions to be inserted. Think of it like a bookmark: wherever you put it, that's where the automated ban/unban actions will occur. Without this, we'd have to hard-code some brittle rule in the ban action like "place-before=42".

3. Install, Configure & Start `fail2ban`

Say:

% brew install fail2ban

Before we can start it, we have to tell fail2ban what files to look at and what to do when it finds a suspicious log line.

The following paths are relative to your fail2ban configuration directory, which is /usr/local/etc/fail2ban under Homebrew, but may be elsewhere on other OSes, such as /etc/fail2ban.

The Filter

Put the following into filter.d/routeros-rsyslog-sshd.conf:

[Definition]

_router     = (<F-ROUTER>[a-zA-Z0-9.-]+</F-ROUTER>)
failregex   = ^\s?<_router> .* login failure for user .* from <HOST> via ssh

This matches rsyslog messages such as:

2022-04-12T02:51:22.573113-06:00 officeswitch.lan system,error,critical login failure for user admin from 192.0.2.88 via ssh

fail2ban automatically removes the timestamp at the beginning, but it doesn't remove the following space, which is why our failregex allows for a single optional whitespace character at the beginning of the pattern. Immediately following that, we match either an IPv4 address or an ASCII host name and store it in the F-ROUTER variable for later use. This part of the regex won't match IPv6 addresses, domain names with odd characters, etc.

The regex then skips over the message's tag list and finds the actual failure message, including the host that caused it.

It's possible to extend the failregex to multiple possible matches, but that's a topic for the fail2ban documentation.
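
To see what the pattern captures, here is a rough Python sketch that applies an equivalent regex to the sample line. It rests on simplifying assumptions: fail2ban's `<F-ROUTER>` tag is modeled as a named group called `router`, `<HOST>` is replaced by a plain run of non-space characters rather than fail2ban's real host pattern, and the timestamp removal is imitated by hand. The function names are hypothetical.

```python
import re

# Simplified stand-ins for fail2ban's tag expansion (assumptions):
#   <F-ROUTER>...</F-ROUTER>  ->  named group "router"
#   <HOST>                    ->  named group "host", any run of non-space chars
FAILREGEX = re.compile(
    r"^\s?(?P<router>[a-zA-Z0-9.-]+) .* login failure for user .* "
    r"from (?P<host>\S+) via ssh"
)

SAMPLE = (
    "2022-04-12T02:51:22.573113-06:00 officeswitch.lan "
    "system,error,critical login failure for user admin "
    "from 192.0.2.88 via ssh"
)


def strip_timestamp(line):
    """Imitate fail2ban removing the timestamp but leaving the space after it."""
    return re.sub(r"^\S+", "", line, count=1)


def parse(line):
    """Return (router, offending host) for a failure line, else None."""
    m = FAILREGEX.search(strip_timestamp(line))
    return (m.group("router"), m.group("host")) if m else None


print(parse(SAMPLE))  # ('officeswitch.lan', '192.0.2.88')
```

This is only a way to reason about the pattern. To validate the real filter, test it with fail2ban's own tools against your actual log file.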

The Action

Now we have to teach fail2ban how to apply the firewall rule changes to RouterOS. Put the following into actions.d/routeros.conf:

[Init]

# SSH credentials to use to log into the router
user   = admin
port   = 22
pubkey = /var/root/.ssh/id_routeros

# SSH connection command
ssh = /usr/bin/ssh
cmd = <ssh> -i <pubkey> -p <port> <user>@<F-ROUTER>

# What to do on ban.
action = drop
chain  = fail2ban

# Command-shortening aliases
iff     = /ip/firewall/filter
what    = src-address="<ip>" chain="<chain>"
addwhat = <what> dst-port="<port>" proto="tcp" action="<action>"


[Definition]

actionban   = <cmd> '<iff> add <addwhat> place-before=0'
actionunban = <cmd> '<iff> remove numbers=[find <what>]'

Notes:

- Change the "user" value if your router's full-capability admin user is called something other than "admin".
- Change the "port" value if you've got the RouterOS SSH service listening on a nonstandard port. I recommend doing so on externally-facing routers since it reduces script kiddie and bot noise in your logs considerably. Note that this value is used both for logging into the router to send the RouterOS ban/unban commands and also in the firewall filter rules themselves.
- The proper name and location of the key file to use for the login depends on how fail2ban-server runs. Despite the "pubkey" variable name, the file passed to ssh -i is the private half of the key pair; the public half is what you import on the router. The scheme in the example is typical of the Homebrew macOS build, but on a Linux box, you'll likely need something like /root/.ssh/id_routeros instead.

  Remember that the OpenSSH client is very picky about which key files it will use: if you arrange for fail2ban to run as user bob instead of as root, it will refuse to use anything other than Bob's SSH keys. A nice side benefit of this necessity is that it lets you generate an SSH key specifically for your RouterOS boxes, rather than reuse your OS's default SSH key.
- You may wish to change the "action" to "reject" or "tarpit". See the RouterOS filter docs for details.

The "chain" value matches the filter chain we set up above. If you change one, change the other to match.

You shouldn't have to change anything else.
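
Because the action file leans heavily on nested aliases, it can be hard to see what command actually gets run. The sketch below is a toy model of the tag substitution, not fail2ban's real implementation: it repeatedly replaces each `<tag>` with its value until nothing changes. The `expand` function is hypothetical, and the IP and router name are the sample values from the log line in the filter section.

```python
import re

# Values copied from the [Init] and [Definition] sections above.
OPTIONS = {
    "user": "admin",
    "port": "22",
    "pubkey": "/var/root/.ssh/id_routeros",
    "ssh": "/usr/bin/ssh",
    "cmd": "<ssh> -i <pubkey> -p <port> <user>@<F-ROUTER>",
    "action": "drop",
    "chain": "fail2ban",
    "iff": "/ip/firewall/filter",
    "what": 'src-address="<ip>" chain="<chain>"',
    "addwhat": '<what> dst-port="<port>" proto="tcp" action="<action>"',
    "actionban": "<cmd> '<iff> add <addwhat> place-before=0'",
    "actionunban": "<cmd> '<iff> remove numbers=[find <what>]'",
}

TAG = re.compile(r"<([A-Za-z0-9_-]+)>")


def expand(template, values):
    """Replace <tag> with its value, repeating until the text stops changing."""
    previous = None
    while previous != template:
        previous = template
        # Unknown tags are left untouched so the loop always terminates.
        template = TAG.sub(lambda m: values.get(m.group(1), m.group(0)), template)
    return template


# Per-ban values: the offending IP and the router name captured by the filter.
ticket = {**OPTIONS, "ip": "192.0.2.88", "F-ROUTER": "officeswitch.lan"}
print(expand(OPTIONS["actionban"], ticket))
print(expand(OPTIONS["actionunban"], ticket))
```

The first line printed is the full SSH invocation that adds the drop rule, and the second is the one that removes it. Pasting either into a root shell is a quick way to debug the action independently of fail2ban.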

The Jail

Now we can tie the two elements above together. Put the following into the jail.local file at the top of the fail2ban configuration directory. It may or may not already exist. You know you're in the right location if there's a jail.conf file, which you do not modify.

[routeros-rsyslog-sshd]

action     = routeros
backend    = polling
enabled    = true
ignoreself = true
logpath    = /usr/local/var/log/rsyslog-remote.log

Change the logpath variable if your syslog server puts the log information somewhere else.

The ignoreself setting is critical. It causes fail2ban to ignore failures originating from the host running fail2ban-server. Without this, fail2ban could never unban its host IP, since that requires SSH access. Worse, if you'd disabled all other management interfaces on the router (e.g. WinBox) you could end up locking yourself out of the router entirely.

You may have configured your RouterOS firewall to accept all packets from the fail2ban host ahead of this chain to prevent this very sort of lockout problem, but if so, the ban is useless, so you might as well set ignoreself in that case as well. There's no value slowing the router down by giving it useless firewall filters.
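
The effect of ignoreself is easier to see in a toy model of a jail's ban decision. This sketch is not how fail2ban is implemented. The class and method names are invented, and while maxretry and findtime are real fail2ban jail options, the values used here are chosen only for illustration and the jail above simply relies on its defaults.

```python
from collections import defaultdict, deque


class JailSketch:
    """Toy model of a jail's ban decision; not fail2ban's real code."""

    def __init__(self, own_ips, maxretry=3, findtime=600, ignoreself=True):
        self.own_ips = set(own_ips)   # addresses of the fail2ban host itself
        self.maxretry = maxretry      # failures needed to trigger a ban
        self.findtime = findtime      # sliding window, in seconds
        self.ignoreself = ignoreself
        self.failures = defaultdict(deque)

    def record_failure(self, ip, now):
        """Record one failure at time `now`; return True if `ip` should be banned."""
        if self.ignoreself and ip in self.own_ips:
            return False              # never ban the host that must issue the un-bans
        window = self.failures[ip]
        window.append(now)
        while window and now - window[0] > self.findtime:
            window.popleft()          # forget failures older than findtime
        return len(window) >= self.maxretry


jail = JailSketch(own_ips=["192.0.2.99"])
print([jail.record_failure("192.0.2.88", t) for t in (0, 10, 20)])  # [False, False, True]
print([jail.record_failure("192.0.2.99", t) for t in (0, 10, 20)])  # [False, False, False]
```

With ignoreself switched off in this model, the second line would end in True, which corresponds to the self-lockout scenario described above.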

Final Steps

With all of this in place, you can start fail2ban and check that it's running:

% sudo brew services start fail2ban
% sudo fail2ban-client status

The ban/unban actions likely won't work to start with. Although you may have been managing your routers over SSH before now, you probably haven't been doing so as the local root user, as fail2ban does. You will likely have to issue a command like the following against each router you're monitoring:

% sudo ssh -i /var/root/.ssh/id_routeros admin@officeswitch.lan

Until you do that and say "yes" to the resulting question, the router won't be in the local root user's known-hosts file, so fail2ban won't be able to automatically log in and send RouterOS commands.

Do this step even if you don't think it's necessary: it tests that the SSH login from the local root account works as you think it should.
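
Once the host key has been accepted, a scripted check can confirm that the login works without any prompting, which is the condition fail2ban needs. This is a sketch under several assumptions: the function names are hypothetical, the key path and user match the action file above, and `/system/identity/print` is used merely as a harmless RouterOS command. OpenSSH's `BatchMode=yes` option makes ssh fail rather than ask a question.

```python
import subprocess


def ssh_check_command(router, user="admin", port=22,
                      key="/var/root/.ssh/id_routeros"):
    """Build a non-interactive SSH command that fails instead of prompting."""
    return [
        "/usr/bin/ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if the host key is unknown
        "-i", key,
        "-p", str(port),
        f"{user}@{router}",
        "/system/identity/print",   # harmless command to prove the login works
    ]


def can_log_in(router, **kwargs):
    """Return True if ssh exits cleanly; run as the user fail2ban runs as (root)."""
    result = subprocess.run(ssh_check_command(router, **kwargs),
                            capture_output=True, text=True, timeout=30)
    return result.returncode == 0


# Show the command without running it.
print(" ".join(ssh_check_command("officeswitch.lan")))
```

Calling `can_log_in("officeswitch.lan")` under sudo for each monitored router should return True. If it returns False, that usually points at a missing known-hosts entry or a key the router doesn't accept.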

That accomplished, you can test that the ban/unban actions actually send the proper RouterOS commands:

% sudo fail2ban-client set routeros-rsyslog-sshd banip 1.2.3.4
% sudo fail2ban-client set routeros-rsyslog-sshd unbanip 1.2.3.4

After the first command, you should find a "drop" rule for IP 1.2.3.4 in the fail2ban chain of your router, which the second should remove. If you don't see that happen, check fail2ban.log to see whether it's seeing the router's complaint in the log, and if so, where it's getting hung up trying to apply the ban.

You may be wondering why we're doing all of this through sudo even though that's normally frowned-on in the Homebrew world. The reason is that we want fail2ban to run on system startup, not as a user service. The package will let you run the commands without sudo, but you'll end up with a fail2ban-server process running as your normal user, which won't start until you log into your Mac, and it might not have permission to do what it needs to. Worse, you may end up with two fail2ban servers, one running as root and the other as your normal user, in conflict with one another.

Integrating with SELinux

Although I said above I'm focusing on the macOS + Homebrew case, I tested the instructions on a CentOS box as well. For the most part, the translation is trivial, but I did run into a serious problem with getting fail2ban to run SSH commands, since modern Red Hat type systems put a lot of security restrictions on background services. For entirely sensible reasons, the default configuration of the stock fail2ban package won't allow it to launch /usr/bin/ssh, and if you allow that, you then run into problems with it opening the SSH key files in the root user's home directory.

The following SELinux module will grant the necessary permissions:

module f2b-ros-ssh 1.0;

require {
    type fail2ban_t;
    type ssh_exec_t;
    type ssh_port_t;
    type ssh_home_t;
    type admin_home_t;
    class file { execute execute_no_trans getattr map open read };
    class dir { getattr search };
    class tcp_socket name_connect;
}

allow fail2ban_t admin_home_t:file read;
allow fail2ban_t ssh_exec_t:file { execute execute_no_trans getattr map open read };
allow fail2ban_t ssh_home_t:dir search;
allow fail2ban_t ssh_home_t:dir getattr;
allow fail2ban_t ssh_home_t:file open;
allow fail2ban_t ssh_home_t:file { getattr read };
allow fail2ban_t ssh_port_t:tcp_socket name_connect;

Place that into a file called f2b-ros-ssh.te somewhere, then run these commands to compile and load it:

$ checkmodule -Mmo f2b-ros-ssh.mod f2b-ros-ssh.te
$ semodule_package -o f2b-ros-ssh.pp -m f2b-ros-ssh.mod
$ sudo semodule -i f2b-ros-ssh.pp

That having been done, you should no longer be getting action execution errors in /var/log/fail2ban.log.

Musings on Docker

MikroTik added Docker support to RouterOS in 7.4beta4, which in principle allows you to run fail2ban directly on the router, instead of remotely monitoring syslog output. I haven't bothered trying this, because I cannot see that it's practical. For some routers, there's a single fatal limitation that takes them out of the running, and for others, an unhappy concatenation of weak workarounds yields the same end result:

1. The Docker cross-compilation tool chain doesn't support any MIPS CPU type, ruling out a large fraction of the RouterOS devices from the outset. Docker also doesn't support obsolete platforms like 32-bit PowerPC and the TILE CPU architecture, ruling out even more MikroTik products. For our purposes, only routers with ARM or x86 CPUs are in the running from the start.

2. If you're able to get past problem #1, you then have to build a Docker container (or find one pre-built) that has juuuust the right Python run-time environment to support fail2ban. Historically speaking, fail2ban is a fairly portable tool, but any given version runs on only a subset of Python versions; roughly, those contemporaneous with the release.

   The practical path out of this trap is to start with an existing Linux distribution that has a version of fail2ban ported to it, which brings us to the next problem.

3. Python is not terribly resource efficient. A Linux distro and all of the dependencies needed to run fail2ban may exceed the persistent storage space and/or free RAM available on your router. For instance, this image will take about half the space on the broad class of MikroTik routers with 128 MiB of storage space. This includes all current members of the CCR2004 line and the RB3011. It even includes otherwise high-end products like the CCR2116 and CCR2216. If we step across into MikroTik's SOHO WiFi router range, we find that most of them don't have even 128 MiB of storage space. The only two that do are the hAP ac³ and the Audience.

   With fail2ban taking up over half your storage space, you might not have enough left over to update the host system's RouterOS without running out of storage space!

   There are only a few RouterOS models with more than 128 MiB of storage space. In MikroTik's current router product line, only the RB4011, its wireless cousin, and the RB5009 are big enough to comfortably host such a container on the SoC's built-in flash. Add to that the "Dude edition" of the RB1100, which comes with an m.2 SSD; that's a pretty steep premium to pay just to run fail2ban.

4. You need even more space on-device for the logs and a syslog server, since you still have to redirect log data into the container, else fail2ban will have no input to crawl through. That requirement is likely to exclude all of the 128 MiB routers all by itself. For the routers left in the running after that bottle-necking requirement, you're then back to problem #2 at the top of the rsyslog article: storing logs on flash storage is likely to materially shorten the service life of the device. Only the swappable m.2 SSD in the RB1100 is immune from this.

5. If you can fix all of the above, you're left with the fact that Docker purposefully isolates the container from its host. In order for the fail2ban container to reach out and issue "/ip/firewall/filter" commands on the RouterOS host, you have to burn even more space in the container by installing an SSH client and configuring it to connect out to the host. It works, but it calls into question the logic behind the initial wish to run fail2ban on the router itself: surely you were hoping to avoid all this rsyslog and SSH stuff? Sorry, it can't be avoided.

With the lone exception of the RB1100Dx4 and its substantial m.2 SSD, I think all of the above forces you to the x86 version of RouterOS or to CHR. Sites that can justify the hardware expense for that likely have an underutilized Linux box sitting around, or at least a VM hypervisor or Docker host that can much more easily host a small fail2ban instance than the router can.

fail2ban was written with the assumption of a big server-class CPU, plenty of RAM, and ludicrous amounts of storage to host logs with. Don't fight the design.

Can I Run This on Windows?

While I see no reason it would be impossible to run rsyslog on Windows, there are many alternatives that will work just as well for this. You just need one that logs to a plain text file. (As opposed to the Windows event log, a database, etc.)

Getting fail2ban running on Windows is trickier.

Superficially, the easiest method on modern versions of Windows is via WSL, since the most common Linux distro for WSL is Ubuntu, and it has a fail2ban package.

The main problem with this is that even with the vast improvement that WSL2 is over WSL1, it's still primarily an interactive user environment. Getting background services to run under WSL remains a massive PITA. This method looks like the simplest way to solve the problem, but this one might be more robust.

I suspect it's easier to use Cygwin for this purpose instead since it has the cygrunsrv facility to run background services via the normal Windows mechanism. The main problem with this option is that there are no ready-made Cygwin packages for fail2ban, so you'd have to install it from source. Since fail2ban is based on Python, this shouldn't be especially difficult.

Setting up a proper Linux distro under Hyper-V might be easier in the end.

Or, press a Raspberry Pi into service as a remote logging host.
