Podman Solutions: Update of “User-Based Privilege Separation”


Overview

Artifact ID:	db9379b7ed84d30bdd01199abf288afc1a3c55a8c887d04c1d25f568b58176b1
Page Name:	User-Based Privilege Separation
Date:	2025-07-01 04:10:16
Original User:	tangent
Mimetype:	text/x-markdown
Parent:	d80ccb57e2f6ab41ad1361161d2d00afc4a9f13277b89ede50994b46949a028e (diff)
Next	142373e726369e6008675116bbf2c53ee28abdb78df04462257afa0dddc2c96a

Content

Motivation

There is an ancient practice in the Unix world where each service gets its own “user.” The practice is so old that when it was a new idea, these “system users” got intermixed with the real human kind, and you’d end up with different system user IDs on each box, depending on the order the users were created. Eventually, OSes began reserving some number of the low-numbered IDs¹ for themselves, starting the real human users past that limit.

Podman obsoletes all of this.

Do not combine the two.

Why? Read on.

User Namespaces

“Linux container” is a wrapper term for a bunch of disconnected technologies which tools like Podman combine into a useful whole. It is useful to think of this assortment of underlying features as if there had been a concerted effort to add containerization to the Linux kernel, but the fact is that the pieces were added separately over a span of years, in some cases for purposes quite separate from what we now think of as containerization.

I bring this up because it can be important to understand the elements, as in this specific case, where Linux’s namespaces feature functionally obsoletes the old “system users” practice. In brief, user namespaces — userns for short — provide the same benefit: isolating privilege based on user ID.

Default Behavior

Consider this:

$ id=$(podman run --rm -d alpine sleep 60)
$ podman top $id user huser
USER        HUSER
root        501

The first command merely starts a dummy container for us to examine, which will disappear a minute after we started it.

The second then tells Podman to report the user IDs involved, which shows this rootless container running under my host-side user ID² even as it appears to be running as root inside.
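The USER-to-HUSER translation above comes from the kernel's per-process UID map, exposed as /proc/<pid>/uid_map. As a rough sketch of how that table is read, the helper below applies a map by hand. The names map_uid and show_live_map are made up for this illustration and are not part of Podman, and the second map line in the comments is only an assumed example of a subordinate range.

```shell
# Translate a container-side UID to its host-side UID using uid_map lines
# read from stdin. Each line of /proc/<pid>/uid_map has three fields:
#   <first UID inside the namespace> <first UID outside> <length of range>
# map_uid is a hypothetical helper, not a Podman command.
map_uid() {
    uid=$1
    awk -v uid="$uid" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }   # no range covers this UID
    '
}

# Container root mapped onto host UID 501, matching the podman top output.
printf '0 501 1\n' | map_uid 0

# Defined but not called here: on a host with Podman installed, this prints
# the live map as seen from inside a throwaway container.
show_live_map() {
    podman run --rm alpine cat /proc/self/uid_map
}
```

A UID that falls outside every range in the map has no host-side identity at all, which is why map_uid reports failure rather than guessing.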

Already we have most of the protection afforded by the ancient “system users” concept. The assortment of technologies brought to bear by Podman under the label “containerization” ensures that this sleep 60 container cannot…

…access the host-side user’s home directory

To allow that, we would have had to pass something like --volume $HOME:/home/host --workdir /home/host, as tools like Distrobox go out of their way to do, on purpose.

…send signals to host-side processes

Unless you tell it otherwise, Podman puts each container into a separate pidns, which you can see with:

$ podman run --rm -it alpine ps -eaf
PID   USER     TIME  COMMAND
    1 root      0:00 ps -eaf

We’re running as a fake “root” user in this instance, and we gave ps the “show me everything” flags, yet the only process we see is the one for ps itself. Also note that it appears to be PID 1, whereas the real PID 1 on the container runner is the /usr/lib/systemd/systemd instance owned by the CoreOS-based podman machine it runs under.³

…communicate with background processes on the host

This one isn’t hard-and-fast. Rootless Podman’s default configuration blocks some of the common IPC methods:

old-school System V IPC is blocked by running each container in a separate ipcns by default
Unix domain sockets appear in the filesystem, so the prior point applies: sockets not mapped through with --volume are invisible to the container
localhost sockets are inaccessible by virtue of the default --network=pasta on rootless containers; you must give --network=host to override that

But not all! There are two other major ways Linux background processes may allow IPC:

listening on 0.0.0.0/::0 opens access to containers via the host’s public IP, which a container may discover via the host.containers.internal entry that Podman puts in /etc/hosts⁴
abstract sockets bypass the filesystem namespace but not the network namespace, so they may be visible or not, depending on how you set up your container⁵

If you are looking at the above list and thinking “Aha, Podman isn’t so great after all!” please do realize that the ancient “system users” concept doesn’t block these IPC channels, either.

Automatic Unique User Namespace

Everything above applies to the straight-line default --userns=host case.⁶ When our goal is to gain isolation akin to the ancient system users concept, the flag’s other possible values are useful.

Most directly on-point for the purposes of this article is --userns=auto. This is a Podman-specific extension⁷ which makes use of preexisting features in Linux, subordinate UIDs and GIDs. Essentially, this manufactures a per-container user on the fly, one which has no connection to the host-side user.

Boom! 💥 System users are fully obsolete now.
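To make the subordinate-ID part concrete: allocations are listed in /etc/subuid, one user:start:count entry per line, and --userns=auto carves each container's range out of such an allocation. The sketch below only does the arithmetic on one entry. The helper names and the sample entry are assumptions for illustration, not output from a real system.

```shell
# subuid_range is a hypothetical helper: given one /etc/subuid entry of the
# form <user>:<first subordinate UID>:<count>, print the inclusive range.
subuid_range() {
    printf '%s\n' "$1" | awk -F: '{ printf "%s: %d-%d\n", $1, $2, $2 + $3 - 1 }'
}

# Sample entry, assumed for this sketch.
subuid_range 'core:100000:65536'

# Defined but not called here: on a host with Podman and a subordinate ID
# allocation, this shows the map chosen for one --userns=auto container.
show_auto_map() {
    podman run --rm --userns=auto alpine cat /proc/self/uid_map
}
```

Running show_auto_map twice should normally show a map drawn from the subordinate range rather than from your own UID, though the exact numbers depend on the host's allocation.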

Life in a World Without System Users

If you happen to be a macOS or Windows user, Podman sets up a background “machine” for you, a hidden VM running their customized version of Fedora CoreOS.⁸ On such a host, try this:

$ podman machine ssh
core@localhost:~$ wc -l /etc/passwd
       3 /etc/passwd
core@localhost:~$ exit
$ grep -v '^#' /etc/passwd | wc -l
     130

CoreOS has only 3 users defined: root, unbound, and core, and the only reason there are even three is that unbound is a classic “system user,” isolating privilege within the podman machine by the old ways, doubtless to avoid needing to set up multiple containers with access gated using more modern mechanisms.

My macOS host is designed on more…let us be charitable and say “classic” lines. Despite being a single-user box, it has 130 users defined! All but a few are system users, owing to the fact that macOS has a development history traceable back to BSD Unix in the early 1980s. Thankfully, it is missing classic system users like sendmail, and most of the system users it does define are named with a leading underscore to distinguish them, yet one cannot help but draw a valuable distinction here.

Podman CoreOS doesn’t need these throwbacks. It has namespaces and all the rest of the elements that make up Linux containerization.

Rootless by Default

One huge reason for the longstanding popularity of the system users concept is that classic Unix (and then Linux) servers started all daemons as root as part of the boot process, even if the service had no need for root privileges. When there was good cause to start as root — as with Apache binding to port 80 — a well-designed daemon would drop root privilege as soon as it could.

And what would it drop to? The system user you configured it to use, of course.

While all of that can still be done in the modern Podman world, the pressures that made it the primary path no longer exist.

First, systemd came along with the concept of user services, allowing service startup to be delayed until the system reached multi-user stage, or even until after the user logged in. Because such services run under a regular user account, the damage they can do is inherently limited. Then Podman came along and added all of what we’re discussing in this article.

Podman’s Quadlet feature lets us combine both capabilities: start a service in the background on boot as a normal user, but under an isolated userns. This not only provides every bit of the security the ancient system user concept was meant to provide, it provides more.
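A minimal sketch of such a Quadlet unit follows, written out by a small shell helper so it can be tried in a scratch directory first. The unit name, image, and command are placeholders, and write_quadlet and QUADLET_DIR are inventions of this example. The UserNS=auto line is the piece that asks for the isolated userns.

```shell
# write_quadlet is a hypothetical helper that writes a minimal Quadlet unit
# into the directory given as $1.
write_quadlet() {
    mkdir -p "$1" || return 1
    cat > "$1/sleeper.container" <<'EOF'
[Unit]
Description=Demo service in its own user namespace

[Container]
Image=docker.io/library/alpine
Exec=sleep infinity
UserNS=auto

[Install]
WantedBy=default.target
EOF
}

# Rootless Quadlet units are read from ~/.config/containers/systemd.
# QUADLET_DIR is an override made up for this sketch.
write_quadlet "${QUADLET_DIR:-$HOME/.config/containers/systemd}"
```

After that, systemctl --user daemon-reload followed by systemctl --user start sleeper.service should bring the container up. To have it start at boot without anyone logging in, the account typically also needs loginctl enable-linger.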

Secure by Default, Porous by Configuration

Linux namespaces are not an all-or-nothing proposition.

One major aspect of this is that there are multiple namespaces, allowing you to erect certain barriers while leaving others down. Podman makes good use of this itself in its eponymous “pod” feature. By default, containers in a pod share a network namespace while having different pidns and userns, allowing them to communicate via TCP and UDP but not interfere with each other otherwise.
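The shared network namespace is easy to see for yourself. The sketch below assumes a host with Podman and registry access. The pod and container names, and the choice of nginx as a stand-in server, are placeholders for this example.

```shell
# pod_demo is a hypothetical wrapper, defined here but not called.
pod_demo() {
    podman pod create --name demo-pod &&
    podman run -d --pod demo-pod --name demo-web docker.io/library/nginx &&
    # Give the server a moment to start listening.
    sleep 2 &&
    # A second container in the same pod reaches the first over localhost,
    # since both sit in the pod's shared network namespace.
    podman run --rm --pod demo-pod docker.io/library/alpine \
        wget -qO- http://localhost:80/
    # Clean up the pod and everything in it, whether or not the fetch worked.
    podman pod rm -f demo-pod
}
```

Call pod_demo on a machine with Podman installed. The same wget from a container outside the pod would not find anything on its own localhost.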

Another aspect is that each namespace is configurable through Podman, giving you a measure of control over the “dimensions” of each barrier. This is not the place to get into details; suffice it to say that your choice is generally not between having the barrier or not. Search the docs for “namespace” to get an idea of the level of control Podman gives you over this aspect of its internal operation.

License

¹ Commonly either 500 or 999, the stock behavior for macOS and mainstream Linux distros, respectively.
² UID 501 is the first regular user ID under macOS, showing the split between system users and human users discussed at the top of this article.
³ And the container certainly cannot see PID 1 on the true host in my case, macOS’s /sbin/launchd.
⁴ …which might not exist, as with the podman machine case.
⁵ This is a particular worry with containers since old versions of containerd used an abstract socket, as does DBus to this day. This can allow powerful effects which are off-topic for this article, so let me simply say that you should avoid use of --network=host if a primary goal of your use of containerization is improved security.
⁶ Full details of the default are more complicated.
⁷ Docker has a vaguely similar feature in the --userns-remap flag on its background container engine. The primary negative consequence of this design is that it affects all containers on that system. Podman's daemonless nature allows every container to arrange UID remapping separately, according to that container's needs.
⁸ One may say podman machine init on a Linux host as well, if one would like to follow along without moving over to a macOS or Windows box.