D 2025-07-01T04:10:16.701
L User-Based\sPrivilege\sSeparation
N text/x-markdown
P d80ccb57e2f6ab41ad1361161d2d00afc4a9f13277b89ede50994b46949a028e
U tangent
W 11441
## Motivation
There is an ancient practice in the Unix world where each service gets its own “user.” The practice is so old that when it was a new idea, these “system users” got intermixed with the real human kind, and you’d end up with different system user IDs on each box, depending on the order the users were created. Eventually, OSes began reserving some number of the low-numbered IDs(^Commonly either 500 or 999, the stock behavior for macOS and mainstream Linux distros.) for themselves, starting the real human users past that limit.
Podman obsoletes all of this.
Do not combine the two.
Why? Read on.
## User Namespaces
“Linux container” is a wrapper term for a bunch of disconnected technologies which tools like Podman combine into a useful whole. It is useful to think of this assortment of underlying features as if there had been a concerted effort to add containerization to the Linux kernel, but the fact is that the pieces were added separately over a span of years, in some cases for purposes quite separate from what we now think of as containerization.
I bring this up because it can be important to understand the elements, as in this specific case, where Linux’s [namespaces] feature functionally obsoletes the old “system users” practice. In brief, user namespaces — [userns] for short — provide the same benefit: isolating privilege based on user ID.
[namespaces]: https://www.man7.org/linux/man-pages/man7/namespaces.7.html
[userns]: https://man7.org/linux/man-pages/man7/user_namespaces.7.html
## Default Behavior
Consider this:
``` shell
$ id=$(podman run --rm -d alpine sleep 60)
$ podman top $id user huser
USER HUSER
root 501
```
The first command merely starts a dummy container for us to examine, which will disappear a minute after we started it.
The second then tells Podman to report the user IDs involved, which shows this rootless container running under my host-side user ID(^UID 501 is the first regular user ID under macOS, showing the split between system users and human users discussed at the top of this article.) even as it appears to be running as `root` inside.
Already we have most of the protection afforded by the ancient “system users” concept. The assortment of technologies brought to bear by Podman under the label “containerization” ensures that this `sleep 60` container cannot…
### …access the host-side user’s home directory
To allow that, we would have had to pass something like `--volume $HOME:/home/host --workdir /home/host`, as tools like [Distrobox] go out of their way to do, on purpose.
[Distrobox]: https://distrobox.it/
### …send signals to host-side processes
Unless you tell it otherwise, Podman puts each container into a separate [pidns], which you can see with:
``` shell
$ podman run --rm -it alpine ps -eaf
PID USER TIME COMMAND
1 root 0:00 ps -eaf
```
We’re running as a fake “root” user in this instance, and we gave `ps` the “show me everything” flags, yet the only process we see is the one for `ps` itself. Also note that it appears to be PID 1, whereas the real PID 1 on my container runner is the `/usr/lib/systemd/systemd` instance owned by the CoreOS-based `podman machine` it runs under.(^And the container _certainly_ cannot see PID 1 on the true host in my case, macOS’s `/sbin/launchd`.)
[pidns]: https://www.man7.org/linux/man-pages/man7/pid_namespaces.7.html
### …communicate with background processes on the host
This one isn’t hard-and-fast. Rootless Podman’s default configuration blocks *some* of the common IPC methods:
* **old-school System V IPC** is blocked by running each container in a separate [ipcns] by default
* **Unix domain sockets** appear in the filesystem, so the filesystem isolation described above applies: sockets not mapped through with `--volume` are invisible to the container
* **localhost sockets** are inaccessible by virtue of the default `--network=pasta` on rootless containers; you must give `--network=host` to override that
But not all! There are two other major ways Linux background processes may allow IPC:
* **listening on 0.0.0.0/::0** opens access to containers via the host’s public IP, which a container may discover via the `host.containers.internal` entry that Podman puts in `/etc/hosts`(^…which might not exist, as with the `podman machine` case.)
* **[abstract sockets][abssock]** bypass the filesystem namespace but not the network namespace, so they may be visible or not, depending on how you set up your container(^This is a particular worry with containers since old versions of `containerd` used an abstract socket, as does DBus to this day. This can allow powerful effects which are off-topic for this article, so let me simply say that you should avoid use of `--network=host` if a primary goal of your use of containerization is improved security.)
If you are looking at the above list and thinking “Aha, Podman isn’t so great after all!” please do realize that the ancient “system users” concept doesn’t block these IPC channels, either.
[abssock]: https://www.man7.org/linux/man-pages/man7/unix.7.html
[ipcns]: https://www.man7.org/linux/man-pages/man7/ipc_namespaces.7.html
## Automatic Unique User Namespace
Everything above applies to the straight-line default `--userns=host` case.(^Full details of the default are [more complicated](https://docs.podman.io/en/latest/markdown/podman-run.1.html#userns-mode).) When our goal is to gain isolation akin to the ancient system users concept, the flag’s other possible values are useful.
Most directly on-point for the purposes of this article is `--userns=auto`. This is a Podman-specific extension(^Docker has a [vaguely similar feature][dsub], being the `--userns-remap` flag on the background container engine. The primary negative consequence of this design is that it affects all containers on that system. Podman's daemonless nature allows every container to arrange UID remapping separately, per each container's needs.) which makes use of preexisting features in Linux, [subordinate UIDs][subuid] and [GIDs][subgid]. Essentially, this manufactures a per-container user on the fly, one which has no connection to the host-side user.
Boom! 💥 System users are fully obsolete now.
[dsub]: https://docs.docker.com/engine/security/userns-remap/
[subgid]: https://www.man7.org/linux/man-pages/man5/subgid.5.html
[subuid]: https://www.man7.org/linux/man-pages/man5/subuid.5.html
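To see what `--userns=auto` hands out, it can help to look at the kernel’s own view of the mapping in `/proc/self/uid_map`, described in the [userns] man page. What follows is a minimal sketch, not anything Podman ships: `describe_uid_map` is a hypothetical helper name of my own, and it assumes the three-column “inside, outside, count” format that man page documents.

``` shell
# Hypothetical helper: turn uid_map lines ("inside outside count")
# read from stdin into a human-readable summary of each mapped range.
describe_uid_map() {
    while read -r inside outside count; do
        # Skip blank lines rather than doing arithmetic on nothing.
        [ -n "$inside" ] || continue
        last_in=$((inside + count - 1))
        last_out=$((outside + count - 1))
        echo "container UIDs $inside-$last_in -> parent UIDs $outside-$last_out"
    done
}
```

Assuming Podman is installed and your account has a subordinate ID range large enough for `auto` to carve from, `podman run --rm --userns=auto alpine cat /proc/self/uid_map | describe_uid_map` should report a block of IDs that no other running container shares. Bear in mind that the “parent” numbers are relative to the parent namespace, which for rootless Podman is its own intermediate namespace rather than the true host.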
## Life in a World Without System Users
If you happen to be a macOS or Windows user, Podman sets up a background “machine” for you, a hidden VM running their customized version of Fedora CoreOS.(^One may say `podman machine init` on a Linux host as well, if one would like to follow along without moving over to a macOS or Windows box.) On such a host, try this:
``` shell
$ podman machine ssh
core@localhost:~$ wc -l /etc/passwd
3 /etc/passwd
core@localhost:~$ exit
$ grep -v '^#' /etc/passwd | wc -l
     130
```
CoreOS has only 3 users defined: `root`, `unbound`, and `core`, and the only reason there are even three is that `unbound` is a classic “system user,” isolating privilege within the `podman machine` by the old ways, doubtless to avoid needing to set up multiple containers with access gated using more modern mechanisms.
My macOS host is designed on more…let us be charitable and say “classic” lines. Despite being a single-user box, it has **130** users defined! All but a few are system users, owing to the fact that macOS has a development history traceable back to BSD Unix in the early 1980s. Thankfully, it is missing classic system users like `sendmail`, and most of the system users it _does_ define are named with a leading underscore to distinguish them, yet one cannot help but draw a valuable distinction here.
Podman CoreOS doesn’t need these throwbacks. It has namespaces and all the rest of the elements that make up Linux containerization.
## Rootless by Default
One huge reason for the longstanding popularity of the system users concept is that classic Unix (and then Linux) servers started all daemons as root as part of the boot process, even if the service had no need for root privileges. When there _was_ good cause to start as root — as with Apache binding to port 80 — a well-designed daemon would drop root privilege as soon as it could.
And what would it drop **to**? The system user you configured it to use, of course.
While all of that can still be done in the modern Podman world, the pressures that made it the primary path no longer exist.
First, `systemd` came along with the concept of [user services][usvc], allowing service startup to be delayed until the system reached multi-user stage, or even until after the user logged in. Second, because such services run under a regular user account, the damage they can do is inherently limited. Then Podman came along and added all of what we’re discussing in this article.
Podman’s [Quadlet] feature lets us combine both capabilities: start a service in the background on boot as a normal user, but under an isolated [userns]. This not only provides every bit of the security the ancient system user concept was meant to provide, it provides more.
[Quadlet]: https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html
[usvc]: https://wiki.archlinux.org/title/Systemd/User
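As a rough illustration of that combination, here is a sketch that writes a minimal Quadlet unit requesting an automatic user namespace. The function name `write_sleeper_quadlet`, the unit name `sleeper`, the image, and the `sleep` workload are all placeholders of my own choosing; the `UserNS=` key and the per-user search directory come from the Quadlet documentation linked above, so check them against your Podman version.

``` shell
# Hypothetical helper: write a minimal Quadlet unit into the given
# directory (default: the per-user Quadlet search path) and print
# the path of the file it created.
write_sleeper_quadlet() {
    dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/containers/systemd}"
    mkdir -p "$dir" || return 1
    cat > "$dir/sleeper.container" <<'EOF'
[Unit]
Description=Placeholder service running in its own user namespace

[Container]
# Stand-in image and workload; substitute your real service here.
Image=docker.io/library/alpine:latest
Exec=sleep 3600
# Quadlet's counterpart to the --userns=auto flag.
UserNS=auto

[Install]
# Start along with the rest of this user's services.
WantedBy=default.target
EOF
    echo "$dir/sleeper.container"
}
```

After calling `write_sleeper_quadlet` with no arguments, `systemctl --user daemon-reload` should make a generated `sleeper.service` available to start like any other user service. To have it come up at boot rather than at login, the account typically also needs lingering enabled via `loginctl enable-linger`.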
## Secure by Default, Porous by Configuration
Linux namespaces are not an all-or-nothing proposition.
One major aspect of this is that there are multiple namespaces, allowing you to erect certain barriers while leaving others down. Podman makes good use of this itself in its eponymous “pod” feature. By default, containers in a pod share a network namespace while having different pidns and userns, allowing them to communicate via TCP and UDP but not interfere with each other otherwise.
Another aspect is that each namespace is configurable through Podman, giving you a measure of control over the “dimensions” of each barrier. This is not the place to get into details; suffice it to say that your choice is generally not between having the barrier or not. Search the docs for “namespace” to get an idea of the level of control Podman gives you over this aspect of its internal operation.
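One way to watch selective sharing in action is to put two containers in a pod and compare the namespace identifiers the kernel exposes under `/proc/self/ns`. This is only a sketch under assumptions: the names `demo_pod_namespaces`, `demo-pod`, `demo-a`, and `demo-b` are mine, and it presumes a working Podman that can pull the `alpine` image.

``` shell
# Hypothetical demo: run two containers in one pod, then print each
# one's network and PID namespace identifiers for comparison.
demo_pod_namespaces() {
    podman pod create --name demo-pod >/dev/null || return 1
    # Keep both containers alive at once so their namespaces coexist.
    podman run -d --pod demo-pod --name demo-a alpine sleep 60 >/dev/null
    podman run -d --pod demo-pod --name demo-b alpine sleep 60 >/dev/null
    for c in demo-a demo-b; do
        echo "== $c"
        podman exec "$c" sh -c \
            'readlink /proc/self/ns/net; readlink /proc/self/ns/pid'
    done
    # Clean up the pod and everything in it.
    podman pod rm -f demo-pod >/dev/null
}
```

If the defaults described above hold on your system, calling `demo_pod_namespaces` should print the same `net:[…]` identifier for both containers and a different `pid:[…]` identifier for each.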
## License
This work is © 2025 by Warren Young and is licensed under CC BY-NC-SA 4.0