Podman Solutions: Update of “User-Based Privilege Separation”


Overview

Artifact ID:	217af228c38f2e06f6300a60c2bb5b04b4b2571e460095c73e9c2304b5c10738
Page Name:	User-Based Privilege Separation
Date:	2025-07-01 02:27:40
Original User:	tangent
Mimetype:	text/x-markdown

Content

Motivation

There is an ancient practice in the Unix world where each service gets its own "user." The practice is so old that when it was a new idea, these "system users" got intermixed with the real human kind, and you'd end up with different system user IDs on each box, depending on the order in which the users were created. Eventually, OSes began reserving the low-numbered IDs (typically those below 500 or 1000) for themselves, starting the real human users past that limit.

Podman obsoletes all of this.

Do not combine the two.

Why? Read on.

User Namespaces

"Linux container" is a wrapper term for a bunch of disconnected technologies which tools like Podman combine into a useful whole. It is useful to think of this assortment of underlying features as if there had been a concerted effort to add containerization to the Linux kernel, but the fact is that the pieces were added separately over a span of years, in some cases for purposes quite separate from what we now think of as containerization.

I bring this up because it can be important to understand the individual elements, as in this specific case, where Linux's user namespace feature functionally obsoletes the old "system users" practice. In brief, user namespaces (userns for short) provide the same benefit: isolating privilege based on user ID.

Podman takes that further with the concept of subordinate UIDs and GIDs.
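Those subordinate ranges live in /etc/subuid and /etc/subgid, one "name:start:count" entry per line; on a Mac, that means inside the podman machine VM. The helper below is a minimal sketch for looking up a user's range. The name subid_range is my own invention, not part of Podman, and it assumes the simple one-entry-per-user layout, ignoring entries keyed by numeric UID and any second range a user may have been given.

```shell
# Hypothetical helper: print "START COUNT" for USER from a subuid-style
# file, where each line reads "name:start:count".
# Usage: subid_range USER [FILE]   (FILE defaults to /etc/subuid)
# Only the first entry for USER is reported; numeric-UID keys are ignored.
subid_range() {
    awk -F: -v who="$1" '
        $1 == who { print $2, $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/etc/subuid}"
}
```

On a typical Linux box, subid_range "$(id -un)" might print something like 100000 65536, and pointing it at /etc/subgid gives the matching GID range.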

Default Behavior

Consider this:

$ id=$(podman run --rm -d alpine sleep 60)
$ podman top $id user huser
USER        HUSER
root        501

The first command merely starts a dummy container for us to examine; it will disappear a minute after we start it.

The second then tells Podman to report the user IDs involved, which shows that this rootless container is running under my host-side user ID, 501¹, even though it appears to be running as root inside.
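The HUSER column is a translation through the container's UID map, which the kernel exposes as /proc/<pid>/uid_map. Each line reads "inside outside count", mapping a run of count container-side IDs onto host-side IDs starting at outside. For rootless Podman, the first line typically maps container root onto your own UID, and the next maps the remaining container IDs onto your subordinate range. Here is a minimal sketch of that lookup; to_host_uid is a name I made up for illustration, not a Podman command.

```shell
# Hypothetical helper: translate a container-side UID to the host-side UID
# using uid_map lines ("inside outside count") read from standard input.
# Prints the host UID, or exits non-zero if the UID is not mapped.
to_host_uid() {
    awk -v uid="$1" '
        BEGIN { u = uid + 0 }
        # This line covers container IDs inside .. inside+count-1.
        u >= $1 + 0 && u < $1 + $3 { print $2 + (u - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

On a Linux host running rootless Podman, piping podman unshare cat /proc/self/uid_map into to_host_uid 0 should give back your own UID, while a nonzero container UID such as 1000 lands 999 IDs past the start of your subordinate range.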

Yet already we have most of the protection afforded by the ancient "system users" concept because of the assortment of technologies brought to bear by Podman under the label "containerization". This sleep 60 container cannot…

…access my home directory

To allow that, we would have had to pass something like --volume $HOME:/home/host:Z --workdir /home/host, as tools like Distrobox deliberately go out of their way to do.

…send signals to my processes

Unless you tell it otherwise, Podman puts each container into a separate pidns, which you can see with:

$ podman run --rm -it alpine ps -eaf
PID   USER     TIME  COMMAND
    1 root      0:00 ps -eaf

We're running as a fake "root" user in this instance, and we gave ps the "show me everything" flags, yet the only process we see is the one for ps itself. Also note that it appears to be PID 1, whereas the real PID 1 on my host is /sbin/launchd, this being a Mac.
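Every namespace has an identity that the kernel exposes as a symbolic link under /proc/<pid>/ns/, and two processes share a namespace exactly when those links read the same. The sketch below wraps that in a tiny helper, ns_id, which is my own name for it rather than a Podman feature. It is Linux-only, so on a Mac it belongs inside the podman machine VM.

```shell
# Hypothetical helper: print the namespace identity of KIND (pid, ipc,
# net, user, mnt, ...) for process PID, e.g. "pid:[4026531836]".
# Usage: ns_id KIND [PID]
# PID defaults to "self", i.e. the readlink process, which shares the
# caller's namespaces. Linux-only: reads the /proc/<pid>/ns/ symlinks.
ns_id() {
    readlink "/proc/${2:-self}/ns/$1"
}
```

Comparing ns_id pid on the host with the output of podman run --rm alpine readlink /proc/self/ns/pid should show two different values, which is the separate pidns at work; swapping pid for ipc shows the separate ipcns discussed below in the same way.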

…communicate with my background processes

This one isn't hard-and-fast. Only some of the paths are blocked off by rootless Podman's default configuration:

old-school System V IPC is blocked off by having each container run in a separate ipcns by default
Unix domain sockets appear in the filesystem, so the home-directory point above applies: if you don't map the socket through with --volume, the container can't see it
localhost sockets are blocked off by the default --network=pasta for rootless containers; you have to give --network=host to override that

There are two other major ways Linux background processes may allow IPC, however:

listening on 0.0.0.0 makes a service reachable through the host.containers.internal entry that Podman puts in the container's /etc/hosts to give it access to the host's public IP²
abstract sockets bypass the filesystem namespace but not the network namespace, so they may or may not be visible, depending on how you set up your container³
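Both of those paths can be audited from the host side (on a Mac, inside the podman machine VM). Linux lists a network namespace's TCP sockets in /proc/net/tcp and its Unix domain sockets, abstract ones included, in /proc/net/unix; that the latter is per network namespace is the reason abstract sockets follow the netns rather than the filesystem. The sketch below assumes those two file formats: wildcard_listeners prints the ports of IPv4 sockets listening on 0.0.0.0, and abstract_sockets prints abstract socket names, which the kernel shows with a leading @. Both are hypothetical helpers of my own, and IPv6 listeners in /proc/net/tcp6 are not covered.

```shell
# Hypothetical helper: print the port of each IPv4 TCP socket listening on
# 0.0.0.0, read from a /proc/net/tcp-format file (default: /proc/net/tcp).
# Addresses there are hex; "00000000" is 0.0.0.0 and state 0A is LISTEN.
wildcard_listeners() {
    awk '
        # Portable hex-to-decimal, since strtonum() is a gawk extension.
        function hex(s,   i, n) {
            n = 0
            s = toupper(s)
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
            return n
        }
        # Skip the header line; keep only listening sockets.
        NR > 1 && $4 == "0A" {
            split($2, addr, ":")
            if (addr[1] == "00000000") print hex(addr[2])
        }
    ' "${1:-/proc/net/tcp}"
}

# Hypothetical helper: print each abstract Unix socket name (shown with a
# leading "@") from a /proc/net/unix-format file (default: /proc/net/unix).
abstract_sockets() {
    awk 'NR > 1 && substr($8, 1, 1) == "@" { print $8 }' "${1:-/proc/net/unix}" |
        sort -u
}
```

Going by the list above, anything wildcard_listeners reports is a candidate for access via host.containers.internal, and anything abstract_sockets reports is visible to a container that shares the host's network namespace, as with --network=host.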

¹ That's the first regular user ID under macOS, showing the split between system users and human users discussed at the top of this article.
² …which might not exist, as with the podman machine case.
³ This is a particular worry with containers since old versions of containerd used an abstract socket, as does DBus to this day. This can allow powerful effects which are off-topic for this article, so let me simply say that you should avoid use of --network=host if a primary goal of your use of containerization is improved security.