Artifact ID: | 217af228c38f2e06f6300a60c2bb5b04b4b2571e460095c73e9c2304b5c10738 |
---|---|
Page Name: | User-Based Privilege Separation |
Date: | 2025-07-01 02:27:40 |
Original User: | tangent |
Mimetype: | text/x-markdown |
Next | e9d9640ad9e9127816f35eb24d093a76426b695235ddc0fa5edb5c60948e75a7 |
Motivation
There is an ancient practice in the Unix world where each service gets its own "user." The practice is so old that when it was new, these "system users" were intermixed with the real human ones, and you'd end up with different system user IDs on each box, depending on the order in which the users were created. Eventually, OSes began reserving the low-numbered IDs (typically those below 500 or 1000) for themselves, starting real human users past that limit.
Podman obsoletes all of this.
Do not combine the two.
Why? Read on.
User Namespaces
"Linux container" is a wrapper term for a bunch of disconnected technologies which tools like Podman combine into a useful whole. It is useful to think of this assortment of underlying features as if there had been a concerted effort to add containerization to the Linux kernel, but the fact is that the pieces were added separately over a span of years, in some cases for purposes quite separate from what we now think of as containerization.
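I bring this up because it can be important to understand the individual elements, as in this case, where Linux's user namespace feature functionally obsoletes the old "system users" practice. In brief, user namespaces (userns for short) provide the same benefit: isolating privilege based on user ID.

Podman takes that further with the concept of subordinate UIDs and GIDs: each user is assigned a range of otherwise unused host IDs, listed in `/etc/subuid` and `/etc/subgid`. In a rootless container, root maps to your own UID and the remaining container IDs map into that range.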
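To make the mapping concrete: each line of a user namespace's `/proc/self/uid_map` has the form `inside-ID outside-ID count`, and `podman unshare cat /proc/self/uid_map` shows the map rootless Podman sets up. The `map_uid` helper below is a hypothetical sketch, not part of Podman, that translates a container-side UID to its host-side UID the same way the kernel applies such a map. The sample map assumes a host UID of 1000 and a subordinate range of 65536 IDs starting at 100000, a common `/etc/subuid` default but not a universal one.

```shell
# map_uid ID: read a uid_map ("inside outside count" per line) on stdin and
# print the host-side UID that in-namespace UID ID maps to.
# Prints nothing and exits non-zero when ID falls outside every range.
# Hypothetical helper for illustration; not part of Podman.
map_uid() {
  awk -v id="$1" '
    id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1 }
    END { exit !found }
  '
}

# Assumed sample map: container root -> host UID 1000, and container UIDs
# 1..65536 -> subordinate host UIDs 100000..165535.
sample_map='0 1000 1
1 100000 65536'

printf '%s\n' "$sample_map" | map_uid 0     # container root: prints 1000
printf '%s\n' "$sample_map" | map_uid 1000  # a service UID: prints 100999
```

On a Linux host running rootless Podman (inside the VM, for `podman machine` users), piping `podman unshare cat /proc/self/uid_map` into `map_uid 0` should print your own UID, which is the `HUSER` value `podman top` reports for a container's root user.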
Default Behavior
Consider this:
```
$ id=$(podman run --rm -d alpine sleep 60)
$ podman top $id user huser
USER HUSER
root 501
```
The first command merely starts a dummy container for us to examine; it will exit and be removed a minute after it starts.

The second tells Podman to report the user IDs involved, showing that this rootless container runs under my host-side user ID, 501,[^1] even though it appears to be running as root inside.
Yet we already have most of the protection the ancient "system users" practice afforded, thanks to the assortment of technologies Podman brings to bear under the label "containerization." This `sleep 60` container cannot…
…access my home directory
To allow that, we would have had to pass something like `--volume $HOME:/home/host:Z --workdir /home/host`, as tools like Distrobox deliberately do.
…send signals to my processes
Unless you tell it otherwise, Podman puts each container into a separate pidns, which you can see with:
```
$ podman run --rm -it alpine ps -eaf
PID USER TIME COMMAND
1 root 0:00 ps -eaf
```
We're running as a fake "root" user in this instance, and we gave `ps` the "show me everything" flags, yet the only process we see is the one for `ps` itself. Also note that it appears to be PID 1, whereas the real PID 1 on my host is `/sbin/launchd`, this being a Mac.
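Another way to confirm that a container got its own namespaces is to compare namespace identifiers: each `/proc/PID/ns/KIND` entry is a symbolic link whose target, such as `pid:[4026531836]`, is the same for processes sharing that namespace. The `show_namespaces` helper below is a hypothetical sketch for illustration; it needs a Linux `/proc`, so on a Mac the host-side half runs inside the `podman machine` VM.

```shell
# show_namespaces [PID]: print "KIND LINK-TARGET" for several namespaces of a
# process (default: "self", the process doing the reading).
# Two processes share a namespace when their link targets match.
# Hypothetical helper for illustration; requires a Linux /proc.
show_namespaces() {
  for kind in user pid ipc net mnt uts; do
    printf '%s %s\n' "$kind" "$(readlink "/proc/${1:-self}/ns/$kind")"
  done
}

# Host side (on macOS, run this inside the VM via `podman machine ssh`):
show_namespaces
```

Running the same loop in a container, for example with `podman run --rm alpine sh -c 'for k in user pid ipc net mnt uts; do echo "$k $(readlink /proc/self/ns/$k)"; done'`, should typically show a different target for every kind listed, since under Podman's defaults a rootless container gets its own user, PID, IPC, network, mount, and UTS namespaces.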
…communicate with my background processes
This one isn't hard and fast. Rootless Podman's default configuration blocks off only some of the paths:
- old-school System V IPC is blocked off by having each container run in a separate ipcns by default
- Unix domain sockets appear in the filesystem, so the home-directory point above applies: if you don't map a socket through with `--volume`, the container can't see it
- sockets listening only on localhost are blocked off by the default `--network=pasta` for rootless containers; you have to give `--network=host` to override that
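To see which of your host's TCP services listen only on loopback and which listen on all interfaces (the 0.0.0.0 case below), you can read the kernel's IPv4 socket table. This is a sketch assuming a little-endian host such as x86-64 or arm64, where `/proc/net/tcp` prints the address column byte-swapped (127.0.0.1 appears as `0100007F`) and state `0A` means LISTEN. `list_tcp_listeners` is a hypothetical helper, and it ignores IPv6 sockets, which live in `/proc/net/tcp6`.

```shell
# list_tcp_listeners [FILE]: print "PORT SCOPE" for each IPv4 TCP socket in
# LISTEN state (st == 0A) in FILE (default /proc/net/tcp; "-" reads stdin).
# Assumes a little-endian host, where 127.0.0.1 is printed as 0100007F.
# Hypothetical helper for illustration; IPv6 sockets are in /proc/net/tcp6.
list_tcp_listeners() {
  awk '
    # Convert a hex string to decimal without the gawk-only strtonum().
    function hex2dec(h,    i, n) {
      n = 0
      for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789ABCDEF", toupper(substr(h, i, 1))) - 1
      return n
    }
    NR > 1 && $4 == "0A" {
      split($2, addr, ":")
      if (addr[1] == "00000000")
        scope = "all-interfaces"     # bound to 0.0.0.0
      else if (substr(addr[1], 7, 2) == "7F")
        scope = "loopback"           # bound to 127.x.x.x
      else
        scope = "specific-address"
      print hex2dec(addr[2]), scope
    }
  ' "${1:-/proc/net/tcp}"
}

# Demonstration on made-up sample lines rather than a real host's table:
printf '%s\n' \
  '  sl  local_address rem_address   st' \
  '   0: 0100007F:1F90 00000000:0000 0A' \
  '   1: 00000000:0016 00000000:0000 0A' |
  list_tcp_listeners -
# prints:
# 8080 loopback
# 22 all-interfaces
```

Running `list_tcp_listeners` with no argument on the Linux host lists its real IPv4 listeners.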
There are two other major ways Linux background processes may allow IPC, however:
- a service listening on 0.0.0.0 is reachable through the `host.containers.internal` entry that Podman puts in `/etc/hosts` to give the container access to the host's public IP[^2]
- abstract sockets bypass the filesystem namespace but not the network namespace, so they may or may not be visible, depending on how you set up your container's networking[^3]
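Because abstract socket names live in the network namespace rather than the filesystem, `/proc/net/unix` lists the ones visible from the reader's network namespace, with abstract names shown behind a leading `@`. The `list_abstract_sockets` helper below is a hypothetical sketch for illustration, and the sample lines fed to it are made up.

```shell
# list_abstract_sockets [FILE]: print the distinct abstract Unix socket names
# in FILE (default /proc/net/unix, which reflects the reader's network
# namespace; "-" reads stdin). Columns: Num RefCount Protocol Flags Type St
# Inode Path; abstract names appear in the Path column with a leading "@".
# Hypothetical helper for illustration. Names containing spaces are truncated.
list_abstract_sockets() {
  awk 'NR > 1 && $8 ~ /^@/ { print $8 }' "${1:-/proc/net/unix}" | sort -u
}

# Demonstration on made-up sample lines:
printf '%s\n' \
  'Num       RefCount Protocol Flags    Type St Inode Path' \
  '0000000000000000: 00000002 00000000 00010000 0001 01 20001 @/tmp/.X11-unix/X0' \
  '0000000000000000: 00000002 00000000 00010000 0001 01 20002 /run/dbus/system_bus_socket' \
  '0000000000000000: 00000003 00000000 00000000 0001 03 20003' |
  list_abstract_sockets -
# prints: @/tmp/.X11-unix/X0
```

Running the same `awk` program inside `podman run --rm alpine` should typically show few or no abstract sockets under the default network mode, while adding `--network=host` puts the container in the host's network namespace and exposes the host's.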
[^1]: That's the first regular user ID under macOS, showing the split between system users and human users discussed at the top of this article.
[^2]: …which might not exist, as with the `podman machine` case.
[^3]: This is a particular worry with containers since old versions of `containerd` used an abstract socket, as does DBus to this day. This can allow powerful effects which are off-topic for this article, so let me simply say that you should avoid use of `--network=host` if a primary goal of your use of containerization is improved security.