Using volumes with rootless podman, explained

Podman is a newer tool for running containers. It’s daemonless (unlike Docker) and it’s designed, from the ground up, to play a bit nicer in the Linux ecosystem.

Podman is architected like classic Linux tools – it’s lightweight, it doesn’t ask for more permissions than it needs, and it cooperates willingly with SELinux. (Unlike some of us!)

However, you might have realised that some commands which worked fine in Docker simply don’t work in Podman, and doubly so when SELinux is in charge.

You’re tearing your hair out (if you have any).

You get all sorts of errors about permissions, and you desperately want to avoid disabling SELinux, but you just end up reaching for setenforce 0 and running everything with sudo.

Stop! 🚏 No, I mean stop. 🛑 We can fix this.

You don't have to SELinux if you run as root - meme

Just say no.

In this article I’ll go through how to share a volume from your host machine with podman, when you’re running rootless containers.

I’ll also share a podman command you can use which really helps you in these situations.

Rootless podman vs rootful at a glance

This table shows you the 4 main rootless/rootful operating modes of Podman, and how they really work in practice.

  • The first column is how you’re running podman. Do you run the command as the root user (e.g. sudo podman run....), or not?

  • The second column is the user that the container process is running as. You can set this with podman using the -u option - e.g. podman run -u root...

  • The third column shows who the process is really running as, on the host. That is, if you run ps -ef on the host, and find your containerised process, this is the UID that you will see in the process list.

Running podman as..   | With container process running as..   | The actual UID visible on the host is…
root                  | root                                  | 0
root                  | non-root                              | The container user's UID (e.g. 200)
non-root              | root                                  | Your UID
non-root              | non-root                              | A non-root UID
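If you'd like to check the third column for yourself, here's a small sketch. It assumes you already have a container running, and that your version of Podman supports the user and huser descriptors of podman top (user is the user inside the container, huser is the user seen on the host). The function name show_container_users is just something I made up for this example.

```shell
# Print the in-container user next to the host user for each process
# in a running container. "$1" is the container name or ID.
# Assumption: this Podman version supports the "user" and "huser" descriptors.
show_container_users() {
    podman top "$1" user huser
}

# Example, assuming a running container called nexus2:
#   show_container_users nexus2
```

Run it once against a rootful container and once against a rootless one, and compare the two columns.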

You need to understand how rootless podman works

Rootless podman (running Podman as a non-root user) needs to do some gymnastics to get the same container experience you’re familiar with from docker, but without requiring root.

Juggler icon

Podman juggles UIDs

Source: monkik/Flaticon

When you run rootless podman, it uses a user namespace to map between the user IDs in the container and the user IDs on your host.

What are user namespaces?

User namespaces are a Linux feature that allow you to wrap and isolate a process, so that it appears to be running under its own, different set of security identities – such as user IDs and group IDs.

From the manpage:

A process’s user and group IDs can be different inside and outside a user namespace.

For example, a user namespace allows a containerised process to run as one user inside the user namespace, like UID 200. But then outside the user namespace (container), it’s actually running as a different UID completely (e.g. 100199).

Check out this excellent article by Dan Walsh on how user namespaces work with Podman.

In Podman’s user namespace, there is a new set of user IDs and group IDs, which are separate from the UIDs and GIDs on your host.

Illustration of mapping of User IDs between a parent and user namespace

By using a user namespace and a map of UIDs, Podman can make a container process appear to run as user 200 inside a container, when it’s actually running as a different user ID on the host.

You can see the actual mappings by viewing the file /proc/self/uid_map inside your container.
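To make the arithmetic concrete, here's a minimal sketch that translates a UID inside the namespace into the UID on the host. Each line of uid_map has three numbers: the first UID inside the namespace, the first UID outside it, and the length of the range. The map below is a made-up example (a host user with UID 1000 and a subordinate range starting at 100000), and map_uid is a helper name I invented; your own map will have different numbers.

```shell
# Translate a UID inside a user namespace to the UID outside it.
# Reads uid_map-style lines ("inside_start outside_start length") from stdin.
# Exits non-zero if the UID is not covered by any range.
map_uid() {
    awk -v uid="$1" '
        (uid + 0) >= $1 && (uid + 0) < $1 + $3 {
            print $2 + (uid - $1)
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}

# Hypothetical map: container root is host UID 1000,
# container UIDs 1..65536 are host UIDs 100000..165535.
example_map='0 1000 1
1 100000 65536'

printf '%s\n' "$example_map" | map_uid 0     # prints 1000
printf '%s\n' "$example_map" | map_uid 200   # prints 100199
```

With this example map, UID 200 in the container lands on 100000 + (200 - 1) = 100199 on the host.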

The Linux manpages have lots more detail on how this works. Check out man 7 user_namespaces

If you’ve got a few minutes, and you’re looking for a snappy video that explains this well, check out the video from Red Hat on rootless Podman.

Rootless containers share the same user namespace

All rootless containers that you run are run inside the same user namespace. The engineers behind Podman explain this in this article on the ‘behind-the-scenes’ process of Podman:

All rootless containers must be run in the same user namespace. If they are not, some things (like sharing the network namespace from another container) would be impossible.

By using the same user namespace, your containers can share resources with each other, without needing to ask for root privileges.

Podman uses this user namespace to mount filesystems, or to run a container which accesses more than one user ID (UID) or group ID (GID).

This mapping is fine for most situations, except when the container needs to be able to share something with the host, like a volume.

But - here’s the important thing:

When the container runs, any volumes which are shared with it will appear inside the user namespace as owned by root/root, because the mapping maps your UID on the host (e.g. 1000) to root (0) in the container.

Here’s an example of what I mean. I run rootless podman as my normal, non-root user, set the container user to UID 200, and mount a volume from my host:

$ podman run --user 200 -it -v $(pwd)/myfolder:/mnt/myfolder:Z busybox

Now inside the container, look at the permissions on the mounted directory:

~ $ ls -al /mnt
total 16
drwxr-xr-t    3 root     root          4096 Jan  3 15:50 .
dr-xr-xr-x   14 root     root          4096 Jan  3 15:50 ..
drwxr-xr-x    2 root     root          4096 Jan  3 15:50 public

The directory is owned by root – not user “200”, or my user ID.

This means that if you’re running your container process as a non-root user, it won’t be able to write to that directory.

So how do we change the owner of the directory in the container, so the user can write to it?

And how can we troubleshoot and run commands in that same user namespace, when things go wrong – without having to start a container?

This is where podman unshare comes in.

podman unshare lets you run a command in the same user namespace as your containers

podman unshare runs a command in Podman’s modified user namespace. It’s intended to be run with rootless podman (where you run podman as a non-root user).

Since Podman uses user namespaces to make the magic of different user IDs and group IDs work, it would be helpful if we had a way of accessing this same user namespace, when we need to fix or investigate something.

This is where podman unshare comes in. From the manpage for podman-unshare:

podman unshare is useful for troubleshooting unprivileged operations and for manually clearing storage and other data related to images and containers.

It works like the unshare command, which wraps the Linux kernel’s unshare system call. unshare is one of the building blocks that makes user namespaces, and therefore containers, possible.

From the Linux manpage for unshare:

unshare - run program in new namespaces
The unshare command creates new namespaces … and then executes the specified program.

Why is it called "unshare"?

The reason it’s called unshare is nothing to do with unsharing of files, or making sure your parents don’t see your embarrassing photos on Facebook.

It’s because it runs the program in a namespace which is separate or “un-shared” from its parent process.

If you’re new to all this, unshare is not exactly a helpfully-named command, but that’s Linux history for you. ;)

When you use podman unshare, you’re effectively jumping out of your normal namespace, and into the Podman user namespace.

Out of Space - Melody line

Out of (user) namespace

Source: Ukulele Go

It’s like executing the unshare command as the Podman process. So you can execute a command in the same user namespace, such as changing file permissions, or browsing your filesystem, if you need to.
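Here's a rough sketch of the kind of thing I mean. The two function names are my own invention; underneath they only call podman unshare with ordinary commands (cat and GNU stat, which I'm assuming are on your host).

```shell
# Print the UID and GID maps of Podman's rootless user namespace.
show_podman_userns_maps() {
    podman unshare cat /proc/self/uid_map /proc/self/gid_map
}

# Print the numeric owner and group of a host path, as seen from inside
# Podman's user namespace. %u and %g are GNU stat format sequences.
owner_in_podman_userns() {
    podman unshare stat -c '%u:%g' "$1"
}

# Examples:
#   show_podman_userns_maps
#   owner_in_podman_userns "$HOME"
```

A directory that you own on the host should show up as 0:0 here, because your own UID is mapped to root inside the namespace.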

You can check out Dan Walsh’s article on Opensource.com for more information on how rootless Podman works.

So now we know what podman unshare does, how is it relevant to sharing a directory?

How to allow a rootless podman container to write to a volume

So this brings us on to the task in hand. If you want to use rootless podman to run containers, but you want to share a directory from your host, how do you do this?

I do this on my laptop, because I run Nexus, the artifact repository, in a container. I want it to save its artifacts to a location on my host machine, so that if the container dies, I don’t have to download everything again.

Putting it all together, here are the steps:

  • Determine which user account is running the process within your container.

  • Enter Podman’s user namespace, and grant this user permissions to write to your directory.

  • Mount the volume when you run the container, and add the proper SELinux label to allow the container user to write.

  • Check, and double-check ;-)

Get the UID of the container user first

First you need to know which UID the container is running as.

The user is specified in the Dockerfile of the image you’re running (in the USER line).

Or, when you’re running the container, you can set the user explicitly using the podman run --user <id> option.
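If you don't have the Dockerfile to hand, you can ask Podman instead. This is a sketch with two helper functions whose names I made up. The first reads the USER recorded in the image metadata (the image needs to be in your local storage already). The second starts a throwaway container and runs id in it, which assumes the image actually contains an id binary.

```shell
# Print the USER configured in an image's metadata.
# Empty output means no USER was set, so the process runs as root in the container.
image_user() {
    podman image inspect --format '{{.Config.User}}' "$1"
}

# Start a throwaway container with the entrypoint replaced by "id",
# to print the UID/GID the container process would run as.
# Assumption: the image ships an "id" binary.
effective_user() {
    podman run --rm --entrypoint id "$1"
}

# Examples:
#   image_user sonatype/nexus
#   effective_user sonatype/nexus
```

The image metadata may give you a user name rather than a number, so the second function is handy when you need the numeric UID for chown.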

Use podman unshare chown to grant the container user ID permissions to write to your directory

Next we need to change the UID/GID of the volume directory in the rootless Podman user namespace, to make it the same as the UID/GID of the container user.

In my case, the nexus container runs as UID 200. I can see how my rootless container would “see” the filesystem using podman unshare:

$ podman unshare ls -al /home/tom/myshares
drwx--x--x. 58 root root   4096 Dec 25 13:45 .
drwx--x--x.  8 root root   4096 Nov 18 16:00 ..
drwxrwxr-x.  6 root root   4096 May 18  2020 nexus2

As you can see, the nexus2 directory is owned by root.

This isn’t good, because when my container starts, it will want to write to that directory. And since the user running nexus in the container is user 200 and not root, it will fail.

So I use the chown command to set the UID/GID of the shared directory, and I run this command in my Podman user namespace:

podman unshare chown 200:200 -R /home/tom/myshares/nexus2
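If you do this often, you could wrap the chown and a quick check into one small function. This is only a sketch: grant_volume_to is a name I invented, and it assumes GNU stat is available on your host.

```shell
# Give a container UID:GID ownership of a host directory, as seen from
# Podman's user namespace, then print the new owner to confirm.
# Usage: grant_volume_to <dir> <uid> [gid]   (gid defaults to uid)
grant_volume_to() {
    vol_dir=$1
    vol_uid=$2
    vol_gid=${3:-$2}
    podman unshare chown -R "$vol_uid:$vol_gid" "$vol_dir" &&
        podman unshare stat -c '%u:%g %n' "$vol_dir"
}

# Example:
#   grant_volume_to /home/tom/myshares/nexus2 200
```

After this, the directory will typically look like it belongs to a high-numbered UID when you list it on the host outside the namespace. To hand it back to your own user, run podman unshare chown -R 0:0 on the directory, because root inside the namespace is you on the host.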

So far so good.

How can we check that things are working?

Check that the permissions are OK, by listing the directory inside the container

Now I want to make sure that when my Nexus container starts, it will be able to write to that folder.

So I run the rootless container. Using the -v option, I mount my volume as /sonatype-work in the container. And I override the entrypoint, so that I launch straight into a shell (/bin/sh):

podman run -it --rm --name nexus2 \
    -v /home/tom/myshares/nexus2:/sonatype-work:Z \
    sonatype/nexus /bin/sh

Note that I add the :Z flag to the volume. This tells Podman to label the volume content as “private unshared” with SELinux. This label allows the container to write to the volume, but doesn’t allow the volume to be shared with other containers. This syntax is also used in the docker command.
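If you want to confirm that the relabel happened, you can look at the SELinux context of the directory on the host after the container has started. Here's a small sketch. It assumes your SELinux policy uses the container_file_t type for container content (older policies may use a different type name), and is_container_label is just a helper name I picked.

```shell
# Return success if an SELinux context string has the container_file_t type.
# Assumption: the host policy labels container content as container_file_t.
is_container_label() {
    case "$1" in
        *:container_file_t:*) return 0 ;;
        *) return 1 ;;
    esac
}

# On an SELinux host, GNU stat's %C prints a path's security context:
#   is_container_label "$(stat -c %C /home/tom/myshares/nexus2)" \
#       && echo "labelled for containers"
```

If the check fails after you've mounted with :Z, that's a hint that the permission errors are coming from SELinux rather than from file ownership.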

Once I’m inside the container, I can check that I now “own” the directory as the nexus user. If I own the directory, then I should be able to write to it:

$ ls -al / | grep sonatype-work
drwxr-xr-x.  15 nexus nexus 4096 Sep 27 13:29 sonatype-work

This ls command shows me that the sonatype-work directory now appears to be owned by nexus/nexus when inside the container. Result!

When I next start the container, the nexus user will have permissions to write to this directory.

Or, you could just run your process as the root user inside the container

Another option is to run as the root user inside the container. (Yes, rather confusingly, you can run a rootless container, and set the container user to root, and the process will run as your unprivileged user on the host.)

To run your process as the root user in the container, use -u root:

podman run -it --rm --name nexus2 \
    -v /home/tom/myshares/nexus2:/sonatype-work:Z \
    -u root \
    sonatype/nexus /bin/sh

Don’t forget to add the :Z suffix.

Now when I enter my sh shell, I will be root inside the container.

Running as root in the container is a bit more portable, but it’s cheating. We really shouldn’t be running anything as root, even inside a container. For this reason, this solution isn’t recommended for production setups.

TLDR

You’ve skipped this whole article and that makes me a little sad but I’m not surprised.

But, if you really want the super-condensed TLDR:

  • Rootless Podman uses user namespaces to run container processes

  • podman unshare allows you to run a command inside the Podman user namespace

  • Change the owner of the directory inside this namespace, to ensure your container user can write to it.
