Rootless podman containers under system accounts, managed and enabled at boot with systemd

While you can run containers as root on the host, or run rootless containers as your regular user (either as uid 0 or any other uid inside the container), sometimes it’s nice to create dedicated users to run one or more containers. This provides neat separation between services and can also improve your security posture.

We also want those containers to act as regular system services, managed by systemd so that they restart automatically and start at boot.

This assumes you’ve just installed Fedora Server (or RHEL/CentOS 8+) and have a local user with sudo privileges. First, install the SELinux management tools; asking dnf for /usr/sbin/semanage pulls in whichever package provides it.

sudo dnf install -y /usr/sbin/semanage

Setting up the system user

Let’s create our system user, placing their home directory under /var/lib. For this example I’m using a service account named busybox, but this can be any name that is unique on the box. If you would prefer the account to have a real shell, swap /bin/false for /bin/bash or another shell.

export SERVICE="busybox"

sudo useradd -r -m -d "/var/lib/${SERVICE}" -s /bin/false "${SERVICE}"

In order for our user to run containers automatically on boot, we need to enable systemd linger support. This will ensure that a user manager is run for the user at boot and kept around after logouts.

sudo loginctl enable-linger "${SERVICE}"

Configure homedir for containers

Next, we create a data directory to be passed in as a volume to the container (some containers may require more, but this is a good start).

sudo -H -u "${SERVICE}" bash -c "mkdir ~/data"

We need to set SELinux file contexts on the home directory, otherwise rootless containers won’t run. The first rule below changes the service account’s directory under /var/lib from var_lib_t to user_home_dir_t. The second sets the data directory to container_file_t so that containers can access it (technically this isn’t necessary if you use the :z or :Z option on the volume when running the container, but I’m keeping it in for broader context). Finally, restorecon applies the new rules to the files that already exist.

sudo semanage fcontext -a -t user_home_dir_t \
"/var/lib/${SERVICE}(/.+)?"

sudo semanage fcontext -a -t container_file_t \
"/var/lib/${SERVICE}/data(/.+)?"

sudo restorecon -Frv /var/lib/"${SERVICE}"
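The two fcontext patterns above overlap: everything under data also matches the home-directory pattern. As a rough sketch, the snippet below reports which pattern each sample path matches. It assumes file-context patterns behave like POSIX extended regexes anchored at both ends (approximated here with grep -E and explicit ^...$), and it does not model which rule SELinux applies when both match; check the real result with sudo ls -Zd "/var/lib/${SERVICE}/data" after running restorecon. The helper name matching_patterns is hypothetical.

```shell
#!/bin/sh
# Sketch: report which of the two fcontext patterns match a given path.
# Assumption: the patterns act as anchored POSIX extended regexes.
SERVICE="busybox"
home_re='^/var/lib/'"$SERVICE"'(/.+)?$'
data_re='^/var/lib/'"$SERVICE"'/data(/.+)?$'

matching_patterns() {
  mp_out=""
  if printf '%s\n' "$1" | grep -Eq "$home_re"; then mp_out="home"; fi
  if printf '%s\n' "$1" | grep -Eq "$data_re"; then mp_out="${mp_out:+$mp_out }data"; fi
  printf '%s\n' "${mp_out:-none}"
}

for p in "/var/lib/$SERVICE" "/var/lib/$SERVICE/data" \
         "/var/lib/$SERVICE/data/file.txt" "/var/lib/${SERVICE}2"; do
  printf '%-32s %s\n' "$p" "$(matching_patterns "$p")"
done
```

Paths under data match both patterns, the home directory itself matches only the first, and a lookalike such as /var/lib/busybox2 matches neither.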

Enable rootless containers

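By default, system users do not get any subordinate UID/GID ranges, which means they cannot run rootless containers. Setting this up is done manually with a little bit of bash magic: the commands take the start of the last range in /etc/subuid and /etc/subgid and add 65536, so they assume the last entry is the highest allocation and spans 65536 IDs, as is typical on a fresh install.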

NEW_SUBUID=$(($(tail -1 /etc/subuid |awk -F ":" '{print $2}')+65536))
NEW_SUBGID=$(($(tail -1 /etc/subgid |awk -F ":" '{print $2}')+65536))

sudo usermod \
  --add-subuids "${NEW_SUBUID}-$((NEW_SUBUID + 65535))" \
  --add-subgids "${NEW_SUBGID}-$((NEW_SUBGID + 65535))" \
  "${SERVICE}"

Great! Now we have our system user ready to go.
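If /etc/subuid or /etc/subgid has been edited by hand, or holds ranges of different sizes, the last line may not be the highest allocation, and the tail-based arithmetic above could hand out an overlapping range. A slightly more defensive sketch scans every entry and starts after the highest end (start plus count). The helper name next_subid_start is hypothetical, and the fallback of 100000 for an empty file is an assumption chosen to match the usual SUB_UID_MIN default in /etc/login.defs.

```shell
#!/bin/sh
# Hypothetical helper: print the first ID after every range already allocated
# in a subuid/subgid file, whose lines look like name:start:count.
next_subid_start() {
  # end = start + count is one past the last ID of each range; keep the max.
  # Fall back to 100000 (assumed SUB_UID_MIN) when the file has no entries.
  awk -F: 'NF >= 3 { end = $2 + $3; if (end > max) max = end }
           END { print (max > 0 ? max : 100000) }' "$1"
}

# Demonstrate on a throwaway file where the highest range is not on the last line.
demo=$(mktemp)
printf 'alice:100000:65536\nbob:231072:200000\ncarol:165536:65536\n' > "$demo"
next_subid_start "$demo"   # prints 431072; the tail-based maths gives 231072, bob's start
rm -f "$demo"

# On the real system, something like:
#   NEW_SUBUID=$(next_subid_start /etc/subuid)
#   NEW_SUBGID=$(next_subid_start /etc/subgid)
```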

Switch to system user

Let’s switch to our system user. This is slightly more complicated than usual because the account’s login shell is /bin/false, so we start bash explicitly and cd into the home directory first.

sudo -H -u "${SERVICE}" bash -c 'cd; bash'

Running a rootless container

We have a dedicated user which can run rootless containers, so when we start a container, we can tell it to run as root with the --user 0:0 option (or -u 0:0 for short). Because the container is rootless, root inside it is mapped to our system user on the host, so the container’s process actually runs as that unprivileged account.

OK, now let’s run a container! Note that we run it in detached mode with --detach (-d for short) so that it runs in the background. We also keep STDIN open with --interactive (-i for short) and allocate a pseudo-TTY with --tty (-t for short); busybox’s default command is a shell, which would exit immediately without them. You may recall from earlier posts that the :z option after the volume relabels the host directory for SELinux. Lowercase :z applies a shared label that any container can use, while :Z applies a private label, unique to this container via MCS categories.

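podman run -u 0:0 -dit --name busybox -v ~/data:/data:z docker.io/library/busybox

The volume source is written as a path (~/data) on purpose: a bare name such as data is treated by podman as a named volume, not as the directory we created.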

Do a simple test to make sure we can connect to the running container.

podman exec busybox sh -c 'echo -n "In this container, I am ";id -un'

You should see that you are root:

In this container, I am root

Managing and enabling the container with systemd

OK, so we can create a dedicated user on the host, and we can run a container, great! But how do we get that non-root user to automatically start their container on boot? Enter systemd.

To interact with the user’s systemd instance, we must make sure XDG_RUNTIME_DIR is set. A normal login, such as over SSH, would set it up for us, but we switched users with sudo instead, because our system user has no login shell.

export XDG_RUNTIME_DIR=/run/user/"$(id -u)"
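Since this export has to be repeated every time you switch to the account this way, it may be handy to wrap it in a small function, for example in the account’s ~/.bashrc. This is only a sketch: ensure_xdg_runtime_dir is a hypothetical name, and it merely checks that /run/user/<uid> (the directory systemd-logind creates for the user) exists, which it should once linger is enabled.

```shell
#!/bin/sh
# Hypothetical helper: set XDG_RUNTIME_DIR to /run/user/<uid> when it is not
# already set, and warn if that directory is missing.
ensure_xdg_runtime_dir() {
  if [ -z "${XDG_RUNTIME_DIR:-}" ]; then
    XDG_RUNTIME_DIR="/run/user/$(id -u)"
    export XDG_RUNTIME_DIR
  fi
  if [ ! -d "$XDG_RUNTIME_DIR" ]; then
    echo "warning: $XDG_RUNTIME_DIR does not exist; is linger enabled?" >&2
    return 1
  fi
}

# Usage, as the service account:
#   ensure_xdg_runtime_dir && systemctl --user
```

An already-set value is left alone, so the function is safe to call from a login that did get a proper session.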

You should be able to connect to systemd now, let’s test it.

systemctl --user

Remember when we created the account we enabled linger support? That’s critical when running containers without an actual login.

Let’s make the user systemd directory.

mkdir -p ~/.config/systemd/user/

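Use podman to generate a systemd service file, then make sure the unit uses KillMode=control-group, so that systemd stops every process in the service’s cgroup when the unit stops.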

podman generate systemd --restart-policy always --name busybox > \
~/.config/systemd/user/container-busybox.service

sed -i 's/^KillMode=.*/KillMode=control-group/' \
~/.config/systemd/user/container-busybox.service
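The sed line only rewrites a KillMode line that already exists; if your version of podman generates a unit without one, nothing changes. A slightly more defensive sketch adds the setting under [Service] when it is missing. It assumes GNU sed (as shipped on Fedora and RHEL) for sed -i and the one-line a command, and set_killmode is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: force KillMode=<mode> (default control-group) in a
# systemd unit file, replacing an existing line or adding one under [Service].
set_killmode() {
  sk_file="$1"
  sk_mode="${2:-control-group}"
  if grep -q '^KillMode=' "$sk_file"; then
    sed -i "s/^KillMode=.*/KillMode=${sk_mode}/" "$sk_file"
  else
    # GNU sed: append a line right after the [Service] header.
    sed -i "/^\[Service\]\$/a KillMode=${sk_mode}" "$sk_file"
  fi
}

# Usage:
#   set_killmode ~/.config/systemd/user/container-busybox.service
```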

Next, we reload systemd so that it can see the new service.

systemctl --user daemon-reload

Now we are able to interact with the container using systemd. Let’s enable it on boot and check the status!

systemctl --user enable --now container-busybox
systemctl --user status container-busybox

On boot, this service should auto-start and can be managed via systemd.

So the final step is to reboot the host, switch back to the user and ensure the container is running.
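As commenters below point out, newer Podman releases deprecate podman generate systemd in favour of Quadlet, where you describe the container in a .container file and a systemd generator builds the service for you. The sketch below writes such a file for this example. It assumes Podman 4.4 or newer, reuses the data directory from earlier, and swaps the -it trick for a long-running command (tail -f /dev/null), because the generated service runs the container without a terminal. The function name and that command are illustrative choices, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: write a Quadlet unit for the busybox container.
# Defaults to the rootless Quadlet directory; pass another directory as $1.
write_busybox_quadlet() {
  wq_dir="${1:-$HOME/.config/containers/systemd}"
  mkdir -p "$wq_dir" || return 1
  cat > "$wq_dir/busybox.container" <<'EOF'
[Unit]
Description=Rootless busybox container

[Container]
Image=docker.io/library/busybox
ContainerName=busybox
Volume=/var/lib/busybox/data:/data:z
# No TTY under systemd, so run something that stays in the foreground.
Exec=tail -f /dev/null

[Service]
Restart=always

[Install]
WantedBy=default.target
EOF
}

# Usage, as the service account (after removing the earlier container and
# disabling container-busybox.service, since both use the name "busybox"):
#   write_busybox_quadlet
#   systemctl --user daemon-reload
#   systemctl --user start busybox.service
```

The service name follows the file name (busybox.container becomes busybox.service). Generated units are not enabled with systemctl enable; the [Install] section is what makes the generator hook it into default.target at boot.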

16 thoughts on “Rootless podman containers under system accounts, managed and enabled at boot with systemd”

  1. This is neat, I like it.
    Question: how about a CI/CD flow here? Assuming I have GitLab CI and can execute commands over SSH via pubkey as that service user, if I run podman rm, then pull and run the image fresh, will the systemd unit still work (and autostart) after reboot?

  2. It should, so long as the name of your container stays the same, as that’s what systemd uses (check the service file you create).

    -c

  3. Very helpful article, thanks!

    For many containers, having the systemd services scattered over just as many users and home dirs is cumbersome to manage. Do you know if it’s possible to leverage the systemd User= and Group= directives to run rootless containers from the systemd --system instance? Or would the preferred approach be to just run all containers under the same (non-root) user?

  4. Hi Kilian, I’m not sure about that, but it is probably possible to run as a different user. If I get some time I’ll test it out and post an update.

  5. Thanks! This was very helpful in getting a rootless podman deployment of keycloak up and running.

  6. This post is brilliant. I have found pieces of this info all over, but here it all is in one place. Thank you!

  7. Hi, I followed your recommendations until systemd units. To persist container across reboots I put a user systemd unit in user’s folder ~/.config/systemd/user/podman-restart.service

    This is a slightly modified version of the system’s podman-restart service; I only added the “Environment=PODMAN_SYSTEMD_UNIT=%n” line. It did not need any XDG-related variables to work. The original system unit starts rootful containers at boot (/usr/bin/podman $LOGGING start --all --filter restart-policy=always), but cannot start other users’ containers.

    Still, systemd units for standalone containers are deprecated (podman generate systemd), and quadlet architecture should be used. This should be noted in this post.

  8. FYI:

    Add the “-F” option to the “useradd” command and you can skip the whole manual subuid/subgid bit.

    Also, since the Podman systemd generation is deprecated now, maybe update the article to state that since Podman 3.3 you can use “systemctl --user enable podman-restart” (once) and all containers with the proper restart option will restart automagically.
