volt-vmm/docs/landlock-caps-implementation.md
Karl Clinger 40ed108dd5 Volt VMM (Neutron Stardust): source-available under AGPSL v5.0
KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
2026-03-21 01:04:35 -05:00


Landlock & Capability Dropping Implementation

Date: 2026-03-08
Status: Implemented and tested

Overview

Volt VMM now implements three security hardening layers applied after all privileged setup is complete (KVM, TAP, sockets) but before the vCPU run loop:

  1. Landlock filesystem sandbox (kernel 5.13+, optional, default-enabled)
  2. Linux capability dropping (always)
  3. Seccomp-BPF syscall filtering (always unless --no-seccomp; pre-existing)

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Layer 5: Seccomp-BPF (always unless --no-seccomp)          │
│           72 syscalls allowed, KILL_PROCESS on violation    │
├─────────────────────────────────────────────────────────────┤
│  Layer 4: Landlock (optional, kernel 5.13+)                 │
│           Filesystem path restrictions                      │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: Capability dropping (always)                      │
│           All ambient, bounding, and effective caps dropped │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: PR_SET_NO_NEW_PRIVS (always)                      │
│           Prevents privilege escalation via execve          │
├─────────────────────────────────────────────────────────────┤
│  Layer 1: KVM isolation (inherent)                          │
│           Hardware virtualization boundary                  │
└─────────────────────────────────────────────────────────────┘
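
To make the Layer 5 policy concrete (an allowlist where every other syscall ends in KILL_PROCESS), here is a minimal sketch of how such a classic-BPF program can be laid out. It is not Volt's filter: this document does not show the real allowlist or program layout, and SockFilter, build_allowlist, and the four-syscall example list are hypothetical. The sketch only builds the program; installing it would also need PR_SET_NO_NEW_PRIVS and a seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, ...) call, both omitted here.

```rust
/// One classic-BPF instruction, laid out like the kernel's struct sock_filter.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SockFilter { code: u16, jt: u8, jf: u8, k: u32 }

const BPF_LD_W_ABS: u16 = 0x20; // BPF_LD | BPF_W | BPF_ABS
const BPF_JEQ_K: u16 = 0x15;    // BPF_JMP | BPF_JEQ | BPF_K
const BPF_RET_K: u16 = 0x06;    // BPF_RET | BPF_K
const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
const AUDIT_ARCH_X86_64: u32 = 0xc000_003e;
// Byte offsets into struct seccomp_data { int nr; __u32 arch; ... }.
const DATA_NR: u32 = 0;
const DATA_ARCH: u32 = 4;

fn stmt(code: u16, k: u32) -> SockFilter { SockFilter { code, jt: 0, jf: 0, k } }
fn jeq(k: u32, jt: u8, jf: u8) -> SockFilter { SockFilter { code: BPF_JEQ_K, jt, jf, k } }

/// Allowlist filter: a wrong arch or any unlisted syscall kills the whole process.
/// Jump offsets (jt/jf) count instructions to skip after the current one.
fn build_allowlist(arch: u32, allowed: &[u32]) -> Vec<SockFilter> {
    let mut prog = vec![
        stmt(BPF_LD_W_ABS, DATA_ARCH),
        jeq(arch, 1, 0), // arch matches: skip the kill below
        stmt(BPF_RET_K, SECCOMP_RET_KILL_PROCESS),
        stmt(BPF_LD_W_ABS, DATA_NR),
    ];
    for &nr in allowed {
        prog.push(jeq(nr, 0, 1)); // match: fall through to ALLOW; otherwise skip it
        prog.push(stmt(BPF_RET_K, SECCOMP_RET_ALLOW));
    }
    prog.push(stmt(BPF_RET_K, SECCOMP_RET_KILL_PROCESS)); // default action
    prog
}

fn main() {
    // Example subset of x86_64 numbers: read=0, write=1, ioctl=16, exit_group=231.
    let prog = build_allowlist(AUDIT_ARCH_X86_64, &[0, 1, 16, 231]);
    println!("{} instructions", prog.len());
}
```

With 72 allowed syscalls this linear layout comes to 149 instructions, well under the kernel's 4096-instruction limit. Because only exact syscall numbers are allowed, x32-ABI calls on x86_64 (which set bit 30 of the number) never match and fall through to the kill action.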

Files

File                                Purpose
vmm/src/security/mod.rs             Module root, apply_security() entrypoint, shared types
vmm/src/security/capabilities.rs    drop_capabilities(): prctl + capset
vmm/src/security/landlock.rs        apply_landlock(): Landlock ruleset builder
vmm/src/security/seccomp.rs         apply_seccomp_filter(): seccomp-bpf (pre-existing)

Part 1: Capability Dropping

Implementation (capabilities.rs)

The drop_capabilities() function performs four operations:

  1. prctl(PR_SET_NO_NEW_PRIVS, 1): prevents privilege escalation via execve. Both Landlock and seccomp require it unless the caller holds CAP_SYS_ADMIN.

  2. prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL): clears all ambient capabilities. Gracefully handles EINVAL on kernels without ambient capability support (before Linux 4.3).

  3. prctl(PR_CAPBSET_DROP, cap): iterates over all capability numbers (0–63) and drops each from the bounding set. Handles EPERM (expected when running as non-root, without CAP_SETPCAP) and EINVAL (capability number not defined by the running kernel) gracefully.

  4. capset() syscall: clears the permitted, effective, and inheritable capability sets using the v3 capability API (two 32-bit words per set). Handles EPERM for non-root processes.

Error Handling

  • Running as non-root: EPERM on PR_CAPBSET_DROP and capset is logged as debug/warning but not treated as fatal, since the process is already unprivileged.
  • All other errors are fatal.
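
The four steps and the error policy above can be sketched with nothing but std, by declaring prctl and syscall from the C library that std already links on Linux (the real module uses the libc crate instead). This is an illustrative sketch assuming Linux on x86_64 or aarch64, where the capset syscall number is picked per architecture. It is not the contents of capabilities.rs, and the helper names prctl4, tolerate, and no_new_privs_enabled are hypothetical.

```rust
// Sketch only: std-only capability dropping for Linux on x86_64/aarch64.
use std::io;
use std::os::raw::{c_int, c_long, c_ulong};

unsafe extern "C" {
    // Both symbols come from the C library that std already links on Linux.
    fn prctl(option: c_int, ...) -> c_int;
    fn syscall(num: c_long, ...) -> c_long;
}

const PR_CAPBSET_DROP: c_int = 24;
const PR_SET_NO_NEW_PRIVS: c_int = 38;
const PR_GET_NO_NEW_PRIVS: c_int = 39;
const PR_CAP_AMBIENT: c_int = 47;
const PR_CAP_AMBIENT_CLEAR_ALL: c_ulong = 4;
const LINUX_CAPABILITY_VERSION_3: u32 = 0x2008_0522;
const EPERM: i32 = 1;
const EINVAL: i32 = 22;

#[cfg(target_arch = "x86_64")]
const SYS_CAPSET: c_long = 126;
#[cfg(target_arch = "aarch64")]
const SYS_CAPSET: c_long = 91;

#[repr(C)]
struct CapHeader { version: u32, pid: c_int }

#[repr(C)]
#[derive(Clone, Copy, Default)]
struct CapData { effective: u32, permitted: u32, inheritable: u32 }

/// prctl() with four unsigned-long arguments; Err carries errno.
/// Only called below with options whose arguments are plain integers.
fn prctl4(option: c_int, a2: c_ulong, a3: c_ulong, a4: c_ulong, a5: c_ulong) -> io::Result<c_int> {
    let rc = unsafe { prctl(option, a2, a3, a4, a5) };
    if rc == -1 { Err(io::Error::last_os_error()) } else { Ok(rc) }
}

/// Turns errors whose errno is listed in `tolerated` into Ok(()).
fn tolerate(res: io::Result<c_int>, tolerated: &[i32]) -> io::Result<()> {
    match res {
        Err(e) if !e.raw_os_error().map_or(false, |n| tolerated.contains(&n)) => Err(e),
        _ => Ok(()),
    }
}

pub fn drop_capabilities() -> io::Result<()> {
    // 1. Block privilege gain through execve; any failure here is fatal.
    prctl4(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)?;
    // 2. Clear ambient caps; EINVAL means the kernel predates ambient caps.
    tolerate(prctl4(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL, 0, 0, 0), &[EINVAL])?;
    // 3. Drop 0..=63 from the bounding set. EINVAL: number not defined by this
    //    kernel; EPERM: no CAP_SETPCAP (e.g. running as non-root).
    for cap in 0..64 {
        tolerate(prctl4(PR_CAPBSET_DROP, cap, 0, 0, 0), &[EINVAL, EPERM])?;
    }
    // 4. Zero permitted/effective/inheritable with the v3 API (two u32 words per set).
    let mut header = CapHeader { version: LINUX_CAPABILITY_VERSION_3, pid: 0 };
    let data = [CapData::default(); 2];
    let rc = unsafe { syscall(SYS_CAPSET, &mut header as *mut CapHeader, data.as_ptr()) };
    let res = if rc == -1 { Err(io::Error::last_os_error()) } else { Ok(0) };
    tolerate(res, &[EPERM])
}

/// Reads back the calling thread's no_new_privs flag.
pub fn no_new_privs_enabled() -> bool {
    matches!(prctl4(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0), Ok(1))
}

fn main() {
    match drop_capabilities() {
        Ok(()) => println!("capabilities dropped, no_new_privs={}", no_new_privs_enabled()),
        Err(e) => eprintln!("fatal: {e}"),
    }
}
```

Capability sets and no_new_privs are per-thread attributes: capset() only changes the calling thread, so threads created earlier keep their capabilities, while threads created afterwards inherit the reduced sets.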

Part 2: Landlock Filesystem Sandboxing

Implementation (landlock.rs)

Uses the landlock crate (v0.4.4), which provides a safe Rust API over the Landlock syscalls with automatic ABI version negotiation.

Allowed Paths

Path                          Access            Purpose
Kernel image                  Read-only         Boot the VM
Initrd (if specified)         Read-only         Initial ramdisk
Disk images (--rootfs)        Read-write        VM storage
API socket directory          RW + MakeSock     Unix socket API
/dev/kvm                      RW + IoctlDev     KVM device
/dev/net/tun                  RW + IoctlDev     TAP networking
/dev/vhost-net                RW + IoctlDev     vhost-net (if present)
/proc/self                    Read-only         Process info, fd access
Extra --landlock-rule paths   User-specified    Hotplug, custom
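
The table can be modelled as plain data to show which entries are conditional. In this sketch the types Access, PathRule, and SandboxInputs and the function default_rules are hypothetical, not Volt's. The initrd rule is added only when an initrd is given, and /dev/vhost-net only when it exists; extra --landlock-rule entries would be appended afterwards and are not modelled. Converting each entry into a real Landlock rule (for example a PathBeneath with matching access rights in the landlock crate) is left out so the sketch needs only std.

```rust
use std::path::{Path, PathBuf};

/// Access classes from the table (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access { ReadOnly, ReadWrite, ReadWriteMakeSock, ReadWriteIoctlDev }

#[derive(Debug, PartialEq, Eq)]
struct PathRule { path: PathBuf, access: Access }

/// Hypothetical inputs mirroring the CLI options that feed the table.
struct SandboxInputs<'a> {
    kernel: &'a Path,
    initrd: Option<&'a Path>,
    disks: &'a [PathBuf],
    api_socket_dir: &'a Path,
}

fn rule(path: &Path, access: Access) -> PathRule {
    PathRule { path: path.to_path_buf(), access }
}

/// Builds the default allow-list in table order.
fn default_rules(inp: &SandboxInputs, vhost_net_present: bool) -> Vec<PathRule> {
    let mut rules = vec![rule(inp.kernel, Access::ReadOnly)];
    if let Some(initrd) = inp.initrd {
        rules.push(rule(initrd, Access::ReadOnly));
    }
    for disk in inp.disks {
        rules.push(rule(disk, Access::ReadWrite));
    }
    rules.push(rule(inp.api_socket_dir, Access::ReadWriteMakeSock));
    rules.push(rule(Path::new("/dev/kvm"), Access::ReadWriteIoctlDev));
    rules.push(rule(Path::new("/dev/net/tun"), Access::ReadWriteIoctlDev));
    if vhost_net_present {
        rules.push(rule(Path::new("/dev/vhost-net"), Access::ReadWriteIoctlDev));
    }
    rules.push(rule(Path::new("/proc/self"), Access::ReadOnly));
    rules
}

fn main() {
    // Example paths are made up for illustration.
    let disks = vec![PathBuf::from("rootfs.img")];
    let inputs = SandboxInputs {
        kernel: Path::new("vmlinux"),
        initrd: None,
        disks: &disks,
        api_socket_dir: Path::new("/tmp/volt"),
    };
    let vhost = Path::new("/dev/vhost-net").exists();
    for r in default_rules(&inputs, vhost) {
        println!("{:?} {:?}", r.path, r.access);
    }
}
```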

ABI Compatibility

  • Target ABI: V5 (kernel 6.10+, includes IoctlDev)
  • Minimum: V1 (kernel 5.13+)
  • Mode: Best-effort — the crate automatically strips unsupported features
  • Unavailable: Logs a warning and continues without filesystem sandboxing

On kernel 6.1 (the test system's kernel), Landlock reports the sandbox as "partially enforced": 6.1 provides only Landlock ABI V2, so access rights added in later ABIs (such as Truncate in V3 and IoctlDev in V5) are unavailable. The core filesystem restrictions are still active.
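
Because the enforcement level depends on which Landlock ABI the running kernel provides, a small sketch of checking it may help. mainline_landlock_abi encodes the mainline release in which each ABI version appeared; distribution kernels may backport Landlock, so the authoritative answer comes from landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION), which probe_landlock_abi calls through syscall number 444 (the same on x86_64 and aarch64). Both function names are hypothetical. The landlock crate already performs this negotiation internally; the sketch only makes the version mapping visible.

```rust
use std::io;
use std::os::raw::{c_long, c_ulong, c_void};

unsafe extern "C" {
    fn syscall(num: c_long, ...) -> c_long;
}

// landlock_create_ruleset has the same number on x86_64 and aarch64.
const SYS_LANDLOCK_CREATE_RULESET: c_long = 444;
const LANDLOCK_CREATE_RULESET_VERSION: c_ulong = 1;

/// Landlock ABI available in a given mainline kernel (None before 5.13).
fn mainline_landlock_abi(major: u32, minor: u32) -> Option<u32> {
    // (first mainline release, ABI version), newest first.
    const TABLE: [((u32, u32), u32); 6] = [
        ((6, 12), 6), // signal and abstract Unix socket scoping
        ((6, 10), 5), // IoctlDev
        ((6, 7), 4),  // TCP bind/connect rules
        ((6, 2), 3),  // Truncate
        ((5, 19), 2), // Refer (cross-directory rename/link)
        ((5, 13), 1), // initial filesystem access rights
    ];
    TABLE.iter().find(|e| (major, minor) >= e.0).map(|e| e.1)
}

/// Asks the running kernel; Err (e.g. ENOSYS, EOPNOTSUPP) if Landlock is unusable.
fn probe_landlock_abi() -> io::Result<c_long> {
    let rc = unsafe {
        syscall(
            SYS_LANDLOCK_CREATE_RULESET,
            std::ptr::null::<c_void>(), // attr must be NULL for the version query
            0 as c_ulong,               // size must be 0
            LANDLOCK_CREATE_RULESET_VERSION,
        )
    };
    if rc < 0 { Err(io::Error::last_os_error()) } else { Ok(rc) }
}

fn main() {
    println!("mainline 6.1 -> {:?}", mainline_landlock_abi(6, 1));
    match probe_landlock_abi() {
        Ok(v) => println!("running kernel reports Landlock ABI v{v}"),
        Err(e) => println!("Landlock unavailable: {e}"),
    }
}
```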

CLI Flags

# Disable Landlock entirely
volt-vmm --kernel vmlinux -m 256M --no-landlock

# Add extra paths for hotplug or shared data
volt-vmm --kernel vmlinux -m 256M \
  --landlock-rule /tmp/hotplug:rw \
  --landlock-rule /data/shared:ro

Rule format: path:access, where access is one of:

  • ro, r, read — read-only
  • rw, w, write, readwrite — full access
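
A minimal parser for the path:access format might look like the following. The name parse_landlock_rule, the RuleAccess enum, splitting on the last colon (so paths that contain ':' still parse), and rejecting unknown access words are assumptions; the document only specifies the accepted access spellings.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleAccess { ReadOnly, ReadWrite }

#[derive(Debug, PartialEq, Eq)]
struct LandlockRule { path: PathBuf, access: RuleAccess }

/// Parses one --landlock-rule value such as "/tmp/hotplug:rw".
fn parse_landlock_rule(spec: &str) -> Result<LandlockRule, String> {
    // Split on the last ':' so the access word is always the final field.
    let (path, access) = spec
        .rsplit_once(':')
        .ok_or_else(|| format!("expected path:access, got {spec:?}"))?;
    if path.is_empty() {
        return Err(format!("empty path in {spec:?}"));
    }
    let access = match access {
        "ro" | "r" | "read" => RuleAccess::ReadOnly,
        "rw" | "w" | "write" | "readwrite" => RuleAccess::ReadWrite,
        other => return Err(format!("unknown access {other:?} (use ro or rw)")),
    };
    Ok(LandlockRule { path: PathBuf::from(path), access })
}

fn main() {
    for spec in ["/tmp/hotplug:rw", "/data/shared:ro", "/data/shared"] {
        println!("{spec} => {:?}", parse_landlock_rule(spec));
    }
}
```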

Application Order

The security layers are applied in this order in main.rs:

1. All initialization complete (KVM, memory, kernel, devices, API socket)
2. Landlock applied (needs landlock syscalls, sets PR_SET_NO_NEW_PRIVS)
3. Capabilities dropped (needs prctl, capset)
4. Seccomp applied (locks down syscalls, uses TSYNC for all threads)
5. vCPU run loop starts

This ordering is critical: Landlock and capability syscalls must be available before seccomp restricts the syscall set.
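
The ordering constraint can be captured as a small plan that startup code walks through, stopping at the first fatal error. Step, security_plan, and apply_in_order are hypothetical names, and the two flags mirror --no-landlock and --no-seccomp. A Landlock step that finds Landlock unavailable would log a warning and return Ok, matching the best-effort behavior described earlier, so only genuine failures abort startup.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step { Landlock, DropCapabilities, Seccomp }

/// Order used before the vCPU run loop.
fn security_plan(no_landlock: bool, no_seccomp: bool) -> Vec<Step> {
    let mut plan = Vec::new();
    if !no_landlock {
        plan.push(Step::Landlock); // needs the landlock_* syscalls
    }
    plan.push(Step::DropCapabilities); // needs prctl + capset; always runs
    if !no_seccomp {
        plan.push(Step::Seccomp); // last: narrows the syscall set for all threads
    }
    plan
}

/// Runs each step in order and stops at the first error.
fn apply_in_order<F>(plan: &[Step], mut run: F) -> Result<(), String>
where
    F: FnMut(Step) -> Result<(), String>,
{
    for &step in plan {
        run(step).map_err(|e| format!("{step:?} failed: {e}"))?;
    }
    Ok(())
}

fn main() {
    let plan = security_plan(false, false);
    let result = apply_in_order(&plan, |step| {
        println!("applying {step:?}");
        Ok(())
    });
    println!("{result:?}");
}
```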

Testing

Test Results (kernel 6.1.0-42-amd64)

# Minimal kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel comparison/kernels/minimal-hello.elf -m 128M
  INFO Applying Landlock filesystem sandbox
  WARN Landlock sandbox partially enforced (kernel may not support all features)
  INFO Dropping Linux capabilities
  INFO All capabilities dropped successfully
  INFO Applying seccomp-bpf filter (72 syscalls allowed)
  INFO Seccomp filter active
  Hello from minimal kernel!
  OK

# Full Linux kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel kernels/vmlinux -m 256M
  INFO Applying Landlock filesystem sandbox
  WARN Landlock sandbox partially enforced
  INFO Dropping Linux capabilities
  INFO All capabilities dropped successfully
  INFO Applying seccomp-bpf filter (72 syscalls allowed)
  [kernel boot messages, VFS panic due to no rootfs — expected]

# --no-landlock flag works
$ volt-vmm --kernel ... -m 128M --no-landlock
  WARN Landlock disabled via --no-landlock
  INFO Dropping Linux capabilities
  INFO All capabilities dropped successfully

# --landlock-rule flag works
$ volt-vmm --kernel ... -m 128M --landlock-rule /tmp:rw
  DEBUG Landlock: user rule rw access to /tmp

Dependencies Added

# vmm/Cargo.toml
landlock = "0.4"   # Landlock LSM helpers (crates.io, MIT/Apache-2.0)

No other new dependencies — libc was already present for the prctl/capset calls.

Future Improvements

  1. Network restrictions: Landlock ABI V4 (kernel 6.7+) supports filtering TCP bind and connect by port. This could limit the VMM to specific TCP ports or deny TCP entirely; the API socket itself is a Unix socket, which these rules do not cover.

  2. IPC scoping: Landlock ABI V6 (kernel 6.12+) can scope signals and abstract Unix sockets.

  3. Root-mode bounding set: when running as root, the full bounding set can be dropped. Currently, EPERM is skipped gracefully.

  4. seccomp + Landlock integration test: verify that the seccomp allowlist includes all syscalls needed after Landlock is active (it does, since Landlock is applied first, but a regression test would catch future changes).