KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
Landlock & Capability Dropping Implementation
Date: 2026-03-08
Status: Implemented and tested
Overview
Volt VMM now implements three security hardening layers applied after all privileged setup is complete (KVM, TAP, sockets) but before the vCPU run loop:
- Landlock filesystem sandbox (kernel 5.13+, optional, default-enabled)
- Linux capability dropping (always)
- Seccomp-BPF syscall filtering (always unless --no-seccomp; pre-existing)
Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Seccomp-BPF (always unless --no-seccomp) │
│ 72 syscalls allowed, KILL_PROCESS on violation │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Landlock (optional, kernel 5.13+) │
│ Filesystem path restrictions │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Capability dropping (always) │
│ All ambient, bounding, and effective caps dropped │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: PR_SET_NO_NEW_PRIVS (always) │
│ Prevents privilege escalation via execve │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: KVM isolation (inherent) │
│ Hardware virtualization boundary │
└─────────────────────────────────────────────────────────────┘
```
Files
| File | Purpose |
|---|---|
| vmm/src/security/mod.rs | Module root, apply_security() entrypoint, shared types |
| vmm/src/security/capabilities.rs | drop_capabilities(): prctl + capset |
| vmm/src/security/landlock.rs | apply_landlock(): Landlock ruleset builder |
| vmm/src/security/seccomp.rs | apply_seccomp_filter(): seccomp-bpf (pre-existing) |
Part 1: Capability Dropping
Implementation (capabilities.rs)
The drop_capabilities() function performs four operations:
- prctl(PR_SET_NO_NEW_PRIVS, 1): prevents privilege escalation via execve. Required by both Landlock and seccomp.
- prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL): clears all ambient capabilities. EINVAL on kernels without ambient capability support is handled gracefully.
- prctl(PR_CAPBSET_DROP, cap): iterates over all capability numbers (0–63) and drops each from the bounding set. EPERM (expected when running as non-root) and EINVAL (capability number not defined by the running kernel) are handled gracefully.
- capset() syscall: clears the permitted, effective, and inheritable capability sets using the v3 capability API (two 32-bit data words). EPERM for non-root processes is handled gracefully.
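The capset() step can be sketched at the ABI level. This is a minimal sketch that relies on the v3 header and data layout from <linux/capability.h>. The Rust names (CapUserHeader, CapUserData, cleared_capset_args) are hypothetical. The real capabilities.rs presumably goes through libc rather than hand-written structs.

```rust
// Sketch: argument layout for capset(2) using the v3 capability API.
// Assumption: the Rust names here are hypothetical; only the field layout
// and the version constant come from <linux/capability.h>.

/// _LINUX_CAPABILITY_VERSION_3
const LINUX_CAPABILITY_VERSION_3: u32 = 0x2008_0522;

/// Mirrors struct __user_cap_header_struct.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct CapUserHeader {
    version: u32,
    pid: i32, // 0 = the calling thread
}

/// Mirrors struct __user_cap_data_struct. With v3, capset takes an array of
/// two: word 0 holds capabilities 0-31, word 1 holds capabilities 32-63.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct CapUserData {
    effective: u32,
    permitted: u32,
    inheritable: u32,
}

/// Build the arguments that, passed to capset(2), clear the permitted,
/// effective, and inheritable sets of the calling thread.
fn cleared_capset_args() -> (CapUserHeader, [CapUserData; 2]) {
    let header = CapUserHeader {
        version: LINUX_CAPABILITY_VERSION_3,
        pid: 0,
    };
    (header, [CapUserData::default(); 2])
}

fn main() {
    let (header, data) = cleared_capset_args();
    // A real implementation would now issue the raw syscall, e.g. via
    // libc::syscall(libc::SYS_capset, &header, data.as_ptr()), and treat
    // EPERM as non-fatal when running unprivileged.
    println!("version={:#x} pid={} words={:?}", header.version, header.pid, data);
}
```

All-zero data words mean "no capabilities" in every set. Only the header version tells the kernel to read two words instead of one.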
Error Handling
- Running as non-root: EPERM from PR_CAPBSET_DROP and capset is logged at debug/warning level but is not treated as fatal, since the process is already unprivileged.
- All other errors, apart from the EINVAL cases noted above, are fatal.
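The bounding-set loop and its error policy fit in a small, testable function. This is a sketch that injects the PR_CAPBSET_DROP call as a closure returning the errno on failure. That closure is a hypothetical testing seam, not necessarily how capabilities.rs is structured. The errno constants are the Linux values.

```rust
// Sketch of the tolerant PR_CAPBSET_DROP loop. Assumption: the syscall is
// injected as a closure (hypothetical seam); in the VMM it would wrap
// libc::prctl(libc::PR_CAPBSET_DROP, cap, 0, 0, 0) and read errno.

const EPERM: i32 = 1; // Linux errno values
const EINVAL: i32 = 22;

/// Summary of one pass over the bounding set.
#[derive(Debug, PartialEq, Eq)]
struct BoundingReport {
    dropped: u32,
    skipped_eperm: u32,
    skipped_einval: u32,
}

fn drop_bounding_set<F>(mut capbset_drop: F) -> Result<BoundingReport, i32>
where
    F: FnMut(u32) -> Result<(), i32>,
{
    let mut report = BoundingReport { dropped: 0, skipped_eperm: 0, skipped_einval: 0 };
    for cap in 0..64u32 {
        match capbset_drop(cap) {
            Ok(()) => report.dropped += 1,
            // No CAP_SETPCAP (non-root): expected, not fatal.
            Err(EPERM) => report.skipped_eperm += 1,
            // Capability number not defined by the running kernel.
            Err(EINVAL) => report.skipped_einval += 1,
            // Any other errno aborts the security setup.
            Err(e) => return Err(e),
        }
    }
    Ok(report)
}

fn main() {
    // Simulate a root process on a kernel that defines capabilities 0-40.
    let report = drop_bounding_set(|cap| if cap <= 40 { Ok(()) } else { Err(EINVAL) });
    println!("{report:?}");
}
```

Keeping the syscall behind a closure lets the EPERM/EINVAL tolerance be unit-tested without root.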
Part 2: Landlock Filesystem Sandboxing
Implementation (landlock.rs)
Uses the landlock crate (v0.4.4), which provides a safe Rust API over the
Landlock syscalls with automatic ABI version negotiation.
Allowed Paths
| Path | Access | Purpose |
|---|---|---|
| Kernel image | Read-only | Boot the VM |
| Initrd (if specified) | Read-only | Initial ramdisk |
| Disk images (--rootfs) | Read-write | VM storage |
| API socket directory | RW + MakeSock | Unix socket API |
| /dev/kvm | RW + IoctlDev | KVM device |
| /dev/net/tun | RW + IoctlDev | TAP networking |
| /dev/vhost-net | RW + IoctlDev | vhost-net (if present) |
| /proc/self | Read-only | Process info, fd access |
| Extra --landlock-rule paths | User-specified | Hotplug, custom |
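The allowed-path table can be read as a rule-building step over the launch configuration. This is a minimal sketch that uses a hypothetical LaunchConfig struct and Access enum. In landlock.rs these classes would map onto landlock::AccessFs flag sets, which this std-only sketch does not model. The exists callback stands in for a filesystem check, so only /dev/vhost-net is conditional, as the table specifies.

```rust
use std::path::{Path, PathBuf};

/// Access classes from the allowed-paths table (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    ReadOnly,
    ReadWrite,
    ReadWriteMakeSock,
    ReadWriteIoctlDev,
}

/// Hypothetical subset of the VMM launch configuration.
struct LaunchConfig {
    kernel: PathBuf,
    initrd: Option<PathBuf>,
    rootfs: Vec<PathBuf>,
    api_socket: Option<PathBuf>,
    extra_rules: Vec<(PathBuf, Access)>, // parsed --landlock-rule values
}

/// Collect (path, access) rules; `exists` decides whether the optional
/// /dev/vhost-net node is present on this host.
fn allowed_paths(cfg: &LaunchConfig, exists: impl Fn(&Path) -> bool) -> Vec<(PathBuf, Access)> {
    let mut rules = vec![(cfg.kernel.clone(), Access::ReadOnly)];
    if let Some(initrd) = &cfg.initrd {
        rules.push((initrd.clone(), Access::ReadOnly));
    }
    for disk in &cfg.rootfs {
        rules.push((disk.clone(), Access::ReadWrite));
    }
    // MakeSock is a directory right, so the rule targets the socket's parent.
    if let Some(dir) = cfg.api_socket.as_deref().and_then(Path::parent) {
        rules.push((dir.to_path_buf(), Access::ReadWriteMakeSock));
    }
    for dev in ["/dev/kvm", "/dev/net/tun"] {
        rules.push((PathBuf::from(dev), Access::ReadWriteIoctlDev));
    }
    let vhost = Path::new("/dev/vhost-net");
    if exists(vhost) {
        rules.push((vhost.to_path_buf(), Access::ReadWriteIoctlDev));
    }
    rules.push((PathBuf::from("/proc/self"), Access::ReadOnly));
    rules.extend(cfg.extra_rules.iter().cloned());
    rules
}

fn main() {
    let cfg = LaunchConfig {
        kernel: PathBuf::from("vmlinux"),
        initrd: None,
        rootfs: vec![PathBuf::from("rootfs.img")],
        api_socket: Some(PathBuf::from("/tmp/volt/api.sock")),
        extra_rules: vec![(PathBuf::from("/tmp/hotplug"), Access::ReadWrite)],
    };
    for (path, access) in allowed_paths(&cfg, |p| p.exists()) {
        println!("{} {:?}", path.display(), access);
    }
}
```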
ABI Compatibility
- Target ABI: V5 (kernel 6.10+, includes IoctlDev)
- Minimum: V1 (kernel 5.13+)
- Mode: best-effort; the crate automatically strips unsupported features
- If Landlock is unavailable: logs a warning and continues without filesystem sandboxing
On kernel 6.1 (our test system), the sandbox is reported as "partially enforced".
Kernel 6.1 provides only Landlock ABI V2, so the filesystem access rights introduced
in V3 and V5 (Truncate and IoctlDev) are unavailable. The core filesystem
restrictions are still active.
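Tabulating the ABI history shows why kernel 6.1 reports partial enforcement. This sketch maps upstream kernel versions to Landlock ABI versions. The landlock crate instead queries the running kernel, because distro backports or a disabled LSM can make the version number misleading. abi_for_kernel and missing_features are hypothetical helpers.

```rust
/// (ABI version, first upstream kernel (major, minor), headline feature)
const LANDLOCK_ABIS: [(u32, (u32, u32), &str); 6] = [
    (1, (5, 13), "filesystem access control"),
    (2, (5, 19), "Refer"),
    (3, (6, 2), "Truncate"),
    (4, (6, 7), "TCP bind/connect rules"),
    (5, (6, 10), "IoctlDev"),
    (6, (6, 12), "signal and abstract Unix socket scoping"),
];

/// Highest Landlock ABI an upstream kernel of this version provides.
fn abi_for_kernel(major: u32, minor: u32) -> Option<u32> {
    LANDLOCK_ABIS
        .iter()
        .rev()
        .find(|(_, since, _)| (major, minor) >= *since)
        .map(|(abi, _, _)| *abi)
}

/// Features between the running ABI and the target ABI that best-effort
/// mode would have to strip.
fn missing_features(target_abi: u32, major: u32, minor: u32) -> Vec<&'static str> {
    let have = abi_for_kernel(major, minor).unwrap_or(0);
    LANDLOCK_ABIS
        .iter()
        .filter(|(abi, _, _)| *abi > have && *abi <= target_abi)
        .map(|(_, _, feature)| *feature)
        .collect()
}

fn main() {
    // Target ABI V5 on the 6.1 test kernel.
    println!("ABI on 6.1: {:?}", abi_for_kernel(6, 1));
    println!("stripped: {:?}", missing_features(5, 6, 1));
}
```

The network rules in V4 are listed for completeness. A filesystem-only ruleset never requests them, so only Truncate and IoctlDev affect this sandbox.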
CLI Flags
```bash
# Disable Landlock entirely
volt-vmm --kernel vmlinux -m 256M --no-landlock
# Add extra paths for hotplug or shared data
volt-vmm --kernel vmlinux -m 256M \
  --landlock-rule /tmp/hotplug:rw \
  --landlock-rule /data/shared:ro
```
Rule format: path:access, where access is one of:
- ro, r, read: read-only
- rw, w, write, readwrite: full (read-write) access
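A parser for this rule format could look like the following sketch. It splits the value at the last ':' so that paths containing colons still parse; that split point is an assumption. parse_landlock_rule and RuleAccess are hypothetical names, and the real CLI parser may differ, for example in case handling.

```rust
use std::path::PathBuf;

/// Access levels accepted by --landlock-rule (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleAccess {
    ReadOnly,
    ReadWrite,
}

/// Parse "path:access". Splits at the last ':' (an assumption) so that a path
/// containing ':' is still accepted.
fn parse_landlock_rule(arg: &str) -> Result<(PathBuf, RuleAccess), String> {
    let (path, access) = arg
        .rsplit_once(':')
        .ok_or_else(|| format!("expected path:access, got {arg:?}"))?;
    if path.is_empty() {
        return Err(format!("empty path in {arg:?}"));
    }
    let access = match access {
        "ro" | "r" | "read" => RuleAccess::ReadOnly,
        "rw" | "w" | "write" | "readwrite" => RuleAccess::ReadWrite,
        other => return Err(format!("unknown access {other:?}; expected ro or rw")),
    };
    Ok((PathBuf::from(path), access))
}

fn main() {
    for arg in ["/tmp/hotplug:rw", "/data/shared:ro", "/tmp"] {
        println!("{arg} -> {:?}", parse_landlock_rule(arg));
    }
}
```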
Application Order
The security layers are applied in this order in main.rs:
1. All initialization complete (KVM, memory, kernel, devices, API socket)
2. Landlock applied (needs landlock syscalls, sets PR_SET_NO_NEW_PRIVS)
3. Capabilities dropped (needs prctl, capset)
4. Seccomp applied (locks down syscalls, uses TSYNC for all threads)
5. vCPU run loop starts
This ordering is critical: Landlock and capability syscalls must be available before seccomp restricts the syscall set.
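The ordering constraint can be encoded directly in an entrypoint like apply_security(). This sketch defines a hypothetical Hardening trait with one hook per layer. Recorder is a test double that records call order and reports partial Landlock enforcement, as seen on kernel 6.1. The sketch illustrates the ordering and which Landlock outcomes are non-fatal; it is not the real main.rs code.

```rust
/// Outcome of the Landlock step (hypothetical type).
#[allow(dead_code)]
#[derive(Debug, PartialEq, Eq)]
enum LandlockOutcome {
    FullyEnforced,
    PartiallyEnforced,
    Unavailable, // kernel lacks Landlock: warn and continue
    Disabled,    // --no-landlock
}

/// Hypothetical hooks, one per hardening layer.
trait Hardening {
    fn landlock(&mut self) -> Result<LandlockOutcome, String>;
    fn drop_capabilities(&mut self) -> Result<(), String>;
    fn seccomp(&mut self) -> Result<(), String>;
}

/// Apply the layers in the documented order: every syscall a step needs
/// must still be permitted when that step runs.
fn apply_security(
    h: &mut impl Hardening,
    landlock_enabled: bool,
    seccomp_enabled: bool,
) -> Result<LandlockOutcome, String> {
    // 1. Landlock: needs the landlock_* syscalls, so it must precede seccomp.
    let outcome = if landlock_enabled {
        h.landlock()?
    } else {
        LandlockOutcome::Disabled
    };
    // 2. Capabilities: needs prctl and capset, also before seccomp.
    h.drop_capabilities()?;
    // 3. Seccomp last: after this, only allowlisted syscalls succeed.
    if seccomp_enabled {
        h.seccomp()?;
    }
    Ok(outcome)
}

/// Test double that records call order.
struct Recorder(Vec<&'static str>);

impl Hardening for Recorder {
    fn landlock(&mut self) -> Result<LandlockOutcome, String> {
        self.0.push("landlock");
        Ok(LandlockOutcome::PartiallyEnforced) // as on kernel 6.1
    }
    fn drop_capabilities(&mut self) -> Result<(), String> {
        self.0.push("capabilities");
        Ok(())
    }
    fn seccomp(&mut self) -> Result<(), String> {
        self.0.push("seccomp");
        Ok(())
    }
}

fn main() {
    let mut rec = Recorder(Vec::new());
    let outcome = apply_security(&mut rec, true, true);
    println!("{outcome:?} via {:?}", rec.0);
}
```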
Testing
Test Results (kernel 6.1.0-42-amd64)
```
# Minimal kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel comparison/kernels/minimal-hello.elf -m 128M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced (kernel may not support all features)
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
INFO Seccomp filter active
Hello from minimal kernel!
OK
# Full Linux kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel kernels/vmlinux -m 256M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
[kernel boot messages, VFS panic due to no rootfs — expected]
# --no-landlock flag works
$ volt-vmm --kernel ... -m 128M --no-landlock
WARN Landlock disabled via --no-landlock
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
# --landlock-rule flag works
$ volt-vmm --kernel ... -m 128M --landlock-rule /tmp:rw
DEBUG Landlock: user rule rw access to /tmp
```
Dependencies Added
```toml
# vmm/Cargo.toml
landlock = "0.4" # Landlock LSM helpers (crates.io, MIT/Apache-2.0)
```
No other new dependencies — libc was already present for the prctl/capset calls.
Future Improvements
- Network restrictions: Landlock ABI V4 (kernel 6.7+) supports TCP bind/connect filtering by port. This could restrict the VMM's TCP activity to specific ports. The API socket itself is a Unix socket, so it is not covered by these rules.
- IPC scoping: Landlock ABI V6 (kernel 6.12+) can scope signals and abstract Unix sockets.
- Root-mode bounding set: when running as root, the full bounding set can be dropped. When running unprivileged, the drop fails with EPERM and is currently skipped gracefully.
- seccomp + Landlock integration test: verify that the seccomp allowlist includes every syscall needed after Landlock is active. It should, since Landlock is applied first, but a regression test would guard against future changes.