# Landlock & Capability Dropping Implementation

**Date:** 2026-03-08

**Status:** Implemented and tested

## Overview

Volt VMM now implements three security hardening layers. They are applied after all privileged setup is complete (KVM, TAP, sockets) but before the vCPU run loop starts:

1. **Landlock filesystem sandbox** (kernel 5.13+, optional, enabled by default)
2. **Linux capability dropping** (always)
3. **Seccomp-BPF syscall filtering** (on unless `--no-seccomp`; pre-existing)

## Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Seccomp-BPF (always unless --no-seccomp)           │
│          72 syscalls allowed, KILL_PROCESS on violation     │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Landlock (optional, kernel 5.13+)                  │
│          Filesystem path restrictions                       │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Capability dropping (always)                       │
│          All ambient, bounding, and effective caps dropped  │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: PR_SET_NO_NEW_PRIVS (always)                       │
│          Prevents privilege escalation via execve           │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: KVM isolation (inherent)                           │
│          Hardware virtualization boundary                   │
└─────────────────────────────────────────────────────────────┘
```

## Files

| File | Purpose |
|------|---------|
| `vmm/src/security/mod.rs` | Module root, `apply_security()` entrypoint, shared types |
| `vmm/src/security/capabilities.rs` | `drop_capabilities()`: prctl + capset |
| `vmm/src/security/landlock.rs` | `apply_landlock()`: Landlock ruleset builder |
| `vmm/src/security/seccomp.rs` | `apply_seccomp_filter()`: seccomp-bpf (pre-existing) |

## Part 1: Capability Dropping

### Implementation (`capabilities.rs`)

The `drop_capabilities()` function performs four operations:

1. **`prctl(PR_SET_NO_NEW_PRIVS, 1)`**: prevents privilege escalation via execve. An unprivileged process must set this before it can apply Landlock or install a seccomp filter.
2. **`prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL)`**: clears all ambient capabilities. EINVAL is tolerated on kernels without ambient capability support.
3. **`prctl(PR_CAPBSET_DROP, cap)`**: iterates over all capability numbers (0–63) and drops each from the bounding set. EPERM (expected when running as non-root) and EINVAL (the capability does not exist) are tolerated.
4. **`capset()` syscall**: clears the permitted, effective, and inheritable capability sets using the v3 capability API (two 32-bit words per set). EPERM is tolerated for non-root processes.

### Error Handling

- Running as non-root: EPERM on `PR_CAPBSET_DROP` and `capset` is logged at debug/warning level but not treated as fatal, since the process is already unprivileged.
- Any error other than the tolerated EPERM/EINVAL cases listed above is fatal.

## Part 2: Landlock Filesystem Sandboxing

### Implementation (`landlock.rs`)

Uses the `landlock` crate (v0.4.4), which provides a safe Rust API over the Landlock syscalls with automatic ABI version negotiation.

### Allowed Paths

| Path | Access | Purpose |
|------|--------|---------|
| Kernel image | Read-only | Boot the VM |
| Initrd (if specified) | Read-only | Initial ramdisk |
| Disk images (`--rootfs`) | Read-write | VM storage |
| API socket directory | RW + MakeSock | Unix socket API |
| `/dev/kvm` | RW + IoctlDev | KVM device |
| `/dev/net/tun` | RW + IoctlDev | TAP networking |
| `/dev/vhost-net` | RW + IoctlDev | vhost-net (if present) |
| `/proc/self` | Read-only | Process info, fd access |
| Extra `--landlock-rule` paths | User-specified | Hotplug, custom |

### ABI Compatibility

- **Target ABI:** V5 (kernel 6.10+, includes `IoctlDev`)
- **Minimum:** V1 (kernel 5.13+)
- **Mode:** Best-effort. The crate automatically strips access rights the running kernel does not support.
- **Unavailable:** Logs a warning and continues without filesystem sandboxing

On kernel 6.1 (our test system), the sandbox is reported as "partially enforced" because the kernel only provides Landlock ABI V2, so rights introduced later (such as `IoctlDev` from ABI V5) are unavailable. Core filesystem restrictions are still active.
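To make the "partially enforced" result easier to reason about, the sketch below maps a kernel release string to the Landlock ABI a mainline kernel of that version is expected to provide. It is illustrative only and not part of the VMM: `expected_landlock_abi` and `parse_kernel_release` are hypothetical helpers, the table reflects mainline release versions (distribution kernels may backport features), and the authoritative answer comes from querying the kernel at runtime, which the `landlock` crate does internally.

```rust
/// Landlock ABI versions paired with the first mainline kernel that ships them.
/// V2 added Refer, V3 Truncate, V4 TCP bind/connect, V5 IoctlDev, V6 IPC scoping.
const LANDLOCK_ABI_KERNELS: [(u32, (u32, u32)); 6] = [
    (1, (5, 13)),
    (2, (5, 19)),
    (3, (6, 2)),
    (4, (6, 7)),
    (5, (6, 10)),
    (6, (6, 12)),
];

/// Hypothetical helper: highest ABI expected on a mainline kernel of the
/// given (major, minor) version, or None for kernels older than 5.13.
fn expected_landlock_abi(kernel: (u32, u32)) -> Option<u32> {
    LANDLOCK_ABI_KERNELS
        .iter()
        .rev()
        // Tuples compare lexicographically: major first, then minor.
        .find(|(_, min)| kernel >= *min)
        .map(|(abi, _)| *abi)
}

/// Hypothetical helper: extract (major, minor) from a release string
/// such as "6.1.0-42-amd64".
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

fn main() {
    let kernel = parse_kernel_release("6.1.0-42-amd64").expect("unparseable release");
    match expected_landlock_abi(kernel) {
        // The document's target ABI is V5, so anything below it is partial.
        Some(abi) if abi >= 5 => println!("ABI V{abi}: target features available"),
        Some(abi) => println!("ABI V{abi}: partial enforcement, newer rights are stripped"),
        None => println!("Landlock unavailable: no filesystem sandboxing"),
    }
}
```

For the 6.1 test kernel this yields ABI V2, which is consistent with `IoctlDev` (ABI V5) being dropped from the device rules while the basic read/write path restrictions remain in force.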
### CLI Flags

```bash
# Disable Landlock entirely
volt-vmm --kernel vmlinux -m 256M --no-landlock

# Add extra paths for hotplug or shared data
volt-vmm --kernel vmlinux -m 256M \
    --landlock-rule /tmp/hotplug:rw \
    --landlock-rule /data/shared:ro
```

Rule format: `path:access`, where `access` is one of:

- `ro`, `r`, `read`: read-only
- `rw`, `w`, `write`, `readwrite`: full access

### Application Order

The security layers are applied in this order in `main.rs`:

```
1. All initialization complete (KVM, memory, kernel, devices, API socket)
2. Landlock applied (needs landlock syscalls, sets PR_SET_NO_NEW_PRIVS)
3. Capabilities dropped (needs prctl, capset)
4. Seccomp applied (locks down syscalls, uses TSYNC for all threads)
5. vCPU run loop starts
```

This ordering is critical: the Landlock and capability syscalls must still be available when those layers are applied, so seccomp, which restricts the syscall set, has to come last.

## Testing

### Test Results (kernel 6.1.0-42-amd64)

```
# Minimal kernel: boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel comparison/kernels/minimal-hello.elf -m 128M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced (kernel may not support all features)
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
INFO Seccomp filter active
Hello from minimal kernel!
OK

# Full Linux kernel: boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel kernels/vmlinux -m 256M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
[kernel boot messages; the VFS panic due to no rootfs is expected]

# --no-landlock flag works
$ volt-vmm --kernel ... -m 128M --no-landlock
WARN Landlock disabled via --no-landlock
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully

# --landlock-rule flag works
$ volt-vmm --kernel ... -m 128M --landlock-rule /tmp:rw
DEBUG Landlock: user rule rw access to /tmp
```

## Dependencies Added

```toml
# vmm/Cargo.toml
landlock = "0.4"  # Landlock LSM helpers (crates.io, MIT/Apache-2.0)
```

No other new dependencies were needed: `libc` was already present for the prctl/capset calls.

## Future Improvements

1. **Network restrictions**: Landlock ABI V4 (kernel 6.7+) supports TCP port filtering (bind and connect). This would only apply to TCP endpoints; the current API socket is a Unix socket, which these rules do not cover.
2. **IPC scoping**: Landlock ABI V6 (kernel 6.12+) can scope signals and abstract Unix sockets.
3. **Root-mode bounding set**: When running as root, the full bounding set can be dropped. As a non-root process, the drop currently fails with EPERM and is skipped gracefully.
4. **seccomp + Landlock integration test**: Verify that the seccomp allowlist includes all syscalls needed after Landlock is active. It does today, since Landlock is applied first, but a regression test would guard against future changes.
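
As a reference for the `--landlock-rule` format described under CLI Flags, the following sketch shows one way the `path:access` strings could be parsed. It is an assumption-laden illustration, not the code in `landlock.rs`: `RuleAccess`, `LandlockRule`, and `parse_landlock_rule` are hypothetical names, and splitting on the last `:` (so that paths containing a colon still parse) is a design choice made here rather than confirmed behavior.

```rust
use std::path::PathBuf;

/// Hypothetical access level for a user-supplied rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleAccess {
    ReadOnly,
    ReadWrite,
}

/// Hypothetical parsed form of one `--landlock-rule` argument.
#[derive(Debug, PartialEq, Eq)]
struct LandlockRule {
    path: PathBuf,
    access: RuleAccess,
}

/// Parse a `path:access` rule, accepting the aliases listed in the document.
fn parse_landlock_rule(spec: &str) -> Result<LandlockRule, String> {
    // Assumption: split on the LAST ':' so paths containing ':' are accepted.
    let (path, access) = spec
        .rsplit_once(':')
        .ok_or_else(|| format!("expected path:access, got '{spec}'"))?;
    if path.is_empty() {
        return Err(format!("empty path in rule '{spec}'"));
    }
    let access = match access {
        "ro" | "r" | "read" => RuleAccess::ReadOnly,
        "rw" | "w" | "write" | "readwrite" => RuleAccess::ReadWrite,
        other => return Err(format!("unknown access '{other}' in rule '{spec}'")),
    };
    Ok(LandlockRule {
        path: PathBuf::from(path),
        access,
    })
}

fn main() {
    // The first two specs come from the CLI example; the third is invalid.
    for spec in ["/tmp/hotplug:rw", "/data/shared:ro", "/data/shared:exec"] {
        match parse_landlock_rule(spec) {
            Ok(rule) => println!("{spec} -> {rule:?}"),
            Err(err) => println!("{spec} -> error: {err}"),
        }
    }
}
```

Rejecting unknown access strings outright, rather than defaulting to read-only, keeps a typo in a rule from silently changing what the sandbox permits.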