Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
Karl Clinger
2026-03-21 01:04:35 -05:00
commit 40ed108dd5
143 changed files with 50300 additions and 0 deletions

# Landlock & Capability Dropping Implementation
**Date:** 2026-03-08
**Status:** Implemented and tested
## Overview
Volt VMM now implements three security hardening layers applied after all
privileged setup is complete (KVM, TAP, sockets) but before the vCPU run loop:
1. **Landlock filesystem sandbox** (kernel 5.13+, optional, default-enabled)
2. **Linux capability dropping** (always)
3. **Seccomp-BPF syscall filtering** (always unless `--no-seccomp` is passed; pre-existing)
## Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Seccomp-BPF (always unless --no-seccomp) │
│ 72 syscalls allowed, KILL_PROCESS on violation │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Landlock (optional, kernel 5.13+) │
│ Filesystem path restrictions │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Capability dropping (always) │
│ All ambient, bounding, and effective caps dropped │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: PR_SET_NO_NEW_PRIVS (always) │
│ Prevents privilege escalation via execve │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: KVM isolation (inherent) │
│ Hardware virtualization boundary │
└─────────────────────────────────────────────────────────────┘
```
## Files
| File | Purpose |
|------|---------|
| `vmm/src/security/mod.rs` | Module root, `apply_security()` entrypoint, shared types |
| `vmm/src/security/capabilities.rs` | `drop_capabilities()` — prctl + capset |
| `vmm/src/security/landlock.rs` | `apply_landlock()` — Landlock ruleset builder |
| `vmm/src/security/seccomp.rs` | `apply_seccomp_filter()` — seccomp-bpf (pre-existing) |
## Part 1: Capability Dropping
### Implementation (`capabilities.rs`)
The `drop_capabilities()` function performs four operations:
1. **`prctl(PR_SET_NO_NEW_PRIVS, 1)`**: prevents privilege escalation via execve.
Required by both Landlock and seccomp when the process lacks `CAP_SYS_ADMIN`.
2. **`prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL)`**: clears all ambient
capabilities. EINVAL on kernels without ambient capability support is tolerated.
3. **`prctl(PR_CAPBSET_DROP, cap)`**: iterates over all capability numbers (0 to 63)
and drops each from the bounding set. EPERM (expected when running as non-root)
and EINVAL (the capability does not exist) are tolerated.
4. **`capset()` syscall**: clears the permitted, effective, and inheritable
capability sets using the v3 capability API (two 32-bit words per set). EPERM
is tolerated for non-root processes.
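The argument layout for step 4 can be sketched as follows. This is a minimal illustration, not the contents of `capabilities.rs`: the type and function names (`CapHeader`, `CapData`, `cleared_capset_args`) are hypothetical, and the sketch only builds the arguments without issuing the syscall. The version constant and struct layouts follow `<linux/capability.h>`.

```rust
/// Value of `_LINUX_CAPABILITY_VERSION_3` from <linux/capability.h>.
const LINUX_CAPABILITY_VERSION_3: u32 = 0x2008_0522;

/// Mirrors `struct __user_cap_header_struct` (hypothetical Rust name).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CapHeader {
    version: u32,
    pid: i32, // 0 means "the calling thread"
}

/// Mirrors `struct __user_cap_data_struct`; the v3 API takes an array of two.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct CapData {
    effective: u32,
    permitted: u32,
    inheritable: u32,
}

/// Builds the arguments for a `capset` call that clears every set.
/// Element 0 covers capabilities 0..=31, element 1 covers 32..=63.
fn cleared_capset_args() -> (CapHeader, [CapData; 2]) {
    let header = CapHeader {
        version: LINUX_CAPABILITY_VERSION_3,
        pid: 0,
    };
    (header, [CapData::default(); 2])
}
```

With the `libc` crate, the header and data pointers would typically be handed to `libc::syscall(libc::SYS_capset, ...)`; whether the VMM does it that way is an assumption here.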
### Error Handling
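- Running as non-root: EPERM on `PR_CAPBSET_DROP` and `capset` is logged at
debug or warning level and is not fatal, since the process is already unprivileged.
- EINVAL on `PR_CAP_AMBIENT` (no ambient capability support) and on
`PR_CAPBSET_DROP` (capability number not defined) is likewise tolerated.
- All other errors are fatal.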
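The policy above amounts to a small decision table. The sketch below is one way to express it, assuming hypothetical names (`CapOp`, `Outcome`, `classify`) that do not come from the source; the errno values are the standard Linux ones.

```rust
/// Linux errno values (identical across Linux architectures).
const EPERM: i32 = 1;
const EINVAL: i32 = 22;

/// The four operations performed while dropping capabilities.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CapOp {
    NoNewPrivs,
    AmbientClear,
    BoundingDrop,
    Capset,
}

/// What the caller should do with a failed operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// Log and continue.
    Ignore,
    /// Abort startup.
    Fatal,
}

/// Maps (operation, errno) to the handling described in this section.
fn classify(op: CapOp, errno: i32) -> Outcome {
    match (op, errno) {
        // Kernel without ambient capability support.
        (CapOp::AmbientClear, EINVAL) => Outcome::Ignore,
        // Non-root caller, or a capability number the kernel does not define.
        (CapOp::BoundingDrop, EPERM) | (CapOp::BoundingDrop, EINVAL) => Outcome::Ignore,
        // Non-root caller is already unprivileged.
        (CapOp::Capset, EPERM) => Outcome::Ignore,
        // Everything else is treated as fatal.
        _ => Outcome::Fatal,
    }
}
```

In real code the errno would come from something like `std::io::Error::last_os_error().raw_os_error()` right after the failing call.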
## Part 2: Landlock Filesystem Sandboxing
### Implementation (`landlock.rs`)
Uses the `landlock` crate (v0.4.4) which provides a safe Rust API over the
Landlock syscalls with automatic ABI version negotiation.
### Allowed Paths
| Path | Access | Purpose |
|------|--------|---------|
| Kernel image | Read-only | Boot the VM |
| Initrd (if specified) | Read-only | Initial ramdisk |
| Disk images (--rootfs) | Read-write | VM storage |
| API socket directory | RW + MakeSock | Unix socket API |
| `/dev/kvm` | RW + IoctlDev | KVM device |
| `/dev/net/tun` | RW + IoctlDev | TAP networking |
| `/dev/vhost-net` | RW + IoctlDev | vhost-net (if present) |
| `/proc/self` | Read-only | Process info, fd access |
| Extra `--landlock-rule` paths | User-specified | Hotplug, custom |
### ABI Compatibility
- **Target ABI:** V5 (kernel 6.10+, includes `IoctlDev`)
- **Minimum:** V1 (kernel 5.13+)
- **Mode:** Best-effort — the crate automatically strips unsupported features
- **Unavailable:** Logs a warning and continues without filesystem sandboxing
On kernel 6.1 (our test system), which supports Landlock only up to ABI V2, the
sandbox is reported as "partially enforced" because access rights introduced in
later ABIs (such as `IoctlDev` in V5) are unavailable. Core filesystem
restrictions are still active.
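To make the version arithmetic concrete, the sketch below maps a mainline kernel version to the highest Landlock ABI it ships. It is illustrative only: the function names are hypothetical, and real detection should query the kernel (the `landlock` crate does this) rather than parse version numbers, since distribution kernels may backport features.

```rust
/// First mainline kernel (major, minor) that ships each Landlock ABI version.
const ABI_MIN_KERNEL: [(u8, (u32, u32)); 6] = [
    (1, (5, 13)),
    (2, (5, 19)),
    (3, (6, 2)),
    (4, (6, 7)),
    (5, (6, 10)),
    (6, (6, 12)),
];

/// Highest ABI available on a mainline kernel, or `None` before 5.13.
fn max_abi_for_kernel(major: u32, minor: u32) -> Option<u8> {
    ABI_MIN_KERNEL
        .iter()
        // Tuples compare lexicographically: major first, then minor.
        .filter(|(_, min)| (major, minor) >= *min)
        .map(|(abi, _)| *abi)
        .max()
}

/// `IoctlDev` was introduced with ABI V5.
fn supports_ioctl_dev(abi: Option<u8>) -> bool {
    abi.map_or(false, |v| v >= 5)
}
```

For the 6.1 test system this yields ABI V2 and no `IoctlDev`, which matches the "partially enforced" warning in the test logs.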
### CLI Flags
```bash
# Disable Landlock entirely
volt-vmm --kernel vmlinux -m 256M --no-landlock
# Add extra paths for hotplug or shared data
volt-vmm --kernel vmlinux -m 256M \
--landlock-rule /tmp/hotplug:rw \
--landlock-rule /data/shared:ro
```
Rule format: `path:access` where access is:
- `ro`, `r`, `read` — read-only
- `rw`, `w`, `write`, `readwrite` — full access
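A parser for this rule format might look like the sketch below. The names (`Access`, `LandlockRule`, `parse_rule`) are hypothetical, and two behaviours are assumptions rather than documented facts: splitting on the last `:` so that paths containing colons still parse, and treating access keywords as case-sensitive.

```rust
use std::path::PathBuf;

/// Access level requested by a `--landlock-rule` argument.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    ReadOnly,
    ReadWrite,
}

/// A parsed `path:access` rule.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LandlockRule {
    path: PathBuf,
    access: Access,
}

/// Parses a `path:access` string into a rule.
fn parse_rule(spec: &str) -> Result<LandlockRule, String> {
    // Assumption: split on the last ':' so paths that contain ':' still parse.
    let (path, access) = spec
        .rsplit_once(':')
        .ok_or_else(|| format!("expected path:access, got {spec:?}"))?;
    if path.is_empty() {
        return Err(format!("empty path in rule {spec:?}"));
    }
    let access = match access {
        "ro" | "r" | "read" => Access::ReadOnly,
        "rw" | "w" | "write" | "readwrite" => Access::ReadWrite,
        other => return Err(format!("unknown access {other:?} in rule {spec:?}")),
    };
    Ok(LandlockRule {
        path: PathBuf::from(path),
        access,
    })
}
```

Under these assumptions, `parse_rule("/tmp/hotplug:rw")` yields a read-write rule for `/tmp/hotplug`, while `parse_rule("/tmp")` is rejected for lacking an access keyword.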
### Application Order
The security layers are applied in this order in `main.rs`:
```text
1. All initialization complete (KVM, memory, kernel, devices, API socket)
2. Landlock applied (needs landlock syscalls, sets PR_SET_NO_NEW_PRIVS)
3. Capabilities dropped (needs prctl, capset)
4. Seccomp applied (locks down syscalls, uses TSYNC for all threads)
5. vCPU run loop starts
```
This ordering is critical: Landlock and capability syscalls must be available
before seccomp restricts the syscall set.
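One way to keep this ordering from regressing is to compute it in a single place. The sketch below is an assumption about structure, not a copy of `apply_security()`: `Step`, `SecurityConfig`, `hardening_plan`, and `apply_plan` are hypothetical names, and the real layer functions are represented by a caller-supplied closure.

```rust
/// Hardening steps, in the order they must run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Landlock,
    Capabilities,
    Seccomp,
}

/// Which optional layers are enabled (mirrors --no-landlock / --no-seccomp).
struct SecurityConfig {
    landlock: bool,
    seccomp: bool,
}

/// Returns the enabled steps. Seccomp is always last, so the syscalls used
/// by the earlier steps never have to be allowlisted.
fn hardening_plan(cfg: &SecurityConfig) -> Vec<Step> {
    let mut plan = Vec::with_capacity(3);
    if cfg.landlock {
        plan.push(Step::Landlock);
    }
    plan.push(Step::Capabilities);
    if cfg.seccomp {
        plan.push(Step::Seccomp);
    }
    plan
}

/// Runs each step in order and stops at the first failure.
fn apply_plan<F>(plan: &[Step], mut apply: F) -> Result<(), String>
where
    F: FnMut(Step) -> Result<(), String>,
{
    for &step in plan {
        apply(step)?;
    }
    Ok(())
}
```

A unit test can then assert that `Step::Seccomp` is always the final element of the plan, independent of which flags are set.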
## Testing
### Test Results (kernel 6.1.0-42-amd64)
```text
# Minimal kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel comparison/kernels/minimal-hello.elf -m 128M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced (kernel may not support all features)
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
INFO Seccomp filter active
Hello from minimal kernel!
OK
# Full Linux kernel — boots successfully
$ timeout 10 ./target/release/volt-vmm --kernel kernels/vmlinux -m 256M
INFO Applying Landlock filesystem sandbox
WARN Landlock sandbox partially enforced
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
INFO Applying seccomp-bpf filter (72 syscalls allowed)
[kernel boot messages, VFS panic due to no rootfs — expected]
# --no-landlock flag works
$ volt-vmm --kernel ... -m 128M --no-landlock
WARN Landlock disabled via --no-landlock
INFO Dropping Linux capabilities
INFO All capabilities dropped successfully
# --landlock-rule flag works
$ volt-vmm --kernel ... -m 128M --landlock-rule /tmp:rw
DEBUG Landlock: user rule rw access to /tmp
```
## Dependencies Added
```toml
# vmm/Cargo.toml
landlock = "0.4" # Landlock LSM helpers (crates.io, MIT/Apache-2.0)
```
No other new dependencies — `libc` was already present for the prctl/capset calls.
## Future Improvements
1. **Network restrictions**: Landlock ABI V4 (kernel 6.7+) supports restricting
TCP `bind` and `connect` by port. This could limit any TCP endpoints the VMM
opens; the Unix-socket API is already covered by the filesystem rules.
2. **IPC scoping**: Landlock ABI V6 (kernel 6.12+) can scope signals and
abstract Unix sockets.
3. **Root-mode bounding set**: When running as root, the full bounding set
can be dropped. The current code gracefully skips the drop on EPERM.
4. **seccomp + Landlock integration test**: Verify that the seccomp allowlist
covers every syscall the VMM still needs once Landlock is active. Landlock is
applied before seccomp, so its own syscalls need not be allowlisted, but a
regression test would guard that ordering.