volt-vmm/docs/phase3-seccomp-fix.md
Karl Clinger 40ed108dd5 Volt VMM (Neutron Stardust): source-available under AGPSL v5.0
KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
2026-03-21 01:04:35 -05:00


Phase 3: Seccomp Allowlist Audit & Fix

Status: COMPLETE

Summary

The seccomp-bpf allowlist and Landlock configuration were audited for correctness. The VM already booted successfully with security features enabled, because the Phase 2 implementation included the necessary syscalls. Two additional syscalls (fallocate, ftruncate) were added for production robustness, bringing the allowlist from 76 to 78 entries.

Findings

Seccomp Filter

The Phase 2 seccomp allowlist (76 syscalls) already included all syscalls needed for virtio-blk I/O processing:

| Syscall | Purpose | Status at Phase 2 |
|---|---|---|
| pread64 | Positional read for block I/O | Already present |
| pwrite64 | Positional write for block I/O | Already present |
| lseek | File seeking for FileBackend | Already present |
| fdatasync | Data sync for flush operations | Already present |
| fstat | File metadata for disk size | Already present |
| fsync | Full sync for flush operations | Already present |
| readv/writev | Scatter-gather I/O | Already present |
| madvise | Memory advisory for guest mem | Already present |
| mremap | Memory remapping | Already present |
| eventfd2 | Event notification for virtio | Already present |
| timerfd_create | Timer fd creation | Already present |
| timerfd_settime | Timer configuration | Already present |
| ppoll | Polling for events | Already present |
| epoll_ctl | Epoll event management | Already present |
| epoll_wait | Epoll event waiting | Already present |
| epoll_create1 | Epoll instance creation | Already present |
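
To make the mapping between these syscalls and block I/O concrete, here is a minimal sketch of a file-backed disk using only the Rust standard library. `SketchBackend`, its methods, and the 512-byte sector size are hypothetical names and values chosen for illustration; this is not Volt's actual `FileBackend`. On Linux, `read_exact_at` and `write_all_at` are expected to issue pread64 and pwrite64, and `sync_data` is expected to issue fdatasync.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

// Assumed sector size for this sketch.
const SECTOR_SIZE: u64 = 512;

/// Hypothetical file-backed disk (illustration only, not Volt's FileBackend).
struct SketchBackend {
    file: File,
}

impl SketchBackend {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().read(true).write(true).open(path)?;
        Ok(Self { file })
    }

    /// Disk size in whole sectors, derived from file metadata.
    fn sectors(&self) -> io::Result<u64> {
        Ok(self.file.metadata()?.len() / SECTOR_SIZE)
    }

    /// Positional read: does not move the file cursor (pread64 on Linux).
    fn read_sector(&self, sector: u64, buf: &mut [u8]) -> io::Result<()> {
        self.file.read_exact_at(buf, sector * SECTOR_SIZE)
    }

    /// Positional write: does not move the file cursor (pwrite64 on Linux).
    fn write_sector(&self, sector: u64, buf: &[u8]) -> io::Result<()> {
        self.file.write_all_at(buf, sector * SECTOR_SIZE)
    }

    /// Flush file data to stable storage (fdatasync on Linux).
    fn flush(&self) -> io::Result<()> {
        self.file.sync_data()
    }
}

fn main() -> io::Result<()> {
    // Scratch image in the temp directory: four zeroed sectors.
    let path = std::env::temp_dir().join("volt-sketch-block.img");
    std::fs::write(&path, vec![0u8; 4 * SECTOR_SIZE as usize])?;

    let disk = SketchBackend::open(&path)?;
    let pattern = [0xABu8; SECTOR_SIZE as usize];
    disk.write_sector(2, &pattern)?;
    disk.flush()?;

    let mut readback = [0u8; SECTOR_SIZE as usize];
    disk.read_sector(2, &mut readback)?;
    assert_eq!(readback, pattern);
    println!("{} sectors, sector 2 round-trips", disk.sectors()?);

    std::fs::remove_file(&path)
}
```

Positional I/O avoids a separate lseek per request, which is why pread64/pwrite64 usually matter more to a block device backend than plain read/write.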

Syscalls Added in Phase 3

Two additional syscalls were added for production robustness:

| Syscall | Purpose | Why Added |
|---|---|---|
| fallocate | Pre-allocate disk space | Needed for CoW disk backends, qcow2 expansion, and Stellarium CAS storage |
| ftruncate | Resize files | Needed for disk resize operations and FileBackend::create() |
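
As an illustration of why ftruncate belongs in the allowlist, the sketch below creates and then resizes a disk image with `File::set_len`, which on Linux is expected to reach the kernel as ftruncate. The helper name `create_or_resize_image` is hypothetical and is not taken from Volt's code. The standard library has no fallocate wrapper, so pre-allocation would need something like the `libc` or `nix` crate and is left out of this sketch.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical helper: create the image if missing, then set its length.
/// Growing a file this way typically yields a sparse file (no blocks reserved).
fn create_or_resize_image(path: &Path, size_bytes: u64) -> io::Result<u64> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false) // keep existing contents; set_len decides the size
        .open(path)?;
    file.set_len(size_bytes)?; // ftruncate on Linux
    Ok(file.metadata()?.len())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("volt-sketch-resize.img");
    let created = create_or_resize_image(&path, 1 << 20)?; // 1 MiB
    let grown = create_or_resize_image(&path, 4 << 20)?; // 4 MiB
    println!("created {created} bytes, resized to {grown} bytes");
    std::fs::remove_file(&path)
}
```

Under a KILL_PROCESS default action, a resize like this would terminate the VMM if ftruncate were missing from the filter, even when ordinary reads and writes never hit the problem.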

Landlock Configuration

The Landlock filesystem sandbox was verified correct:

  • Kernel image: Read-only access
  • Rootfs disk: Read-write access (including Truncate flag)
  • Device nodes: /dev/kvm, /dev/net/tun, /dev/vhost-net with IoctlDev
  • /proc/self: Read-only access for fd management
  • Stellarium volumes: Read-write access when --volume is used
  • API socket directory: Socket creation + removal access

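Landlock reports "partially enforced" on kernel 6.1 because the code targets ABI V5 (kernel 6.10+) and falls back gracefully. Kernel 6.1 implements Landlock ABI v2, so access rights introduced in later ABIs, including Truncate (ABI v3, kernel 6.2) and IoctlDev (ABI v5, kernel 6.10), cannot be enforced there, while the remaining restrictions still apply. This is expected and correct.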
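
The fallback behaviour can be modelled in a few lines. The sketch below is a simplified model, not the Landlock API and not Volt's code: `Right`, `Status`, and `best_effort` are hypothetical names. It only captures the idea that each access right has a minimum ABI version and that rights newer than the running kernel's ABI are dropped instead of causing a failure.

```rust
/// Access rights named in this document, modelled for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Right {
    ReadFile,
    WriteFile,
    Truncate,
    IoctlDev,
}

impl Right {
    /// Landlock ABI version that introduced the right.
    fn min_abi(self) -> u32 {
        match self {
            Right::ReadFile | Right::WriteFile => 1,
            Right::Truncate => 3, // kernel 6.2
            Right::IoctlDev => 5, // kernel 6.10
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Status {
    FullyEnforced,
    PartiallyEnforced,
    NotEnforced,
}

/// Keep only the rights the kernel ABI supports and report how much survived.
/// `kernel_abi == 0` stands for "Landlock unavailable" in this model.
fn best_effort(requested: &[Right], kernel_abi: u32) -> (Vec<Right>, Status) {
    if kernel_abi == 0 {
        return (Vec::new(), Status::NotEnforced);
    }
    let kept: Vec<Right> = requested
        .iter()
        .copied()
        .filter(|r| r.min_abi() <= kernel_abi)
        .collect();
    let status = if kept.len() == requested.len() {
        Status::FullyEnforced
    } else if kept.is_empty() {
        Status::NotEnforced
    } else {
        Status::PartiallyEnforced
    };
    (kept, status)
}

fn main() {
    let requested = [
        Right::ReadFile,
        Right::WriteFile,
        Right::Truncate,
        Right::IoctlDev,
    ];
    // Kernel 6.1 exposes ABI v2; kernel 6.10 and later expose ABI v5 or newer.
    for abi in [2, 5] {
        let (kept, status) = best_effort(&requested, abi);
        println!("ABI v{abi}: {status:?}, enforcing {kept:?}");
    }
}
```

In this model, a kernel at ABI v2 keeps the read and write rights and reports partial enforcement, which matches the status line seen in the test runs on kernel 6.1.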

Syscall Trace Analysis

Using strace -f on the secured VMM, the following 17 unique syscalls were observed during steady-state operation (all in the allowlist):

close, epoll_ctl, epoll_wait, exit_group, fsync, futex, ioctl,
lseek, mprotect, munmap, read, recvfrom, rt_sigreturn,
sched_yield, sendto, sigaltstack, write

No SIGSYS signals were generated. No syscalls returned ENOSYS.
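
A check like this can be automated. The sketch below parses text in strace's line format, collects the unique syscall names, and reports any that are missing from an allowlist, along with whether a SIGSYS delivery line appears. The function names and the `SAMPLE` input are hypothetical: the sample lines are hand-written to exercise the parser and are not output captured from Volt.

```rust
use std::collections::BTreeSet;

/// Extract the syscall name from one strace line, if the line starts a call.
/// Handles "[pid N] name(...)" and "N name(...)" prefixes; skips resumed
/// calls ("<... name resumed>"), signal lines ("---") and exit lines ("+++").
fn syscall_name(line: &str) -> Option<&str> {
    let mut rest = line.trim_start();
    if rest.starts_with("[pid") {
        rest = rest.split_once(']')?.1.trim_start();
    } else if let Some((first, tail)) = rest.split_once(char::is_whitespace) {
        if !first.is_empty() && first.bytes().all(|b| b.is_ascii_digit()) {
            rest = tail.trim_start();
        }
    }
    if rest.starts_with("<...") || rest.starts_with("---") || rest.starts_with("+++") {
        return None;
    }
    let name = &rest[..rest.find('(')?];
    let valid = !name.is_empty()
        && name.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_');
    valid.then_some(name)
}

struct Audit {
    observed: BTreeSet<String>,
    not_allowed: BTreeSet<String>,
    sigsys_seen: bool,
}

fn audit_trace(trace: &str, allowlist: &[&str]) -> Audit {
    let allowed: BTreeSet<&str> = allowlist.iter().copied().collect();
    let mut audit = Audit {
        observed: BTreeSet::new(),
        not_allowed: BTreeSet::new(),
        sigsys_seen: false,
    };
    for line in trace.lines() {
        if line.contains("--- SIGSYS") {
            audit.sigsys_seen = true;
        }
        if let Some(name) = syscall_name(line) {
            if !allowed.contains(name) {
                audit.not_allowed.insert(name.to_string());
            }
            audit.observed.insert(name.to_string());
        }
    }
    audit
}

// Hand-written sample in strace's line format (not real Volt output).
const SAMPLE: &str = r#"[pid  1201] epoll_wait(7, [], 64, 100) = 0
[pid  1201] ioctl(12, KVM_RUN, 0) = 0
[pid  1202] futex(0x7f00, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  1202] <... futex resumed>) = 0
[pid  1201] write(1, "x", 1) = 1
+++ exited with 0 +++
"#;

fn main() {
    let allowlist = ["epoll_wait", "futex", "ioctl", "write"];
    let audit = audit_trace(SAMPLE, &allowlist);
    println!("observed: {:?}", audit.observed);
    println!("outside allowlist: {:?}", audit.not_allowed);
    println!("SIGSYS seen: {}", audit.sigsys_seen);
}
```

Counting a call at its "unfinished" line and skipping the matching "resumed" line keeps each syscall from being parsed twice when threads interleave under `strace -f`.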

Test Results

With Security (Seccomp + Landlock)

$ ./target/release/volt-vmm \
    --kernel comparison/firecracker/vmlinux.bin \
    --rootfs comparison/rootfs.ext4 \
    --memory 128M --cpus 1 --net-backend none

Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS
Landlock sandbox partially enforced
VM READY - BOOT TEST PASSED

Without Security (baseline)

$ ./target/release/volt-vmm \
    --kernel comparison/firecracker/vmlinux.bin \
    --rootfs comparison/rootfs.ext4 \
    --memory 128M --cpus 1 --net-backend none \
    --no-seccomp --no-landlock

VM READY - BOOT TEST PASSED

Both modes produce identical boot results. Tested 3 consecutive runs — all passed.

Final Allowlist (78 syscalls)

File I/O (14)

read, write, openat, close, fstat, lseek, pread64, pwrite64, readv, writev, fsync, fdatasync, fallocate★, ftruncate★

Memory (6)

mmap, mprotect, munmap, brk, madvise, mremap

KVM/Device (1)

ioctl

Threading (7)

clone, clone3, futex, set_robust_list, sched_yield, sched_getaffinity, rseq

Signals (4)

rt_sigaction, rt_sigprocmask, rt_sigreturn, sigaltstack

Networking (18)

accept4, bind, listen, socket, connect, recvfrom, sendto, recvmsg, sendmsg, shutdown, getsockname, getpeername, setsockopt, getsockopt, epoll_create1, epoll_ctl, epoll_wait, ppoll

Process (8)

exit, exit_group, getpid, gettid, prctl, arch_prctl, prlimit64, tgkill

Timers (3)

clock_gettime, nanosleep, clock_nanosleep

Misc (17)

getrandom, eventfd2, timerfd_create, timerfd_settime, pipe2, dup, dup2, fcntl, statx, newfstatat, access, readlinkat, getcwd, unlink, unlinkat, mkdir, mkdirat

★ = Added in Phase 3
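
Hand-maintained counts in headings can drift from the lists they describe, so it may help to derive them from data. The sketch below encodes the categories listed above as a table and computes per-category counts, the total, and any duplicate entries. The `ALLOWLIST` constant and helper functions are hypothetical and only mirror this document's lists; they are not Volt's filter definition.

```rust
use std::collections::BTreeSet;

// Categories and syscalls copied from the lists in this document.
const ALLOWLIST: &[(&str, &[&str])] = &[
    ("File I/O", &[
        "read", "write", "openat", "close", "fstat", "lseek", "pread64",
        "pwrite64", "readv", "writev", "fsync", "fdatasync", "fallocate",
        "ftruncate",
    ]),
    ("Memory", &["mmap", "mprotect", "munmap", "brk", "madvise", "mremap"]),
    ("KVM/Device", &["ioctl"]),
    ("Threading", &[
        "clone", "clone3", "futex", "set_robust_list", "sched_yield",
        "sched_getaffinity", "rseq",
    ]),
    ("Signals", &["rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "sigaltstack"]),
    ("Networking", &[
        "accept4", "bind", "listen", "socket", "connect", "recvfrom", "sendto",
        "recvmsg", "sendmsg", "shutdown", "getsockname", "getpeername",
        "setsockopt", "getsockopt", "epoll_create1", "epoll_ctl", "epoll_wait",
        "ppoll",
    ]),
    ("Process", &[
        "exit", "exit_group", "getpid", "gettid", "prctl", "arch_prctl",
        "prlimit64", "tgkill",
    ]),
    ("Timers", &["clock_gettime", "nanosleep", "clock_nanosleep"]),
    ("Misc", &[
        "getrandom", "eventfd2", "timerfd_create", "timerfd_settime", "pipe2",
        "dup", "dup2", "fcntl", "statx", "newfstatat", "access", "readlinkat",
        "getcwd", "unlink", "unlinkat", "mkdir", "mkdirat",
    ]),
];

/// Total number of entries across all categories.
fn total(groups: &[(&str, &[&str])]) -> usize {
    groups.iter().map(|(_, calls)| calls.len()).sum()
}

/// Names that appear more than once across all categories.
fn duplicates<'a>(groups: &[(&str, &[&'a str])]) -> Vec<&'a str> {
    let mut seen = BTreeSet::new();
    let mut dups = Vec::new();
    for (_, calls) in groups {
        for &name in calls.iter() {
            if !seen.insert(name) {
                dups.push(name);
            }
        }
    }
    dups
}

fn main() {
    for (category, calls) in ALLOWLIST {
        println!("{category} ({})", calls.len());
    }
    println!("total: {}", total(ALLOWLIST));
    println!("duplicates: {:?}", duplicates(ALLOWLIST));
}
```

A check of this kind could run as a unit test next to the filter definition, so that the count printed at startup and the documented total stay in agreement.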

Phase 2 Handoff Note

The Phase 2 handoff described the VM stalling with "Failed to enable 64-bit or 32-bit DMA" when security was enabled. This issue appears to have been resolved during Phase 2 development — the final committed code includes all necessary syscalls for virtio-blk I/O. The DMA warning message is a kernel-level log that appears in both secured and unsecured boots (it's a virtio-mmio driver message, not a Volt error) and does not prevent boot completion.