KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
Phase 3: Seccomp Allowlist Audit & Fix
Status: ✅ COMPLETE
Summary
The seccomp-bpf allowlist and Landlock configuration were audited for correctness.
The VM already booted successfully with security features enabled, because the Phase 2
implementation included the necessary syscalls. Two additional syscalls (`fallocate`,
`ftruncate`) were added for production robustness, bringing the allowlist from 76 to 78 entries.
Findings
Seccomp Filter
The Phase 2 seccomp allowlist (76 syscalls) already included all syscalls needed for virtio-blk I/O processing:
| Syscall | Purpose | Status at Phase 2 |
|---|---|---|
| `pread64` | Positional read for block I/O | ✅ Already present |
| `pwrite64` | Positional write for block I/O | ✅ Already present |
| `lseek` | File seeking for FileBackend | ✅ Already present |
| `fdatasync` | Data sync for flush operations | ✅ Already present |
| `fstat` | File metadata for disk size | ✅ Already present |
| `fsync` | Full sync for flush operations | ✅ Already present |
| `readv`/`writev` | Scatter-gather I/O | ✅ Already present |
| `madvise` | Memory advisory for guest mem | ✅ Already present |
| `mremap` | Memory remapping | ✅ Already present |
| `eventfd2` | Event notification for virtio | ✅ Already present |
| `timerfd_create` | Timer fd creation | ✅ Already present |
| `timerfd_settime` | Timer configuration | ✅ Already present |
| `ppoll` | Polling for events | ✅ Already present |
| `epoll_ctl` | Epoll event management | ✅ Already present |
| `epoll_wait` | Epoll event waiting | ✅ Already present |
| `epoll_create1` | Epoll instance creation | ✅ Already present |
Syscalls Added in Phase 3
Two additional syscalls were added for production robustness:
| Syscall | Purpose | Why Added |
|---|---|---|
| `fallocate` | Pre-allocate disk space | Needed for CoW disk backends, qcow2 expansion, and Stellarium CAS storage |
| `ftruncate` | Resize files | Needed for disk resize operations and `FileBackend::create()` |
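
As a rough illustration of what these two syscalls do for a file-backed disk, the sketch below creates an image file, sizes it, reserves its blocks, and then grows it. It uses Python's `os.ftruncate` and `os.posix_fallocate` wrappers rather than Volt's `FileBackend`. The helper names and sizes are hypothetical, and the mapping of `posix_fallocate` to the `fallocate` syscall is the usual Linux/glibc behaviour, not something taken from the Volt code.

```python
import os
import tempfile

def create_disk_image(path: str, size_bytes: int) -> None:
    """Create a file of size_bytes and reserve its blocks up front (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size_bytes)           # set the logical size (ftruncate)
        os.posix_fallocate(fd, 0, size_bytes)  # reserve blocks (fallocate on Linux/glibc)
    finally:
        os.close(fd)

def resize_disk_image(path: str, new_size_bytes: int) -> None:
    """Grow or shrink an existing image to new_size_bytes (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.ftruncate(fd, new_size_bytes)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "disk.img")
    create_disk_image(image, 1 << 20)          # 1 MiB image
    size_after_create = os.path.getsize(image)
    resize_disk_image(image, 2 << 20)          # grow to 2 MiB
    size_after_resize = os.path.getsize(image)
    print(size_after_create, size_after_resize)
```

With a `KILL_PROCESS` default action, either call would terminate the VMM if its syscall were missing from the filter, which is why rarely exercised paths such as resize need an allowlist entry before they are first used in production.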
Landlock Configuration
The Landlock filesystem sandbox was verified correct:
- Kernel image: Read-only access ✅
- Rootfs disk: Read-write access (including `Truncate` flag) ✅
- Device nodes: `/dev/kvm`, `/dev/net/tun`, `/dev/vhost-net` with `IoctlDev` ✅
- `/proc/self`: Read-only access for fd management ✅
- Stellarium volumes: Read-write access when `--volume` is used ✅
- API socket directory: Socket creation + removal access ✅
Landlock reports "partially enforced" on kernel 6.1 because the code targets ABI V5 (kernel 6.10+) and falls back gracefully. This is expected and correct.
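
The fallback can be reasoned about with a small table of Landlock ABI versions. The sketch below is an illustration only: it maps each ABI to the first mainline kernel that ships it and lists what a ruleset built for ABI V5 loses on an older kernel. The function names are hypothetical, and real code should query the kernel for its ABI version instead of parsing the kernel version, since distributions may backport Landlock features.

```python
# (ABI version, first mainline kernel, what that ABI added)
LANDLOCK_ABIS = [
    (1, (5, 13), "base filesystem access rights"),
    (2, (5, 19), "Refer"),
    (3, (6, 2), "Truncate"),
    (4, (6, 7), "BindTcp/ConnectTcp"),
    (5, (6, 10), "IoctlDev"),
]

def abi_for_kernel(kernel):
    """Highest Landlock ABI a mainline kernel (major, minor) supports; 0 if none."""
    supported = [abi for abi, first, _ in LANDLOCK_ABIS if kernel >= first]
    return max(supported, default=0)

def missing_features(target_abi, kernel):
    """Features that a ruleset built for target_abi cannot enforce on this kernel."""
    have = abi_for_kernel(kernel)
    return [feat for abi, _, feat in LANDLOCK_ABIS if have < abi <= target_abi]

def expected_status(target_abi, kernel):
    """Simplified model of the status a best-effort ruleset would report."""
    have = abi_for_kernel(kernel)
    if have == 0:
        return "not enforced"
    return "fully enforced" if have >= target_abi else "partially enforced"

print(abi_for_kernel((6, 1)))         # 2
print(missing_features(5, (6, 1)))    # ['Truncate', 'BindTcp/ConnectTcp', 'IoctlDev']
print(expected_status(5, (6, 1)))     # partially enforced
print(expected_status(5, (6, 10)))    # fully enforced
```

Under this model, kernel 6.1 still enforces the basic read and write restrictions, but the `Truncate` and `IoctlDev` rights are not enforced until the host runs a kernel that supports ABI V3 and V5 respectively.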
Syscall Trace Analysis
Using strace -f on the secured VMM, the following 17 unique syscalls were
observed during steady-state operation (all in the allowlist):
close, epoll_ctl, epoll_wait, exit_group, fsync, futex, ioctl,
lseek, mprotect, munmap, read, recvfrom, rt_sigreturn,
sched_yield, sendto, sigaltstack, write
No SIGSYS signals were generated. No syscalls returned ENOSYS.
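
A check like this can be repeated mechanically. The following sketch extracts the unique syscall names from `strace -f` output and compares them with an allowlist. The sample trace lines are invented for illustration, the function names are hypothetical, and the regular expression assumes the default `strace -f` line format without timestamps.

```python
import re

# Syscall name at the start of a line, after an optional "[pid N]" or bare "N" prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def unique_syscalls(trace_lines):
    """Return the set of syscall names that appear in the trace."""
    names = set()
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if match:  # skips "<... resumed>", signal, and exit lines
            names.add(match.group(1))
    return names

def audit(trace_lines, allowlist):
    """Compare observed syscalls with the allowlist and look for SIGSYS deliveries."""
    seen = unique_syscalls(trace_lines)
    return {
        "seen": sorted(seen),
        "not_allowed": sorted(seen - set(allowlist)),
        "sigsys": any("--- SIGSYS" in line for line in trace_lines),
    }

# Invented sample lines in strace -f format (not taken from a real Volt run).
SAMPLE_TRACE = [
    "[pid  4242] epoll_wait(5, <unfinished ...>",
    "[pid  4243] ioctl(12, KVM_RUN, 0) = 0",
    "[pid  4242] <... epoll_wait resumed>[{events=EPOLLIN}], 16, -1) = 1",
    "[pid  4242] read(7, \"\\1\\0\\0\\0\\0\\0\\0\\0\", 8) = 8",
    "[pid  4243] futex(0x7f00deadbeef, FUTEX_WAKE_PRIVATE, 1) = 1",
    "+++ exited with 0 +++",
]

report = audit(SAMPLE_TRACE, {"epoll_wait", "ioctl", "read", "futex"})
print(report)
```

An empty `not_allowed` list together with `sigsys` being `False` corresponds to the result reported above. A steady-state trace only covers the code paths that actually ran, so rarely used paths still need review by hand.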
Test Results
With Security (Seccomp + Landlock)
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none
Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS
Landlock sandbox partially enforced
VM READY - BOOT TEST PASSED
Without Security (baseline)
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none \
--no-seccomp --no-landlock
VM READY - BOOT TEST PASSED
Both modes produce identical boot results. Tested 3 consecutive runs — all passed.
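
For repeated runs, the pass criteria can be checked from the captured console output. This sketch assumes the log lines look like the excerpts shown in this section, and the function name and returned keys are hypothetical. It does not launch the VMM itself.

```python
import re

PASS_MARKER = "VM READY - BOOT TEST PASSED"
SECCOMP_RE = re.compile(r"Seccomp filter active: (\d+) syscalls allowed")

def check_boot_log(log: str) -> dict:
    """Summarise one boot log: pass marker, seccomp allowlist size, Landlock line."""
    match = SECCOMP_RE.search(log)
    return {
        "passed": PASS_MARKER in log,
        "seccomp_allowed": int(match.group(1)) if match else None,
        "landlock_reported": "Landlock sandbox" in log,
    }

# Log excerpts copied from the two runs shown in this section.
SECURED_LOG = (
    "Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS\n"
    "Landlock sandbox partially enforced\n"
    "VM READY - BOOT TEST PASSED\n"
)
BASELINE_LOG = "VM READY - BOOT TEST PASSED\n"

secured = check_boot_log(SECURED_LOG)
baseline = check_boot_log(BASELINE_LOG)
print(secured)
print(baseline)
```

Comparing the `passed` field across both modes, while also confirming that `seccomp_allowed` is set in the secured run, guards against a run that passes only because the filter was never installed.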
Final Allowlist (78 syscalls)
File I/O (14)
read, write, openat, close, fstat, lseek, pread64, pwrite64,
readv, writev, fsync, fdatasync, fallocate★, ftruncate★
Memory (6)
mmap, mprotect, munmap, brk, madvise, mremap
KVM/Device (1)
ioctl
Threading (7)
clone, clone3, futex, set_robust_list, sched_yield, sched_getaffinity, rseq
Signals (4)
rt_sigaction, rt_sigprocmask, rt_sigreturn, sigaltstack
Networking (18)
accept4, bind, listen, socket, connect, recvfrom, sendto,
recvmsg, sendmsg, shutdown, getsockname, getpeername, setsockopt,
getsockopt, epoll_create1, epoll_ctl, epoll_wait, ppoll
Process (8)
exit, exit_group, getpid, gettid, prctl, arch_prctl, prlimit64, tgkill
Timers (3)
clock_gettime, nanosleep, clock_nanosleep
Misc (17)
getrandom, eventfd2, timerfd_create, timerfd_settime, pipe2,
dup, dup2, fcntl, statx, newfstatat, access, readlinkat,
getcwd, unlink, unlinkat, mkdir, mkdirat
★ = Added in Phase 3
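
The listing can be tallied mechanically to check it against the stated total of 78 and to catch duplicate entries. The sketch below transcribes the category lists from this section into a dictionary. The variable names are arbitrary, and the data is copied from this document instead of being read from the Volt source.

```python
from collections import Counter

# Category lists transcribed from the allowlist above.
ALLOWLIST = {
    "File I/O": "read write openat close fstat lseek pread64 pwrite64 "
                "readv writev fsync fdatasync fallocate ftruncate",
    "Memory": "mmap mprotect munmap brk madvise mremap",
    "KVM/Device": "ioctl",
    "Threading": "clone clone3 futex set_robust_list sched_yield "
                 "sched_getaffinity rseq",
    "Signals": "rt_sigaction rt_sigprocmask rt_sigreturn sigaltstack",
    "Networking": "accept4 bind listen socket connect recvfrom sendto "
                  "recvmsg sendmsg shutdown getsockname getpeername setsockopt "
                  "getsockopt epoll_create1 epoll_ctl epoll_wait ppoll",
    "Process": "exit exit_group getpid gettid prctl arch_prctl prlimit64 tgkill",
    "Timers": "clock_gettime nanosleep clock_nanosleep",
    "Misc": "getrandom eventfd2 timerfd_create timerfd_settime pipe2 "
            "dup dup2 fcntl statx newfstatat access readlinkat "
            "getcwd unlink unlinkat mkdir mkdirat",
}
PHASE3_ADDITIONS = {"fallocate", "ftruncate"}

per_category = {name: len(calls.split()) for name, calls in ALLOWLIST.items()}
all_calls = [call for calls in ALLOWLIST.values() for call in calls.split()]
duplicates = sorted(c for c, n in Counter(all_calls).items() if n > 1)
total = len(set(all_calls))

for name, count in per_category.items():
    print(f"{name}: {count}")
print("total:", total)                                   # total: 78
print("duplicates:", duplicates)                         # duplicates: []
print("phase 2 total:", total - len(PHASE3_ADDITIONS))   # phase 2 total: 76
```

The total agrees with the `Seccomp filter active: 78 syscalls allowed` line printed at startup, and removing the two starred entries gives the Phase 2 figure of 76.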
Phase 2 Handoff Note
The Phase 2 handoff described the VM stalling with "Failed to enable 64-bit or 32-bit DMA" when security was enabled. This issue appears to have been resolved during Phase 2 development: the final committed code includes all necessary syscalls for virtio-blk I/O. The DMA warning is a kernel-level log message from the virtio-mmio driver, not a Volt error. It appears in both secured and unsecured boots and does not prevent boot completion.