# Phase 3: Seccomp Allowlist Audit & Fix

## Status: ✅ COMPLETE

## Summary

The seccomp-bpf allowlist and Landlock configuration were audited for correctness. **The VM already booted successfully with security features enabled**: the Phase 2 implementation included the necessary syscalls. Two additional syscalls (`fallocate`, `ftruncate`) were added for production robustness.

## Findings

### Seccomp Filter

The Phase 2 seccomp allowlist (76 syscalls) already included all syscalls needed for virtio-blk I/O processing:

| Syscall | Purpose | Status at Phase 2 |
|---------|---------|-------------------|
| `pread64` | Positional read for block I/O | ✅ Already present |
| `pwrite64` | Positional write for block I/O | ✅ Already present |
| `lseek` | File seeking for FileBackend | ✅ Already present |
| `fdatasync` | Data sync for flush operations | ✅ Already present |
| `fstat` | File metadata for disk size | ✅ Already present |
| `fsync` | Full sync for flush operations | ✅ Already present |
| `readv`/`writev` | Scatter-gather I/O | ✅ Already present |
| `madvise` | Memory advisory for guest mem | ✅ Already present |
| `mremap` | Memory remapping | ✅ Already present |
| `eventfd2` | Event notification for virtio | ✅ Already present |
| `timerfd_create` | Timer fd creation | ✅ Already present |
| `timerfd_settime` | Timer configuration | ✅ Already present |
| `ppoll` | Polling for events | ✅ Already present |
| `epoll_ctl` | Epoll event management | ✅ Already present |
| `epoll_wait` | Epoll event waiting | ✅ Already present |
| `epoll_create1` | Epoll instance creation | ✅ Already present |

### Syscalls Added in Phase 3

Two additional syscalls were added for production robustness:

| Syscall | Purpose | Why Added |
|---------|---------|-----------|
| `fallocate` | Pre-allocate disk space | Needed for CoW disk backends, qcow2 expansion, and Stellarium CAS storage |
| `ftruncate` | Resize files | Needed for disk resize operations and FileBackend::create() |

### Landlock Configuration

The Landlock filesystem sandbox was verified correct:

- **Kernel image**: Read-only access ✅
- **Rootfs disk**: Read-write access (including `Truncate` flag) ✅
- **Device nodes**: `/dev/kvm`, `/dev/net/tun`, `/dev/vhost-net` with `IoctlDev` ✅
- **`/proc/self`**: Read-only access for fd management ✅
- **Stellarium volumes**: Read-write access when `--volume` is used ✅
- **API socket directory**: Socket creation + removal access ✅

Landlock reports "partially enforced" on kernel 6.1 because the code targets ABI V5 (kernel 6.10+) and falls back gracefully. This is expected and correct.

### Syscall Trace Analysis

Using `strace -f` on the secured VMM, the following 17 unique syscalls were observed during steady-state operation (all in the allowlist):

```
close, epoll_ctl, epoll_wait, exit_group, fsync, futex, ioctl, lseek,
mprotect, munmap, read, recvfrom, rt_sigreturn, sched_yield, sendto,
sigaltstack, write
```

No `SIGSYS` signals were generated. No syscalls returned `ENOSYS`.

## Test Results

### With Security (Seccomp + Landlock)

```
$ ./target/release/volt-vmm \
    --kernel comparison/firecracker/vmlinux.bin \
    --rootfs comparison/rootfs.ext4 \
    --memory 128M --cpus 1 --net-backend none

Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS
Landlock sandbox partially enforced
VM READY - BOOT TEST PASSED
```

### Without Security (baseline)

```
$ ./target/release/volt-vmm \
    --kernel comparison/firecracker/vmlinux.bin \
    --rootfs comparison/rootfs.ext4 \
    --memory 128M --cpus 1 --net-backend none \
    --no-seccomp --no-landlock

VM READY - BOOT TEST PASSED
```

Both modes produce identical boot results. Tested 3 consecutive runs; all passed.

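The trace check described under Syscall Trace Analysis amounts to a set difference between the syscalls `strace` observed and the allowlist. The following is a minimal sketch in Python, assuming the default `strace -f` line layout (an optional `[pid N]` or bare pid prefix, then `name(`) with no timestamp options. The helper names `observed_syscalls` and `not_allowed` and the sample lines are hypothetical and are not part of Volt's tooling.

```python
import re

# Syscall name at the start of a strace line, after an optional
# "[pid 1234] " or bare "1234 " prefix (as produced by -f or -f -o FILE).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def observed_syscalls(lines):
    """Return the set of syscall names that appear in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        # Signal lines ("--- SIG... ---"), exit lines ("+++ ... +++") and
        # "<... name resumed>" continuations do not match and are skipped.
        if match:
            names.add(match.group(1))
    return names


def not_allowed(observed, allowlist):
    """Return observed syscalls that are missing from the allowlist, sorted."""
    return sorted(set(observed) - set(allowlist))


# Hypothetical trace excerpt, for illustration only.
sample = [
    "[pid  4242] epoll_wait(5, [], 64, 100) = 0",
    "[pid  4242] ioctl(12, KVM_RUN, 0) = 0",
    "4243 futex(0x7f00deadbeef, FUTEX_WAKE_PRIVATE, 1) = 1",
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    'write(1, "ok\\n", 3) = 3',
]
allow = {"epoll_wait", "ioctl", "futex", "write"}

seen = observed_syscalls(sample)
print(sorted(seen))
print(not_allowed(seen, allow))  # an empty list means every observed call is allowed
```

A non-empty result from `not_allowed` would name the syscalls to investigate before they trigger `KILL_PROCESS` in a secured run.
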
## Final Allowlist (78 syscalls)

### File I/O (14)

`read`, `write`, `openat`, `close`, `fstat`, `lseek`, `pread64`, `pwrite64`, `readv`, `writev`, `fsync`, `fdatasync`, `fallocate`★, `ftruncate`★

### Memory (6)

`mmap`, `mprotect`, `munmap`, `brk`, `madvise`, `mremap`

### KVM/Device (1)

`ioctl`

### Threading (7)

`clone`, `clone3`, `futex`, `set_robust_list`, `sched_yield`, `sched_getaffinity`, `rseq`

### Signals (4)

`rt_sigaction`, `rt_sigprocmask`, `rt_sigreturn`, `sigaltstack`

### Networking (18)

`accept4`, `bind`, `listen`, `socket`, `connect`, `recvfrom`, `sendto`, `recvmsg`, `sendmsg`, `shutdown`, `getsockname`, `getpeername`, `setsockopt`, `getsockopt`, `epoll_create1`, `epoll_ctl`, `epoll_wait`, `ppoll`

### Process (8)

`exit`, `exit_group`, `getpid`, `gettid`, `prctl`, `arch_prctl`, `prlimit64`, `tgkill`

### Timers (3)

`clock_gettime`, `nanosleep`, `clock_nanosleep`

### Misc (17)

`getrandom`, `eventfd2`, `timerfd_create`, `timerfd_settime`, `pipe2`, `dup`, `dup2`, `fcntl`, `statx`, `newfstatat`, `access`, `readlinkat`, `getcwd`, `unlink`, `unlinkat`, `mkdir`, `mkdirat`

★ = Added in Phase 3

## Phase 2 Handoff Note

The Phase 2 handoff described the VM stalling with "Failed to enable 64-bit or 32-bit DMA" when security was enabled. This issue appears to have been resolved during Phase 2 development: the final committed code includes all necessary syscalls for virtio-blk I/O. The DMA warning is a kernel-level log message that appears in both secured and unsecured boots (it's a virtio-mmio driver message, not a Volt error) and does not prevent boot completion.
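
As a cross-check on the Final Allowlist section, the per-category lists can be tallied mechanically to confirm that they total 78 and that no syscall is listed twice. This is a minimal sketch in Python. The `ALLOWLIST` dictionary is transcribed by hand from the lists in this document rather than read from Volt's source, and the `tally` helper is hypothetical.

```python
from collections import Counter

# Transcribed from the per-category lists in this document (an assumption:
# the committed filter may be organised differently in the source).
ALLOWLIST = {
    "File I/O": "read write openat close fstat lseek pread64 pwrite64 "
                "readv writev fsync fdatasync fallocate ftruncate",
    "Memory": "mmap mprotect munmap brk madvise mremap",
    "KVM/Device": "ioctl",
    "Threading": "clone clone3 futex set_robust_list sched_yield "
                 "sched_getaffinity rseq",
    "Signals": "rt_sigaction rt_sigprocmask rt_sigreturn sigaltstack",
    "Networking": "accept4 bind listen socket connect recvfrom sendto "
                  "recvmsg sendmsg shutdown getsockname getpeername "
                  "setsockopt getsockopt epoll_create1 epoll_ctl "
                  "epoll_wait ppoll",
    "Process": "exit exit_group getpid gettid prctl arch_prctl prlimit64 tgkill",
    "Timers": "clock_gettime nanosleep clock_nanosleep",
    "Misc": "getrandom eventfd2 timerfd_create timerfd_settime pipe2 dup "
            "dup2 fcntl statx newfstatat access readlinkat getcwd unlink "
            "unlinkat mkdir mkdirat",
}


def tally(allowlist):
    """Return (per-category counts, total, names listed more than once)."""
    per_category = {cat: len(names.split()) for cat, names in allowlist.items()}
    seen = Counter(name for names in allowlist.values() for name in names.split())
    duplicates = sorted(name for name, n in seen.items() if n > 1)
    return per_category, sum(per_category.values()), duplicates


counts, total, duplicates = tally(ALLOWLIST)
for category, n in counts.items():
    print(f"{category}: {n}")
print("total:", total, "duplicates:", duplicates)
```

Run against the lists above, the tally is 78 with no duplicates, which matches the count in the seccomp startup log line.
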