KVM-based microVMM for the Volt platform: - Sub-second VM boot times - Minimal memory footprint - Landlock LSM + seccomp security - Virtio device support - Custom kernel management Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
# Phase 3: Seccomp Allowlist Audit & Fix

## Status: ✅ COMPLETE

## Summary

The seccomp-bpf allowlist and Landlock configuration were audited for correctness.
**The VM already booted successfully with security features enabled**: the Phase 2
implementation included the necessary syscalls. Two additional syscalls (`fallocate`,
`ftruncate`) were added so that disk pre-allocation and resize paths also work under the filter.

## Findings

### Seccomp Filter

The Phase 2 seccomp allowlist (76 syscalls) already included all syscalls needed
for virtio-blk I/O processing:

| Syscall | Purpose | Status at Phase 2 |
|---------|---------|-------------------|
| `pread64` | Positional read for block I/O | ✅ Already present |
| `pwrite64` | Positional write for block I/O | ✅ Already present |
| `lseek` | File seeking for FileBackend | ✅ Already present |
| `fdatasync` | Data sync for flush operations | ✅ Already present |
| `fstat` | File metadata for disk size | ✅ Already present |
| `fsync` | Full sync for flush operations | ✅ Already present |
| `readv`/`writev` | Scatter-gather I/O | ✅ Already present |
| `madvise` | Memory advisory for guest mem | ✅ Already present |
| `mremap` | Memory remapping | ✅ Already present |
| `eventfd2` | Event notification for virtio | ✅ Already present |
| `timerfd_create` | Timer fd creation | ✅ Already present |
| `timerfd_settime` | Timer configuration | ✅ Already present |
| `ppoll` | Polling for events | ✅ Already present |
| `epoll_ctl` | Epoll event management | ✅ Already present |
| `epoll_wait` | Epoll event waiting | ✅ Already present |
| `epoll_create1` | Epoll instance creation | ✅ Already present |
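
Each row above comes down to one question: is the syscall on the allowlist? Anything not listed falls through to the default action, which the secured boot log reports as `KILL_PROCESS`. The sketch below models that default-deny decision in plain Rust. It is illustrative only and assumes a name-based lookup. A real seccomp-bpf filter compares syscall numbers for one architecture inside the kernel. The names `Action`, `decide` and `PHASE2_IO_SYSCALLS` are hypothetical and do not come from Volt's code.

```rust
use std::collections::HashSet;

/// Hypothetical model of the two outcomes: allow the call, or apply the
/// default action reported in the boot log (`KILL_PROCESS`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    KillProcess,
}

/// Syscalls from the Phase 2 table (`readv`/`writev` listed separately).
pub const PHASE2_IO_SYSCALLS: &[&str] = &[
    "pread64", "pwrite64", "lseek", "fdatasync", "fstat", "fsync",
    "readv", "writev", "madvise", "mremap", "eventfd2",
    "timerfd_create", "timerfd_settime", "ppoll",
    "epoll_ctl", "epoll_wait", "epoll_create1",
];

/// Default-deny: a syscall is allowed only if it is explicitly listed.
pub fn decide(allowlist: &HashSet<&str>, syscall: &str) -> Action {
    if allowlist.contains(syscall) {
        Action::Allow
    } else {
        Action::KillProcess
    }
}
```

Because the default action kills the process, a missing entry fails closed: the VMM dies the first time it makes the unlisted syscall.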

### Syscalls Added in Phase 3

Two additional syscalls were added for production robustness:

| Syscall | Purpose | Why Added |
|---------|---------|-----------|
| `fallocate` | Pre-allocate disk space | Needed for CoW disk backends, qcow2 expansion, and Stellarium CAS storage |
| `ftruncate` | Resize files | Needed for disk resize operations and `FileBackend::create()` |
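
On 64-bit Linux, Rust's `std::fs::File::set_len` is implemented with `ftruncate`. Any code path that sizes a disk image this way therefore needs `ftruncate` on the allowlist once the filter is installed. The sketch below shows one such path. `create_disk_image` is a hypothetical helper, not Volt's `FileBackend::create()`. The standard library has no `fallocate` wrapper, so that call appears only in a comment. It is usually reached through the `libc` crate.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical helper: create (or truncate) a disk image and give it a
/// logical size of `size_bytes`, returning the size reported by the file.
pub fn create_disk_image(path: &Path, size_bytes: u64) -> io::Result<u64> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    // Issues ftruncate(fd, size_bytes); on most filesystems the result is
    // sparse, so no data blocks are reserved yet.
    file.set_len(size_bytes)?;
    // Reserving blocks up front (e.g. for CoW or CAS-backed storage) would
    // need fallocate(2), which std does not wrap; `libc::fallocate` does.
    Ok(file.metadata()?.len())
}
```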

### Landlock Configuration

The Landlock filesystem sandbox was verified correct:

- **Kernel image**: Read-only access ✅
- **Rootfs disk**: Read-write access (including `Truncate` flag) ✅
- **Device nodes**: `/dev/kvm`, `/dev/net/tun`, `/dev/vhost-net` with `IoctlDev` ✅
- **`/proc/self`**: Read-only access for fd management ✅
- **Stellarium volumes**: Read-write access when `--volume` is used ✅
- **API socket directory**: Socket creation + removal access ✅

Landlock reports "partially enforced" on kernel 6.1 because the code targets
ABI V5 (kernel 6.10+) and falls back gracefully. An upstream 6.1 kernel provides
only ABI V2, so the `Truncate` (ABI V3) and `IoctlDev` (ABI V5) rights cannot be
restricted there, while the rest of the ruleset is still enforced. This is expected and correct.
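
The "partially enforced" result follows from which ABI introduced each requested right. The sketch below is a simplified, hypothetical model of best-effort Landlock enforcement. It is not the `landlock` crate's API, and `Right`, `Status`, `landlock_abi` and `enforcement` are illustrative names. It maps an upstream kernel version to its Landlock ABI and reports whether all, some, or none of the requested rights can be restricted. Real code should ask the running kernel with `landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION)` instead of relying on the version number, because distribution kernels may backport or disable Landlock.

```rust
/// Hypothetical stand-ins for the Landlock filesystem rights used above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Right {
    ReadFile,
    WriteFile,
    MakeSock,
    RemoveFile,
    Truncate,
    IoctlDev,
}

impl Right {
    /// First Landlock ABI version that can restrict this right.
    pub fn min_abi(self) -> u32 {
        match self {
            Right::ReadFile | Right::WriteFile | Right::MakeSock | Right::RemoveFile => 1,
            Right::Truncate => 3,
            Right::IoctlDev => 5,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    FullyEnforced,
    PartiallyEnforced,
    NotEnforced,
}

/// Landlock ABI of an upstream kernel `(major, minor)`; 0 means no Landlock.
pub fn landlock_abi(kernel: (u32, u32)) -> u32 {
    match kernel {
        k if k >= (6, 12) => 6,
        k if k >= (6, 10) => 5,
        k if k >= (6, 7) => 4,
        k if k >= (6, 2) => 3,
        k if k >= (5, 19) => 2,
        k if k >= (5, 13) => 1,
        _ => 0,
    }
}

/// Best-effort outcome: rights newer than `abi` are dropped, not rejected.
pub fn enforcement(requested: &[Right], abi: u32) -> Status {
    let enforced = requested.iter().filter(|r| r.min_abi() <= abi).count();
    if enforced == 0 {
        Status::NotEnforced
    } else if enforced == requested.len() {
        Status::FullyEnforced
    } else {
        Status::PartiallyEnforced
    }
}
```

For the rights listed above, kernel 6.1 maps to ABI 2. The model drops `Truncate` and `IoctlDev`, keeps the ABI V1 rights, and returns `PartiallyEnforced`.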

### Syscall Trace Analysis

Using `strace -f` on the secured VMM, the following 17 unique syscalls were
observed during steady-state operation (all in the allowlist):

```
close, epoll_ctl, epoll_wait, exit_group, fsync, futex, ioctl,
lseek, mprotect, munmap, read, recvfrom, rt_sigreturn,
sched_yield, sendto, sigaltstack, write
```

No `SIGSYS` signals were generated. No syscalls returned `ENOSYS`.
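
This trace check can be scripted in three steps: collect the syscall names from `strace -f` output, de-duplicate them, and report any that are not on the allowlist. The sketch below assumes strace's default line shape, an optional `[pid N]` prefix followed by `name(args) = result`. It skips lines it does not recognise, such as signal, exit and `<... resumed>` notices. `unique_syscalls` and `not_allowed` are hypothetical helper names.

```rust
use std::collections::{BTreeSet, HashSet};

/// Syscall name from one `strace -f` line, or None for lines such as
/// `--- SIGCHLD ... ---`, `+++ exited with 0 +++` or `<... read resumed>`.
fn syscall_name(line: &str) -> Option<&str> {
    let mut rest = line.trim_start();
    // Lines from other tasks carry a `[pid N]` prefix.
    if rest.starts_with("[pid") {
        let end = rest.find(']')?;
        rest = rest[end + 1..].trim_start();
    }
    let name = &rest[..rest.find('(')?];
    let is_ident = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if is_ident { Some(name) } else { None }
}

/// Unique syscall names seen in the trace, sorted for stable output.
pub fn unique_syscalls(trace: &str) -> BTreeSet<&str> {
    trace.lines().filter_map(syscall_name).collect()
}

/// Syscalls observed in the trace that the allowlist would reject.
pub fn not_allowed<'a>(trace: &'a str, allowlist: &HashSet<&str>) -> Vec<&'a str> {
    unique_syscalls(trace)
        .into_iter()
        .filter(|name| !allowlist.contains(*name))
        .collect()
}
```

Run against the steady-state trace and the full allowlist, `not_allowed` should return an empty list, matching the result recorded above.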

## Test Results

### With Security (Seccomp + Landlock)

```
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none

Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS
Landlock sandbox partially enforced
VM READY - BOOT TEST PASSED
```

### Without Security (baseline)

```
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none \
--no-seccomp --no-landlock

VM READY - BOOT TEST PASSED
```

Both modes produce identical boot results. Three consecutive runs were tested, and all passed.
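
The repeated comparison can be automated. The sketch below reuses the command lines shown above. It counts a run as passed when `VM READY - BOOT TEST PASSED` appears on stdout, then stops the VMM. Several things here are assumptions, not documented Volt behaviour: the helper names `vmm_args`, `boot_once` and `compare_modes`, the idea that the marker goes to stdout (for example via the serial console), and the choice to kill the process after the marker. The sketch also has no timeout, so a run that never prints the marker and never exits would block.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Command, Stdio};

/// Marker printed on a successful boot (taken from the logs above).
pub const PASS_MARKER: &str = "VM READY - BOOT TEST PASSED";

/// Arguments for one run; `secure == false` adds the baseline opt-out flags.
pub fn vmm_args(secure: bool) -> Vec<&'static str> {
    let mut args = vec![
        "--kernel", "comparison/firecracker/vmlinux.bin",
        "--rootfs", "comparison/rootfs.ext4",
        "--memory", "128M", "--cpus", "1", "--net-backend", "none",
    ];
    if !secure {
        args.push("--no-seccomp");
        args.push("--no-landlock");
    }
    args
}

/// Run the VMM once, watching stdout for the marker (assumption: stdout).
/// Returns Ok(false) if the process exits without printing the marker.
pub fn boot_once(binary: &str, secure: bool) -> io::Result<bool> {
    let mut child = Command::new(binary)
        .args(vmm_args(secure))
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    let mut passed = false;
    for line in BufReader::new(stdout).lines() {
        if line?.contains(PASS_MARKER) {
            passed = true;
            break;
        }
    }
    // The guest may keep running after boot; stop it and reap the process.
    let _ = child.kill();
    child.wait()?;
    Ok(passed)
}

/// Run `runs` consecutive boots in each mode; true only if every run passed.
pub fn compare_modes(binary: &str, runs: usize) -> io::Result<bool> {
    for secure in [true, false] {
        for _ in 0..runs {
            if !boot_once(binary, secure)? {
                return Ok(false);
            }
        }
    }
    Ok(true)
}
```

Calling `compare_modes("./target/release/volt-vmm", 3)` would repeat the secured and baseline runs three times each.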

## Final Allowlist (78 syscalls)

### File I/O (14)
`read`, `write`, `openat`, `close`, `fstat`, `lseek`, `pread64`, `pwrite64`,
`readv`, `writev`, `fsync`, `fdatasync`, `fallocate`★, `ftruncate`★

### Memory (6)
`mmap`, `mprotect`, `munmap`, `brk`, `madvise`, `mremap`

### KVM/Device (1)
`ioctl`

### Threading (7)
`clone`, `clone3`, `futex`, `set_robust_list`, `sched_yield`, `sched_getaffinity`, `rseq`

### Signals (4)
`rt_sigaction`, `rt_sigprocmask`, `rt_sigreturn`, `sigaltstack`

### Networking (18)
`accept4`, `bind`, `listen`, `socket`, `connect`, `recvfrom`, `sendto`,
`recvmsg`, `sendmsg`, `shutdown`, `getsockname`, `getpeername`, `setsockopt`,
`getsockopt`, `epoll_create1`, `epoll_ctl`, `epoll_wait`, `ppoll`

### Process (8)
`exit`, `exit_group`, `getpid`, `gettid`, `prctl`, `arch_prctl`, `prlimit64`, `tgkill`

### Timers (3)
`clock_gettime`, `nanosleep`, `clock_nanosleep`

### Misc (17)
`getrandom`, `eventfd2`, `timerfd_create`, `timerfd_settime`, `pipe2`,
`dup`, `dup2`, `fcntl`, `statx`, `newfstatat`, `access`, `readlinkat`,
`getcwd`, `unlink`, `unlinkat`, `mkdir`, `mkdirat`

★ = Added in Phase 3
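
Storing the allowlist as grouped data lets a test check the per-group counts instead of maintaining them by hand. The sketch below copies the groups above into a Rust constant and checks three things: the group sizes, the 78-entry total, and that no syscall appears twice. The `ALLOWLIST` name and layout are hypothetical and not taken from Volt's source.

```rust
use std::collections::HashSet;

/// The Phase 3 allowlist as (group, syscalls) pairs, transcribed from above.
pub const ALLOWLIST: &[(&str, &[&str])] = &[
    ("File I/O", &["read", "write", "openat", "close", "fstat", "lseek", "pread64",
        "pwrite64", "readv", "writev", "fsync", "fdatasync", "fallocate", "ftruncate"]),
    ("Memory", &["mmap", "mprotect", "munmap", "brk", "madvise", "mremap"]),
    ("KVM/Device", &["ioctl"]),
    ("Threading", &["clone", "clone3", "futex", "set_robust_list", "sched_yield",
        "sched_getaffinity", "rseq"]),
    ("Signals", &["rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "sigaltstack"]),
    ("Networking", &["accept4", "bind", "listen", "socket", "connect", "recvfrom",
        "sendto", "recvmsg", "sendmsg", "shutdown", "getsockname", "getpeername",
        "setsockopt", "getsockopt", "epoll_create1", "epoll_ctl", "epoll_wait", "ppoll"]),
    ("Process", &["exit", "exit_group", "getpid", "gettid", "prctl", "arch_prctl",
        "prlimit64", "tgkill"]),
    ("Timers", &["clock_gettime", "nanosleep", "clock_nanosleep"]),
    ("Misc", &["getrandom", "eventfd2", "timerfd_create", "timerfd_settime", "pipe2",
        "dup", "dup2", "fcntl", "statx", "newfstatat", "access", "readlinkat",
        "getcwd", "unlink", "unlinkat", "mkdir", "mkdirat"]),
];

/// Total number of allowed syscalls across all groups.
pub fn total() -> usize {
    ALLOWLIST.iter().map(|(_, calls)| calls.len()).sum()
}

/// Size of one group, or None if the group name is unknown.
pub fn group_len(name: &str) -> Option<usize> {
    ALLOWLIST
        .iter()
        .find(|(group, _)| *group == name)
        .map(|(_, calls)| calls.len())
}

/// Syscalls listed more than once (expected to be empty).
pub fn duplicates() -> Vec<&'static str> {
    let mut seen = HashSet::new();
    ALLOWLIST
        .iter()
        .flat_map(|(_, calls)| calls.iter().copied())
        .filter(|name| !seen.insert(*name))
        .collect()
}
```

Counting from the data means a heading such as "Networking (18)" can be checked whenever a syscall is added or moved between groups.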

## Phase 2 Handoff Note

The Phase 2 handoff described the VM stalling with "Failed to enable 64-bit or
32-bit DMA" when security was enabled. This issue appears to have been resolved
during Phase 2 development: the final committed code includes all necessary
syscalls for virtio-blk I/O. The DMA warning is a log line from the guest kernel's
virtio-mmio driver, not a Volt error. It appears in both secured and unsecured
boots and does not prevent boot completion.