Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
Karl Clinger
2026-03-21 01:04:35 -05:00
commit 40ed108dd5
143 changed files with 50300 additions and 0 deletions

docs/phase3-seccomp-fix.md

# Phase 3: Seccomp Allowlist Audit & Fix
## Status: ✅ COMPLETE
## Summary
The seccomp-bpf allowlist and Landlock configuration were audited for correctness.
**The VM already booted successfully with security features enabled** — the Phase 2
implementation included the necessary syscalls. Two additional syscalls (`fallocate`,
`ftruncate`) were added for production robustness.
## Findings
### Seccomp Filter
The Phase 2 seccomp allowlist (76 syscalls) already included all syscalls needed
for virtio-blk I/O processing:
| Syscall | Purpose | Status at Phase 2 |
|---------|---------|-------------------|
| `pread64` | Positional read for block I/O | ✅ Already present |
| `pwrite64` | Positional write for block I/O | ✅ Already present |
| `lseek` | File seeking for FileBackend | ✅ Already present |
| `fdatasync` | Data sync for flush operations | ✅ Already present |
| `fstat` | File metadata for disk size | ✅ Already present |
| `fsync` | Full sync for flush operations | ✅ Already present |
| `readv`/`writev` | Scatter-gather I/O | ✅ Already present |
| `madvise` | Memory usage hints for guest memory | ✅ Already present |
| `mremap` | Memory remapping | ✅ Already present |
| `eventfd2` | Event notification for virtio | ✅ Already present |
| `timerfd_create` | Timer fd creation | ✅ Already present |
| `timerfd_settime` | Timer configuration | ✅ Already present |
| `ppoll` | Polling for events | ✅ Already present |
| `epoll_ctl` | Epoll event management | ✅ Already present |
| `epoll_wait` | Epoll event waiting | ✅ Already present |
| `epoll_create1` | Epoll instance creation | ✅ Already present |
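
The positional calls in the table (`pread64`, `pwrite64`, `fdatasync`) are what a file-backed block device naturally produces in Rust. Below is a minimal sketch. It assumes a hypothetical file-backed disk addressed in 512-byte virtio-blk sectors, and the helper names are illustrative rather than Volt's `FileBackend` API. On Linux, `std`'s `FileExt::read_at`/`write_at` issue `pread64`/`pwrite64`, and `File::sync_data` issues `fdatasync`:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// virtio-blk request offsets are expressed in 512-byte sectors.
const SECTOR_SIZE: u64 = 512;

/// Read `buf.len()` bytes starting at `sector` (hypothetical helper).
/// `read_exact_at` loops over `read_at`, which issues `pread64` on Linux.
fn read_sectors(disk: &File, sector: u64, buf: &mut [u8]) -> io::Result<()> {
    disk.read_exact_at(buf, sector * SECTOR_SIZE)
}

/// Write `data` starting at `sector` (hypothetical helper).
/// `write_all_at` loops over `write_at`, which issues `pwrite64` on Linux.
fn write_sectors(disk: &File, sector: u64, data: &[u8]) -> io::Result<()> {
    disk.write_all_at(data, sector * SECTOR_SIZE)
}

/// Flush request handler (hypothetical): `sync_data` issues `fdatasync`.
fn flush(disk: &File) -> io::Result<()> {
    disk.sync_data()
}
```

Because the offset travels with each call, several virtqueue requests can target the same `File` without sharing or moving a file cursor.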
### Syscalls Added in Phase 3
Two additional syscalls were added for production robustness:
| Syscall | Purpose | Why Added |
|---------|---------|-----------|
| `fallocate` | Pre-allocate disk space | Needed for CoW disk backends, qcow2 expansion, and Stellarium CAS storage |
| `ftruncate` | Resize files | Needed for disk resize operations and FileBackend::create() |
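
Ordinary Rust code reaches `ftruncate` easily: on Linux, `std::fs::File::set_len` is implemented with `ftruncate`. The sketch below is a hypothetical stand-in for a `FileBackend::create()`-style constructor and a resize path. The function names are assumptions, not Volt's code. `fallocate` has no `std` wrapper (the `libc` crate exposes `libc::fallocate`), so it is left out here.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a `FileBackend::create()`-style constructor:
/// create (or empty) an image file, then size it to `size_bytes`.
/// `File::set_len` issues `ftruncate` on Linux; the `truncate(true)` open
/// flag is applied through `openat` (O_TRUNC) and needs no extra syscall.
fn create_disk_image(path: &Path, size_bytes: u64) -> io::Result<()> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.set_len(size_bytes)
}

/// Hypothetical resize path: grow or shrink an existing image in place,
/// again through `ftruncate`.
fn resize_disk_image(path: &Path, new_size: u64) -> io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    file.set_len(new_size)
}
```

On common Linux filesystems, extending a file with `set_len` leaves a hole and does not allocate blocks. Reserving space up front is the job `fallocate` covers in the table.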
### Landlock Configuration
The Landlock filesystem sandbox was verified correct:
- **Kernel image**: Read-only access ✅
- **Rootfs disk**: Read-write access (including `Truncate` flag) ✅
- **Device nodes**: `/dev/kvm`, `/dev/net/tun`, `/dev/vhost-net` with `IoctlDev` access ✅
- **`/proc/self`**: Read-only access for fd management ✅
- **Stellarium volumes**: Read-write access when `--volume` is used ✅
- **API socket directory**: Socket creation + removal access ✅
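Landlock reports "partially enforced" on kernel 6.1 because the code targets
ABI V5 (kernel 6.10+), while 6.1 only provides ABI V2. The `Truncate` (ABI V3) and
`IoctlDev` (ABI V5) rights are dropped by the best-effort fallback, and the
remaining rules are still enforced. This is expected and correct.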
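
The "partially enforced" status follows from which Landlock ABI version introduced each right. Below is a minimal model. It assumes mainline kernel versions, since distribution backports can differ. The `FsRight` enum and helper names are illustrative and are not the `landlock` crate's API.

```rust
/// A subset of the filesystem rights used above (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FsRight {
    ReadFile,  // ABI 1 (Linux 5.13)
    WriteFile, // ABI 1 (Linux 5.13)
    Truncate,  // ABI 3 (Linux 6.2)
    IoctlDev,  // ABI 5 (Linux 6.10)
}

impl FsRight {
    /// Lowest Landlock ABI version that can enforce this right.
    fn min_abi(self) -> u32 {
        match self {
            FsRight::Truncate => 3,
            FsRight::IoctlDev => 5,
            _ => 1,
        }
    }
}

/// Landlock ABI version shipped by a mainline kernel (major, minor).
fn abi_for_kernel(major: u32, minor: u32) -> u32 {
    match (major, minor) {
        (5, 0..=12) | (0..=4, _) => 0, // no Landlock
        (5, 13..=18) => 1,
        (5, _) => 2, // 5.19+
        (6, 0..=1) => 2,
        (6, 2..=6) => 3,
        (6, 7..=9) => 4,
        (6, 10..=11) => 5,
        _ => 6, // 6.12 and later: ABI 6 or newer, treated as 6 here
    }
}

/// Requested rights the running kernel cannot enforce. A non-empty result
/// corresponds to a best-effort ruleset being only partially enforced.
fn unenforceable(requested: &[FsRight], abi: u32) -> Vec<FsRight> {
    requested.iter().copied().filter(|r| r.min_abi() > abi).collect()
}
```

Under this model, kernel 6.1 loses exactly the `Truncate` and `IoctlDev` rights, while a 6.10+ kernel can enforce the full set.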
### Syscall Trace Analysis
Using `strace -f` on the secured VMM, the following 17 unique syscalls were
observed during steady-state operation (all in the allowlist):
```
close, epoll_ctl, epoll_wait, exit_group, fsync, futex, ioctl,
lseek, mprotect, munmap, read, recvfrom, rt_sigreturn,
sched_yield, sendto, sigaltstack, write
```
No `SIGSYS` signals were generated. No syscalls returned `ENOSYS`.
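
The subset check can be automated. Below is a minimal sketch. It assumes the trace was captured as plain `strace -f` text, either with the `[pid N]` prefix or with the bare-PID prefix that `-o` produces, and without timestamp options such as `-t`. The helper names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Extract the syscall name from one line of `strace -f` output.
fn syscall_name(line: &str) -> Option<&str> {
    let mut rest = line.trim_start();
    if let Some(after) = rest.strip_prefix("[pid") {
        // `[pid  1234] read(...)`: skip past the closing bracket.
        let end = after.find(']')?;
        rest = after[end + 1..].trim_start();
    } else {
        // `1234 read(...)` (as written with `-o`): skip a bare PID, if any.
        let digits = rest.len() - rest.trim_start_matches(|c: char| c.is_ascii_digit()).len();
        rest = rest[digits..].trim_start();
    }
    // The name runs up to the first '('. Signal (`---`), exit (`+++`) and
    // `<... resumed>` lines fail the identifier check and are skipped.
    let name = &rest[..rest.find('(')?];
    let is_ident = !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if is_ident { Some(name) } else { None }
}

/// Unique syscalls in `trace` that are missing from `allowlist`.
fn not_allowed<'a>(trace: &'a str, allowlist: &[&str]) -> BTreeSet<&'a str> {
    trace
        .lines()
        .filter_map(syscall_name)
        .filter(|name| !allowlist.contains(name))
        .collect()
}
```

Run against a trace of the unsecured VMM (`--no-seccomp`), a non-empty result would list syscalls that the filter would otherwise answer with `KILL_PROCESS`.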
## Test Results
### With Security (Seccomp + Landlock)
```
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none
Seccomp filter active: 78 syscalls allowed, all others → KILL_PROCESS
Landlock sandbox partially enforced
VM READY - BOOT TEST PASSED
```
### Without Security (baseline)
```
$ ./target/release/volt-vmm \
--kernel comparison/firecracker/vmlinux.bin \
--rootfs comparison/rootfs.ext4 \
--memory 128M --cpus 1 --net-backend none \
--no-seccomp --no-landlock
VM READY - BOOT TEST PASSED
```
Both modes produce identical boot results (`VM READY - BOOT TEST PASSED`). Three consecutive runs were tested, and all passed.
## Final Allowlist (78 syscalls)
### File I/O (14)
`read`, `write`, `openat`, `close`, `fstat`, `lseek`, `pread64`, `pwrite64`,
`readv`, `writev`, `fsync`, `fdatasync`, `fallocate`★, `ftruncate`★
### Memory (6)
`mmap`, `mprotect`, `munmap`, `brk`, `madvise`, `mremap`
### KVM/Device (1)
`ioctl`
### Threading (7)
`clone`, `clone3`, `futex`, `set_robust_list`, `sched_yield`, `sched_getaffinity`, `rseq`
### Signals (4)
`rt_sigaction`, `rt_sigprocmask`, `rt_sigreturn`, `sigaltstack`
### Networking (18)
`accept4`, `bind`, `listen`, `socket`, `connect`, `recvfrom`, `sendto`,
`recvmsg`, `sendmsg`, `shutdown`, `getsockname`, `getpeername`, `setsockopt`,
`getsockopt`, `epoll_create1`, `epoll_ctl`, `epoll_wait`, `ppoll`
### Process (8)
`exit`, `exit_group`, `getpid`, `gettid`, `prctl`, `arch_prctl`, `prlimit64`, `tgkill`
### Timers (3)
`clock_gettime`, `nanosleep`, `clock_nanosleep`
### Misc (17)
`getrandom`, `eventfd2`, `timerfd_create`, `timerfd_settime`, `pipe2`,
`dup`, `dup2`, `fcntl`, `statx`, `newfstatat`, `access`, `readlinkat`,
`getcwd`, `unlink`, `unlinkat`, `mkdir`, `mkdirat`

★ = Added in Phase 3
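
The per-category counts above can be checked mechanically. Below is a minimal sketch that models the filter as a name lookup with a `KILL_PROCESS` default. The `Action` enum and helper names are hypothetical. A real seccomp-bpf program matches architecture-specific syscall numbers rather than names.

```rust
use std::collections::HashSet;

/// The final allowlist, grouped as in the lists above.
const ALLOWLIST: &[(&str, &[&str])] = &[
    ("File I/O", &[
        "read", "write", "openat", "close", "fstat", "lseek", "pread64", "pwrite64",
        "readv", "writev", "fsync", "fdatasync", "fallocate", "ftruncate",
    ]),
    ("Memory", &["mmap", "mprotect", "munmap", "brk", "madvise", "mremap"]),
    ("KVM/Device", &["ioctl"]),
    ("Threading", &[
        "clone", "clone3", "futex", "set_robust_list", "sched_yield", "sched_getaffinity", "rseq",
    ]),
    ("Signals", &["rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "sigaltstack"]),
    ("Networking", &[
        "accept4", "bind", "listen", "socket", "connect", "recvfrom", "sendto",
        "recvmsg", "sendmsg", "shutdown", "getsockname", "getpeername", "setsockopt",
        "getsockopt", "epoll_create1", "epoll_ctl", "epoll_wait", "ppoll",
    ]),
    ("Process", &[
        "exit", "exit_group", "getpid", "gettid", "prctl", "arch_prctl", "prlimit64", "tgkill",
    ]),
    ("Timers", &["clock_gettime", "nanosleep", "clock_nanosleep"]),
    ("Misc", &[
        "getrandom", "eventfd2", "timerfd_create", "timerfd_settime", "pipe2",
        "dup", "dup2", "fcntl", "statx", "newfstatat", "access", "readlinkat",
        "getcwd", "unlink", "unlinkat", "mkdir", "mkdirat",
    ]),
];

/// What the modeled filter does with a syscall (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Allow,
    /// Default action: the whole VMM process is killed with `SIGSYS`.
    KillProcess,
}

/// Flatten the grouped table into a single lookup set.
fn build_filter() -> HashSet<&'static str> {
    ALLOWLIST
        .iter()
        .flat_map(|(_, names)| names.iter().copied())
        .collect()
}

/// Allow listed syscalls; everything else falls through to the default.
fn decide(filter: &HashSet<&str>, syscall: &str) -> Action {
    if filter.contains(syscall) { Action::Allow } else { Action::KillProcess }
}
```

Because the lookup set is built from the same grouped table, a duplicated or miscounted entry shows up as a length mismatch against the expected 78.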
## Phase 2 Handoff Note
The Phase 2 handoff described the VM stalling with "Failed to enable 64-bit or
32-bit DMA" when security was enabled. This issue appears to have been resolved
during Phase 2 development — the final committed code includes all necessary
syscalls for virtio-blk I/O. The DMA warning message is a kernel-level log that
appears in both secured and unsecured boots (it's a virtio-mmio driver message,
not a Volt error) and does not prevent boot completion.