Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
Karl Clinger
2026-03-21 01:04:35 -05:00
commit 40ed108dd5
143 changed files with 50300 additions and 0 deletions

docs/landlock-analysis.md

@@ -0,0 +1,378 @@
# Landlock LSM Analysis for Volt
**Date:** 2026-03-08
**Status:** Research Complete
**Author:** Edgar (Subagent)
## Executive Summary
Landlock is a Linux Security Module that enables unprivileged sandboxing—allowing processes to restrict their own capabilities without requiring root privileges. For Volt (a VMM), Landlock provides compelling defense-in-depth benefits, but comes with kernel version requirements that must be carefully considered.
**Recommendation:** Make Landlock **optional but strongly encouraged**. When detected (kernel 5.13+), enable it by default. Document that users on older kernels have reduced defense-in-depth.
---
## 1. What is Landlock?
Landlock is a **stackable Linux Security Module (LSM)** that enables unprivileged processes to restrict their own ambient rights. Unlike traditional LSMs (SELinux, AppArmor), Landlock doesn't require system administrator configuration—applications can self-sandbox.
### Core Capabilities
| ABI Version | Kernel | Features |
|-------------|--------|----------|
| ABI 1 | 5.13+ | Filesystem access control (13 access rights) |
| ABI 2 | 5.19+ | `LANDLOCK_ACCESS_FS_REFER` (cross-directory moves/links) |
| ABI 3 | 6.2+ | `LANDLOCK_ACCESS_FS_TRUNCATE` |
| ABI 4 | 6.7+ | Network access control (TCP bind/connect) |
| ABI 5 | 6.10+ | `LANDLOCK_ACCESS_FS_IOCTL_DEV` (device ioctls) |
| ABI 6 | 6.12+ | IPC scoping (signals, abstract Unix sockets) |
| ABI 7 | 6.15+ | Audit logging support |
### How It Works
1. **Create a ruleset** defining handled access types:
```c
struct landlock_ruleset_attr ruleset_attr = {
    .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
                         LANDLOCK_ACCESS_FS_WRITE_FILE | ...
};
int ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
```
2. **Add rules** for allowed paths:
```c
struct landlock_path_beneath_attr path_beneath = {
    .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE,
    .parent_fd = open("/allowed/path", O_PATH | O_CLOEXEC),
};
landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH, &path_beneath, 0);
```
3. **Enforce the ruleset** (irrevocable):
```c
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); // Required first
landlock_restrict_self(ruleset_fd, 0);
```
### Key Properties
- **Unprivileged:** No CAP_SYS_ADMIN required (just `PR_SET_NO_NEW_PRIVS`)
- **Stackable:** Multiple layers can be applied; restrictions only accumulate
- **Irrevocable:** Once enforced, cannot be removed for process lifetime
- **Inherited:** Child processes inherit parent's Landlock domain
- **Hierarchy-based:** Rules attach to file hierarchies identified by an open file descriptor, not to path strings
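The "Stackable" and "Irrevocable" properties can be pictured with a small userspace model. This is only a sketch of the semantics, not the Landlock API: `Layer`, `Domain`, `READ`, and `WRITE` are hypothetical names, and paths are compared lexically rather than through file descriptors. Each layer denies only the rights it handles and does not grant, and an access succeeds only if every layer permits it.

```rust
use std::path::{Path, PathBuf};

pub const READ: u8 = 0b01;
pub const WRITE: u8 = 0b10;

/// One enforced ruleset: the rights it handles plus the hierarchies it allows.
pub struct Layer {
    pub handled: u8,
    pub rules: Vec<(PathBuf, u8)>,
}

impl Layer {
    /// Rights this layer permits on `path`: everything it does not handle,
    /// plus handled rights granted by a rule on an ancestor directory.
    fn permits(&self, path: &Path) -> u8 {
        let granted = self
            .rules
            .iter()
            .filter(|(root, _)| path.starts_with(root))
            .fold(0u8, |acc, (_, rights)| acc | *rights);
        !self.handled | (granted & self.handled)
    }
}

/// A domain is the stack of enforced layers; there is deliberately no `pop`.
#[derive(Default)]
pub struct Domain {
    layers: Vec<Layer>,
}

impl Domain {
    /// Models `landlock_restrict_self()`: a new layer is stacked on top.
    pub fn restrict(&mut self, layer: Layer) {
        self.layers.push(layer);
    }

    /// An access is allowed only if every layer permits all requested rights.
    pub fn allows(&self, path: &Path, wanted: u8) -> bool {
        self.layers
            .iter()
            .all(|layer| (layer.permits(path) & wanted) == wanted)
    }
}
```

In this model, stacking a later, more permissive layer never restores an access that an earlier layer denied, which is the behaviour the list above describes.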
---
## 2. Kernel Version Requirements
### Minimum Requirements by Feature
| Feature | Minimum Kernel | Distro Support |
|---------|---------------|----------------|
| Basic filesystem | 5.13 (June 2021) | Ubuntu 22.04+, Debian 12+, RHEL 9+ |
| File referencing | 5.19 (July 2022) | Ubuntu 22.10+, Debian 12+ |
| File truncation | 6.2 (Feb 2023) | Ubuntu 23.04+, Fedora 38+ |
| Network (TCP) | 6.7 (Jan 2024) | Ubuntu 24.04+, Fedora 39+ |
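
The feature table can be turned into a quick lookup for documentation or preflight messages. The sketch below assumes mainline version numbers only; `Feature` and `expected_features` are hypothetical names, and because distribution kernels may backport or disable Landlock, the runtime ABI query remains the authoritative check.

```rust
/// Landlock features from the table above, keyed by the mainline kernel
/// release that introduced them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Feature {
    BasicFilesystem,
    FileReferencing,
    FileTruncation,
    NetworkTcp,
}

const INTRODUCED_IN: [((u32, u32), Feature); 4] = [
    ((5, 13), Feature::BasicFilesystem),
    ((5, 19), Feature::FileReferencing),
    ((6, 2), Feature::FileTruncation),
    ((6, 7), Feature::NetworkTcp),
];

/// Features a mainline kernel `major.minor` is expected to provide.
/// Tuples compare lexicographically, so (6, 1) sorts below (6, 2).
pub fn expected_features(major: u32, minor: u32) -> Vec<Feature> {
    INTRODUCED_IN
        .iter()
        .filter(|(min, _)| (major, minor) >= *min)
        .map(|(_, feature)| *feature)
        .collect()
}
```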
### Distro Compatibility Matrix
| Distribution | Default Kernel | Landlock ABI | Network Support |
|--------------|---------------|--------------|-----------------|
| Ubuntu 20.04 LTS | 5.4 | ❌ None | ❌ |
| Ubuntu 22.04 LTS | 5.15 | ✅ ABI 1 | ❌ |
| Ubuntu 24.04 LTS | 6.8 | ✅ ABI 4+ | ✅ |
| Debian 11 | 5.10 | ❌ None | ❌ |
| Debian 12 | 6.1 | ✅ ABI 2 | ❌ |
| RHEL 8 | 4.18 | ❌ None | ❌ |
| RHEL 9 | 5.14 | ✅ ABI 1 | ❌ |
| Fedora 40 | 6.8+ | ✅ ABI 4+ | ✅ |
### Detection at Runtime
```c
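/* Query the highest ABI version the running kernel supports */
int abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
if (abi < 0) {
    if (errno == ENOSYS) {
        /* Landlock not compiled into the kernel */
    } else if (errno == EOPNOTSUPP) {
        /* Landlock compiled in but disabled at boot */
    }
}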
```
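On the Rust side, the same classification could be kept separate from the syscall itself so it can be unit-tested. This is a minimal sketch: `LandlockSupport` and `classify` are hypothetical names, and the errno constants are the Linux values on x86-64 and aarch64 (an assumption; a few other architectures number them differently).

```rust
/// Linux errno values on x86-64 and aarch64 (assumption: other
/// architectures may use different numbers).
const ENOSYS: i32 = 38;
const EOPNOTSUPP: i32 = 95;

#[derive(Debug, PartialEq, Eq)]
pub enum LandlockSupport {
    /// The version query succeeded and returned this ABI number.
    Available { abi: u32 },
    /// ENOSYS: the kernel was built without Landlock.
    NotCompiledIn,
    /// EOPNOTSUPP: Landlock is built in but disabled at boot.
    DisabledAtBoot,
    /// Any other failure, for example a seccomp filter blocking the call.
    Unknown { errno: i32 },
}

/// Classify the result of the ABI version query.
/// `ret` is the raw return value and `errno` the value observed after it.
pub fn classify(ret: i64, errno: i32) -> LandlockSupport {
    if ret >= 1 {
        return LandlockSupport::Available { abi: ret as u32 };
    }
    match errno {
        ENOSYS => LandlockSupport::NotCompiledIn,
        EOPNOTSUPP => LandlockSupport::DisabledAtBoot,
        other => LandlockSupport::Unknown { errno: other },
    }
}
```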
---
## 3. Advantages for Volt VMM
### 3.1 Defense in Depth Against VM Escape
If a guest exploits a vulnerability in the VMM (memory corruption, etc.) and achieves code execution in the VMM process, Landlock limits what the attacker can do:
| Attack Vector | Without Landlock | With Landlock |
|--------------|------------------|---------------|
| Read host files | Full access | Only allowed paths |
| Write host files | Full access | Only VM disk images |
| Execute binaries | Any executable | Denied (no EXECUTE right) |
| Network access | Unrestricted | Only specified ports (ABI 4+) |
| Device access | All /dev | Only /dev/kvm, /dev/net/tun |
### 3.2 Restricting VMM Process Capabilities
Volt can declare exactly what it needs:
```rust
// Example Volt Landlock policy (sketch against the `landlock` crate API)
use landlock::{AccessFs, PathBeneath, PathFd, Ruleset, RulesetAttr, RulesetCreatedAttr};

let read_only = AccessFs::ReadFile;
let read_write = AccessFs::ReadFile | AccessFs::WriteFile;
// Execute is handled but never granted, so exec is denied everywhere
let mut ruleset = Ruleset::default()
    .handle_access(read_write | AccessFs::Execute)?
    .create()?;
// Allow read-only access to kernel/initrd
ruleset = ruleset.add_rule(PathBeneath::new(PathFd::new(kernel_path)?, read_only))?;
ruleset = ruleset.add_rule(PathBeneath::new(PathFd::new(initrd_path)?, read_only))?;
// Allow read-write access to VM disk images
for disk in &vm_config.disks {
    ruleset = ruleset.add_rule(PathBeneath::new(PathFd::new(&disk.path)?, read_write))?;
}
// Allow /dev/kvm and /dev/net/tun
ruleset = ruleset.add_rule(PathBeneath::new(PathFd::new("/dev/kvm")?, read_write))?;
ruleset = ruleset.add_rule(PathBeneath::new(PathFd::new("/dev/net/tun")?, read_write))?;
ruleset.restrict_self()?;
```
### 3.3 Comparison with seccomp-bpf
| Aspect | seccomp-bpf | Landlock |
|--------|-------------|----------|
| **Controls** | System call invocation | Resource access (files, network) |
| **Granularity** | Syscall number + args | Path hierarchies, ports |
| **Use case** | "Can call open()" | "Can access /tmp/vm-disk.img" |
| **Complexity** | Complex (BPF programs) | Simple (path-based rules) |
| **Kernel version** | 3.5+ | 5.13+ |
| **Pointer args** | Cannot inspect | N/A (path-based) |
| **Complementary?** | ✅ Yes | ✅ Yes |
**Key insight:** seccomp and Landlock are **complementary**, not alternatives.
- **seccomp:** "You may only call these 50 syscalls" (attack surface reduction)
- **Landlock:** "You may only access these specific files" (resource restriction)
A properly sandboxed VMM should use **both**:
1. seccomp to limit syscall surface
2. Landlock to limit accessible resources
---
## 4. Disadvantages and Considerations
### 4.1 Kernel Version Requirement
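The 5.13+ requirement excludes:
- Ubuntu 20.04 LTS (kernel 5.4; standard support ended May 2025, but still deployed)
- RHEL 8 (kernel 4.18; maintenance support until 2029)
- Debian 11 (kernel 5.10; LTS support ends August 2026)

Ubuntu 22.04 LTS clears the bar with its GA 5.15 kernel, but only at ABI 1; newer ABIs there require an HWE kernel.
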
**Mitigation:** Make Landlock optional; gracefully degrade when unavailable.
### 4.2 ABI Evolution Complexity
Supporting multiple Landlock ABI versions requires careful coding:
```c
switch (abi) {
case 1:
    /* ABI 1 has no LANDLOCK_ACCESS_FS_REFER */
    ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_REFER;
    __attribute__((fallthrough));
case 2:
    /* ABI 2 has no LANDLOCK_ACCESS_FS_TRUNCATE */
    ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_TRUNCATE;
    __attribute__((fallthrough));
case 3:
    ruleset_attr.handled_access_net = 0; // No network support
    // ...
}
```
**Mitigation:** Use a Landlock library (e.g., `landlock` crate for Rust) that handles ABI negotiation.
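To make concrete what such ABI negotiation computes, here is a std-only sketch of a best-effort mask calculation. The bit positions follow the kernel's `linux/landlock.h` UAPI header; `Handled` and `best_effort` are hypothetical names, and a real implementation would more likely delegate this to the `landlock` crate.

```rust
// Bit positions as defined in the kernel UAPI header linux/landlock.h.
pub const ACCESS_FS_ABI1: u64 = (1 << 13) - 1; // the 13 rights of ABI 1 (bits 0..=12)
pub const ACCESS_FS_REFER: u64 = 1 << 13; // ABI 2
pub const ACCESS_FS_TRUNCATE: u64 = 1 << 14; // ABI 3
pub const ACCESS_FS_IOCTL_DEV: u64 = 1 << 15; // ABI 5
pub const ACCESS_NET_BIND_TCP: u64 = 1 << 0; // ABI 4
pub const ACCESS_NET_CONNECT_TCP: u64 = 1 << 1; // ABI 4

#[derive(Debug, PartialEq, Eq)]
pub struct Handled {
    pub fs: u64,
    pub net: u64,
}

/// Largest set of access rights that can be handled on a kernel reporting
/// `abi`. Returns `None` when Landlock is unavailable (ABI 0).
pub fn best_effort(abi: u32) -> Option<Handled> {
    if abi == 0 {
        return None;
    }
    let mut fs = ACCESS_FS_ABI1;
    if abi >= 2 {
        fs |= ACCESS_FS_REFER;
    }
    if abi >= 3 {
        fs |= ACCESS_FS_TRUNCATE;
    }
    if abi >= 5 {
        fs |= ACCESS_FS_IOCTL_DEV;
    }
    let net = if abi >= 4 {
        ACCESS_NET_BIND_TCP | ACCESS_NET_CONNECT_TCP
    } else {
        0
    };
    Some(Handled { fs, net })
}
```

Building the mask upward from ABI 1 is equivalent to the C `switch`, which starts from the full mask and strips rights the running kernel does not know about.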
### 4.3 Path Resolution Subtleties
- Bind mounts: Rules apply to the same files via either path
- OverlayFS: Rules do NOT propagate between the underlying layers and the merged view
- Symlinks: Rules apply to the target, not the symlink itself
**Mitigation:** Document clearly; test with containerized/overlayfs scenarios.
### 4.4 No Dynamic Rule Modification
Once `landlock_restrict_self()` is called:
- Cannot remove rules
- Cannot expand allowed paths
- Can only add more restrictive rules
**For Volt:** Must know all needed paths at restriction time. For hotplug support, pre-declare potential hotplug paths (as Cloud Hypervisor does with `--landlock-rules`).
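Because the allowed set is frozen at restriction time, a hotplug request for a path outside the pre-declared roots can only fail. A hedged sketch of an early, userspace check that would give a clearer error than a late `EACCES` follows; `HotplugPolicy` is a hypothetical type, and the check is purely lexical (symlinks are not resolved), so the kernel stays the actual enforcer.

```rust
use std::path::{Component, Path, PathBuf};

/// Directories declared before `landlock_restrict_self()` was called.
pub struct HotplugPolicy {
    roots: Vec<PathBuf>,
}

impl HotplugPolicy {
    pub fn new<I, P>(roots: I) -> Self
    where
        I: IntoIterator<Item = P>,
        P: Into<PathBuf>,
    {
        Self {
            roots: roots.into_iter().map(Into::into).collect(),
        }
    }

    /// True if `candidate` lies beneath a pre-declared root.
    /// Relative paths and paths containing `..` are rejected outright
    /// instead of being normalised.
    pub fn covers(&self, candidate: &Path) -> bool {
        if !candidate.is_absolute()
            || candidate.components().any(|c| c == Component::ParentDir)
        {
            return false;
        }
        // `Path::starts_with` compares whole components, so
        // "/vm/hotplug2" is not considered to be under "/vm/hotplug".
        self.roots.iter().any(|root| candidate.starts_with(root))
    }
}
```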
---
## 5. What Firecracker and Cloud Hypervisor Do
### 5.1 Firecracker
Firecracker uses a **multi-layered approach** via its "jailer" wrapper:
| Layer | Mechanism | Purpose |
|-------|-----------|---------|
| 1 | chroot + pivot_root | Filesystem isolation |
| 2 | User namespaces | UID/GID isolation |
| 3 | Network namespaces | Network isolation |
| 4 | Cgroups | Resource limits |
| 5 | seccomp-bpf | Syscall filtering |
| 6 | Capability dropping | Privilege reduction |
**Notably missing: Landlock.** Firecracker relies on the jailer's chroot for filesystem isolation, which requires:
- Root privileges to set up (then drops them)
- Careful hardlink/copy of resources into chroot
Firecracker's jailer is mature and battle-tested but requires privileged setup.
### 5.2 Cloud Hypervisor
Cloud Hypervisor **has native Landlock support** (`--landlock` flag):
```bash
./cloud-hypervisor \
--kernel ./vmlinux.bin \
--disk path=disk.raw \
--landlock \
--landlock-rules path="/path/to/hotplug",access="rw"
```
**Features:**
- Enabled via CLI flag (optional)
- Supports pre-declaring hotplug paths
- Falls back gracefully if kernel lacks support
- Combined with seccomp for defense in depth
**Cloud Hypervisor's approach is a good model for Volt.**
---
## 6. Recommendation for Volt
### Implementation Strategy
```
┌─────────────────────────────────────────────────────────────┐
│ Security Layer Stack │
├─────────────────────────────────────────────────────────────┤
│ Layer 5: Landlock (optional, 5.13+) │
│ - Filesystem path restrictions │
│ - Network port restrictions (6.7+) │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: seccomp-bpf (required) │
│ - Syscall allowlist │
│ - Argument filtering │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Capability dropping (required) │
│ - Drop all caps except CAP_NET_ADMIN if needed │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: User namespaces (optional) │
│ - Run as unprivileged user │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: KVM isolation (inherent) │
│ - Hardware virtualization boundary │
└─────────────────────────────────────────────────────────────┘
```
### Specific Recommendations
1. **Make Landlock optional, default-enabled when available**
```rust
pub enum LandlockMode { Auto, Enabled, Disabled }

pub struct VoltConfig {
    /// Enable Landlock sandboxing (requires kernel 5.13+)
    /// Default: auto (enabled if available)
    pub landlock: LandlockMode,
}
```
2. **Do NOT require kernel 5.13+**
- Too many production systems still on older kernels
- Landlock adds defense-in-depth, but seccomp+capabilities are adequate baseline
- Log a warning if Landlock unavailable
3. **Support hotplug path pre-declaration** (like Cloud Hypervisor)
```bash
volt-vmm --disk /vm/disk.img \
--landlock \
--landlock-allow-path /vm/hotplug/,rw
```
4. **Use the `landlock` Rust crate**
- Handles ABI version detection
- Provides ergonomic API
- Maintained, well-tested
5. **Minimum practical policy for VMM:**
```text
Read-only:
  - kernel image
  - initrd
  - any read-only disks
Read-write:
  - VM disk images
  - VM state/snapshot paths
  - API socket path
  - Logging paths
Devices (special handling may be needed):
  - /dev/kvm
  - /dev/net/tun
  - /dev/vhost-net (if used)
```
6. **Document security posture clearly:**
```
Volt Security Layers:
✅ KVM hardware isolation (always)
✅ seccomp syscall filtering (always)
✅ Capability dropping (always)
⚠️ Landlock filesystem restrictions (kernel 5.13+ required)
⚠️ Landlock network restrictions (kernel 6.7+ required)
```
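The `PATH,ACCESS` argument from recommendation 3 (for example `/vm/hotplug/,rw`) could be parsed along these lines. This is a sketch under assumptions: `PathRule` and `Access` are hypothetical types, and the accepted access strings (`r` and `rw`) are a guess at a reasonable syntax rather than a settled Volt interface.

```rust
use std::path::PathBuf;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    ReadOnly,
    ReadWrite,
}

#[derive(Debug, PartialEq, Eq)]
pub struct PathRule {
    pub path: PathBuf,
    pub access: Access,
}

impl FromStr for PathRule {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split at the last comma so that paths containing commas still parse.
        let (path, access) = s
            .rsplit_once(',')
            .ok_or_else(|| format!("expected PATH,ACCESS, got {:?}", s))?;
        if !path.starts_with('/') {
            return Err(format!("path must be absolute: {:?}", path));
        }
        let access = match access {
            "r" => Access::ReadOnly,
            "rw" => Access::ReadWrite,
            other => return Err(format!("unknown access {:?}, expected r or rw", other)),
        };
        Ok(PathRule {
            path: PathBuf::from(path),
            access,
        })
    }
}
```

Implementing `FromStr` keeps the parser usable from most argument-parsing libraries, which typically accept any type that can be built from a string.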
### Why Not Require 5.13+?
| Consideration | Impact |
|---------------|--------|
| Ubuntu 22.04 LTS | Most common cloud image; ships 5.15, which provides only ABI 1, and Landlock may still be disabled at boot |
| RHEL 8 | Enterprise deployments; kernel 4.18 |
| Embedded/IoT | Often run older LTS kernels |
| User expectations | VMMs should "just work" |
**Landlock is excellent defense-in-depth, but not a hard requirement.** The base security (KVM + seccomp + capabilities) is strong. Landlock makes it stronger.
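The optional-by-default policy reduces to a small decision table. In the sketch below, `Decision` and `decide` are hypothetical names, and treating an explicit `Enabled` on an unsupported kernel as a startup error is an assumption; the analysis itself only specifies the `Auto` fallback with a logged warning.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LandlockMode {
    Auto,
    Enabled,
    Disabled,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Apply a ruleset negotiated for this ABI.
    Enforce { abi: u32 },
    /// Continue without Landlock and log a warning.
    SkipWithWarning,
    /// The user opted out; continue silently.
    Skip,
}

/// `detected_abi` is `None` when the kernel has no usable Landlock support.
pub fn decide(mode: LandlockMode, detected_abi: Option<u32>) -> Result<Decision, String> {
    match (mode, detected_abi) {
        (LandlockMode::Disabled, _) => Ok(Decision::Skip),
        (_, Some(abi)) => Ok(Decision::Enforce { abi }),
        (LandlockMode::Auto, None) => Ok(Decision::SkipWithWarning),
        // Assumption: an explicit request that cannot be honoured is fatal.
        (LandlockMode::Enabled, None) => {
            Err("Landlock was requested but the kernel does not support it".to_string())
        }
    }
}
```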
---
## 7. Implementation Checklist
- [ ] Add `landlock` crate dependency
- [ ] Implement Landlock policy configuration
- [ ] Detect Landlock ABI at runtime
- [ ] Apply appropriate policy based on ABI version
- [ ] Support `--landlock` / `--no-landlock` CLI flags
- [ ] Support `--landlock-allow-path` for hotplug paths (Volt's counterpart to Cloud Hypervisor's `--landlock-rules`)
- [ ] Log Landlock status at startup (enabled/disabled/unavailable)
- [ ] Document Landlock in security documentation
- [ ] Add integration tests with Landlock enabled
- [ ] Test on kernels without Landlock (graceful fallback)
---
## References
- [Landlock Documentation](https://landlock.io/)
- [Kernel Landlock API](https://docs.kernel.org/userspace-api/landlock.html)
- [Cloud Hypervisor Landlock docs](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/landlock.md)
- [Firecracker Jailer](https://github.com/firecracker-microvm/firecracker/blob/main/docs/jailer.md)
- [LWN: Landlock sets sail](https://lwn.net/Articles/859908/)
- [Rust landlock crate](https://crates.io/crates/landlock)