Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
Karl Clinger
2026-03-21 01:04:35 -05:00
commit 40ed108dd5
143 changed files with 50300 additions and 0 deletions

# CPUID Implementation for Volt VMM
**Date**: 2025-03-08
**Status**: ✅ **IMPLEMENTED AND WORKING**
## Summary
Implemented CPUID filtering and boot MSR configuration, which lets Linux kernels boot successfully in Volt VMM. The previous triple-fault crash was caused by missing CPUID configuration: the SYSCALL feature (CPUID 0x80000001, EDX bit 11) was not advertised to the guest, so the kernel took a #GP fault when it tried to enable SYSCALL via WRMSR to EFER.
## Root Cause Analysis
### The Crash
```
vCPU 0 SHUTDOWN (triple fault?) at RIP=0xffffffff81000084
RAX=0x501 RCX=0xc0000080 (EFER MSR)
CR3=0x1d08000 (kernel's early_top_pgt)
EFER=0x500 (LME|LMA, but NOT SCE)
```
The kernel was trying to write `0x501` (LME | LMA | SCE) to the EFER MSR (0xC0000080). KVM only accepts the SCE (SYSCALL Enable) bit if the guest's CPUID advertises SYSCALL support; otherwise it injects #GP on the WRMSR. Because the VMM starts the guest with an IDT limit of 0 (for a clean boot), the #GP cannot be delivered and escalates to a triple fault, which KVM reports as a shutdown.
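
The register values in the dump can be decoded bit by bit. The following is a minimal sketch and not Volt's code: `decode_efer` and `efer_write_allowed` are hypothetical names, and `efer_write_allowed` is only a simplified model of the SCE check described above, not KVM's actual validation logic. The bit positions themselves are architectural.

```rust
// EFER bit positions (architectural).
const EFER_SCE: u64 = 1 << 0; // SYSCALL Enable
const EFER_LME: u64 = 1 << 8; // Long Mode Enable
const EFER_LMA: u64 = 1 << 10; // Long Mode Active
const EFER_NXE: u64 = 1 << 11; // No-Execute Enable

/// Hypothetical helper: names of the EFER bits set in `value`.
fn decode_efer(value: u64) -> Vec<&'static str> {
    let names = [
        (EFER_SCE, "SCE"),
        (EFER_LME, "LME"),
        (EFER_LMA, "LMA"),
        (EFER_NXE, "NXE"),
    ];
    let mut set = Vec::new();
    for &(mask, name) in names.iter() {
        if (value & mask) != 0 {
            set.push(name);
        }
    }
    set
}

/// Simplified model (an assumption, not KVM source): a write that sets SCE
/// is rejected unless CPUID 0x80000001 EDX bit 11 (SYSCALL) is advertised.
fn efer_write_allowed(new_efer: u64, cpuid_ext_edx: u32) -> bool {
    const CPUID_SYSCALL: u32 = 1 << 11;
    let wants_sce = (new_efer & EFER_SCE) != 0;
    !wants_sce || (cpuid_ext_edx & CPUID_SYSCALL) != 0
}
```

With these helpers, `decode_efer(0x500)` yields LME and LMA only, while `decode_efer(0x501)` adds SCE, which is exactly the bit the model refuses when the SYSCALL flag is absent from the guest CPUID.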
### Why No CPUID Was a Problem
Without a `KVM_SET_CPUID2` call, the vCPU presents a bare/default CPUID to the guest, which may omit feature bits the kernel depends on:
- **SYSCALL** (0x80000001 EDX bit 11) — Required for `wrmsr EFER.SCE`
- **NX/XD** (0x80000001 EDX bit 20) — Required for NX page table entries
- **Long Mode** (0x80000001 EDX bit 29) — Required for 64-bit
- **Hypervisor** (0x1 ECX bit 31) — Tells kernel it's in a VM for paravirt optimizations
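
As a rough illustration of the bits listed above, the sketch below checks a CPUID register value for the boot-critical flags. The function names are hypothetical and are not taken from `cpuid.rs`; only the leaf, register, and bit numbers come from the list.

```rust
// Leaf 0x80000001, EDX.
const EDX_SYSCALL: u32 = 1 << 11;
const EDX_NX: u32 = 1 << 20;
const EDX_LM: u32 = 1 << 29;
// Leaf 0x1, ECX.
const ECX_HYPERVISOR: u32 = 1 << 31;

/// Hypothetical diagnostic: which boot-critical bits are missing from
/// the EDX value of CPUID leaf 0x80000001.
fn missing_boot_features(ext_edx: u32) -> Vec<&'static str> {
    let required = [
        (EDX_SYSCALL, "SYSCALL"),
        (EDX_NX, "NX"),
        (EDX_LM, "LM"),
    ];
    let mut missing = Vec::new();
    for &(mask, name) in required.iter() {
        if (ext_edx & mask) == 0 {
            missing.push(name);
        }
    }
    missing
}

/// Hypothetical helper: true if leaf 0x1 ECX advertises a hypervisor.
fn has_hypervisor_bit(leaf1_ecx: u32) -> bool {
    (leaf1_ecx & ECX_HYPERVISOR) != 0
}
```

A check like this could run against the entries a VMM is about to hand to `KVM_SET_CPUID2`, so a missing bit is reported before the guest starts rather than surfacing as a triple fault.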
## Implementation
### New Files
- **`vmm/src/kvm/cpuid.rs`** — Complete CPUID filtering module
### Modified Files
- **`vmm/src/kvm/mod.rs`** — Added `cpuid` module and exports
- **`vmm/src/kvm/vm.rs`** — Integrated CPUID into VM/vCPU creation flow
- **`vmm/src/kvm/vcpu.rs`** — Added boot MSR configuration
### CPUID Filtering Details
The implementation follows Firecracker's approach:
1. **Get host-supported CPUID** via `KVM_GET_SUPPORTED_CPUID`
2. **Filter/modify entries** per leaf:
| Leaf | Action | Rationale |
|------|--------|-----------|
| 0x0 | Pass through vendor | Changing vendor breaks CPU-specific kernel paths |
| 0x1 | Strip VMX/SMX/DTES64/MONITOR/DS_CPL, set HYPERVISOR bit | Security + paravirt |
| 0x4 | Adjust core topology | Match vCPU count |
| 0x6 | Clear all | Don't expose power management |
| 0x7 | **Strip TSX (HLE/RTM)**, strip MPX, RDT | Security, deprecated features |
| 0xA | Clear all | Disable PMU in guest |
| 0xB | Set APIC IDs per vCPU | Topology |
| 0x40000000 | Set KVM hypervisor signature | Enables KVM paravirt |
| 0x80000001 | **Ensure SYSCALL, NX, LM bits** | **Critical fix** |
| 0x80000007 | Only keep Invariant TSC | Clean power management |
3. **Apply to each vCPU** via `KVM_SET_CPUID2` before register setup
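
The per-leaf filtering in step 2 can be sketched as a single `match` over the leaf number. This is an illustration under stated assumptions, not the contents of `cpuid.rs`: `CpuidEntry` is a hypothetical stand-in with the same register fields as KVM's CPUID entries, `filter_entry` and `filter_cpuid` are invented names, and the topology leaves (0x4, 0xB) and the hypervisor signature leaf (0x40000000) are omitted because they depend on per-vCPU state.

```rust
/// Hypothetical stand-in for one CPUID entry (leaf, subleaf, registers).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

// Leaf 0x1 ECX: DTES64 (2), MONITOR (3), DS_CPL (4), VMX (5), SMX (6).
const LEAF1_ECX_STRIP: u32 = (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6);
const LEAF1_ECX_HYPERVISOR: u32 = 1 << 31;
// Leaf 0x7 subleaf 0 EBX: HLE (4), RTM (11), RDT-M (12), MPX (14), RDT-A (15).
const LEAF7_EBX_STRIP: u32 = (1 << 4) | (1 << 11) | (1 << 12) | (1 << 14) | (1 << 15);
// Leaf 0x80000001 EDX: SYSCALL (11), NX (20), LM (29).
const EXT_EDX_REQUIRED: u32 = (1 << 11) | (1 << 20) | (1 << 29);
// Leaf 0x80000007 EDX: Invariant TSC (8).
const INVARIANT_TSC: u32 = 1 << 8;

/// Applies a subset of the table's rules to one entry.
fn filter_entry(e: &mut CpuidEntry) {
    match e.function {
        0x1 => {
            e.ecx &= !LEAF1_ECX_STRIP;
            e.ecx |= LEAF1_ECX_HYPERVISOR;
        }
        // Power management (0x6) and PMU (0xA): clear all registers.
        0x6 | 0xA => {
            e.eax = 0;
            e.ebx = 0;
            e.ecx = 0;
            e.edx = 0;
        }
        0x7 if e.index == 0 => {
            e.ebx &= !LEAF7_EBX_STRIP;
        }
        0x8000_0001 => {
            e.edx |= EXT_EDX_REQUIRED;
        }
        // This sketch masks EDX only; other registers are left untouched.
        0x8000_0007 => {
            e.edx &= INVARIANT_TSC;
        }
        _ => {}
    }
}

/// Filters every entry in place before it is applied to a vCPU.
fn filter_cpuid(entries: &mut [CpuidEntry]) {
    for e in entries.iter_mut() {
        filter_entry(e);
    }
}
```

Keeping each rule as a mask constant makes the table and the code easy to compare line by line, which matters when auditing which host features a guest can see.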
### Boot MSR Configuration
Added `setup_boot_msrs()` to vcpu.rs, matching Firecracker's `create_boot_msr_entries()`:
| MSR | Value | Purpose |
|-----|-------|---------|
| IA32_SYSENTER_CS/ESP/EIP | 0 | 32-bit syscall ABI (zeroed) |
| STAR, LSTAR, CSTAR, SYSCALL_MASK | 0 | 64-bit syscall ABI (kernel fills later) |
| KERNEL_GS_BASE | 0 | Per-CPU data (kernel fills later) |
| IA32_TSC | 0 | Time Stamp Counter |
| IA32_MISC_ENABLE | FAST_STRING (bit 0) | Enable fast string operations |
| MTRRdefType | (1<<11) \| 6 | MTRR enabled, default write-back |
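
The table above can be expressed as a list of (index, value) pairs. The sketch below is an assumption about shape only: `MsrEntry` and `boot_msr_entries` are hypothetical names, and the real `setup_boot_msrs()` may build and submit its entries differently. The MSR indices are the architectural ones.

```rust
/// Hypothetical stand-in for one MSR entry passed to the hypervisor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MsrEntry {
    index: u32,
    data: u64,
}

// Architectural MSR indices.
const MSR_IA32_TSC: u32 = 0x10;
const MSR_IA32_SYSENTER_CS: u32 = 0x174;
const MSR_IA32_SYSENTER_ESP: u32 = 0x175;
const MSR_IA32_SYSENTER_EIP: u32 = 0x176;
const MSR_IA32_MISC_ENABLE: u32 = 0x1A0;
const MSR_MTRR_DEF_TYPE: u32 = 0x2FF;
const MSR_STAR: u32 = 0xC000_0081;
const MSR_LSTAR: u32 = 0xC000_0082;
const MSR_CSTAR: u32 = 0xC000_0083;
const MSR_SYSCALL_MASK: u32 = 0xC000_0084;
const MSR_KERNEL_GS_BASE: u32 = 0xC000_0102;

const MISC_ENABLE_FAST_STRING: u64 = 1 << 0;
const MTRR_ENABLE: u64 = 1 << 11;
const MTRR_MEM_TYPE_WB: u64 = 6;

/// Builds the boot-time MSR list described in the table.
fn boot_msr_entries() -> Vec<MsrEntry> {
    let zeroed = [
        MSR_IA32_SYSENTER_CS,
        MSR_IA32_SYSENTER_ESP,
        MSR_IA32_SYSENTER_EIP,
        MSR_STAR,
        MSR_LSTAR,
        MSR_CSTAR,
        MSR_SYSCALL_MASK,
        MSR_KERNEL_GS_BASE,
        MSR_IA32_TSC,
    ];
    let mut entries: Vec<MsrEntry> = zeroed
        .iter()
        .map(|&index| MsrEntry { index, data: 0 })
        .collect();
    entries.push(MsrEntry {
        index: MSR_IA32_MISC_ENABLE,
        data: MISC_ENABLE_FAST_STRING,
    });
    entries.push(MsrEntry {
        index: MSR_MTRR_DEF_TYPE,
        data: MTRR_ENABLE | MTRR_MEM_TYPE_WB,
    });
    entries
}
```

The MTRRdefType value works out to `0x806`: bit 11 enables MTRRs and the low bits select memory type 6 (write-back) as the default.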
## Test Results
### Linux 4.14.174 (vmlinux-firecracker-official.bin)
```
✅ Full boot to init (VFS panic expected — no rootfs provided)
- Kernel version detected
- KVM hypervisor detected
- kvm-clock configured
- NX protection active
- CPU mitigations (Spectre V1/V2, SSBD, TSX) detected
- All subsystems initialized (network, SCSI, serial, etc.)
- Boot time: ~1.4 seconds to init
```
### Minimal Hello Kernel (minimal-hello.elf)
```
✅ Still works: "Hello from minimal kernel!" + "OK"
```
## Architecture Notes
### Why vmlinux ELF Works Now
The previous analysis (kernel-pagetable-analysis.md) identified that the kernel's `__startup_64()` builds its own page tables and switches CR3, abandoning the VMM's tables. This was thought to be the root cause.
**It turns out that's not the issue.** The kernel's early page tables are sufficient for the kernel's own needs. The actual problem was:
1. Kernel enters `startup_64` at physical 0x1000000
2. `__startup_64()` builds page tables in kernel BSS (`early_top_pgt` at physical 0x1d08000)
3. CR3 switches to kernel's tables
4. Kernel tries `wrmsr EFER, 0x501` to enable SYSCALL
5. **Without CPUID advertising SYSCALL support → #GP → triple fault**
With CPUID properly configured:
5. WRMSR succeeds (CPUID advertises SYSCALL)
6. Kernel continues initialization
7. Kernel sets up its own IDT/GDT for exception handling
8. Early page fault handler manages any unmapped pages lazily
### Key Insight
The vmlinux direct boot works because:
- The kernel's `__startup_64` only needs kernel text mapped (which it creates)
- `boot_params` at 0x20000 is accessed early, through the pointer in `%rsi` and the identity mapping, before the CR3 switch
- The kernel's early exception handler can resolve any subsequent page faults
- **The crash was purely a CPUID/feature issue, not a page table issue**
## References
- [Firecracker CPUID source](https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpu_config/x86_64/cpuid)
- [Firecracker boot MSRs](https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/arch/x86_64/msr.rs)
- [Linux kernel CPUID usage](https://elixir.bootlin.com/linux/v4.14/source/arch/x86/kernel/head_64.S)
- [Intel SDM Vol 2A: CPUID](https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2a-manual.html)