KVM-based microVMM for the Volt platform:

- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
CPUID Implementation for Volt VMM
Date: 2025-03-08

Status: ✅ IMPLEMENTED AND WORKING
Summary
Implemented CPUID filtering and boot MSR configuration that enable Linux kernels to boot successfully in Volt VMM. The previous triple-fault crash was caused by missing CPUID configuration. Leaf 0x80000001 was never advertised to the guest. Its EDX register carries the SYSCALL (bit 11), NX (bit 20) and Long Mode (bit 29) feature bits. As a result, the kernel's WRMSR to EFER, which enables SYSCALL via EFER.SCE, raised #GP.
Root Cause Analysis
The Crash
vCPU 0 SHUTDOWN (triple fault?) at RIP=0xffffffff81000084
RAX=0x501 RCX=0xc0000080 (EFER MSR)
CR3=0x1d08000 (kernel's early_top_pgt)
EFER=0x500 (LME|LMA, but NOT SCE)
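The kernel was trying to write 0x501 (SCE | LME | LMA) to the EFER MSR at 0xC0000080. LME and LMA were already set by the VMM (EFER=0x500); the write adds SCE (SYSCALL Enable). KVM validates guest EFER writes against the guest's CPUID, so with no CPUID table installed the WRMSR is rejected with #GP.

In upstream KVM, SCE itself is always accepted. It is LME/LMA that require the Long Mode bit (CPUID 0x80000001 EDX bit 29). Either way, the fix is the same: advertise leaf 0x80000001 with SYSCALL, NX and LM set.

The VMM sets the IDT limit to 0 for a clean boot, so the #GP cannot be delivered. It escalates to a double fault and then to a triple fault, which KVM reports as a vCPU shutdown.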
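The bit arithmetic in the crash dump can be checked with a minimal Rust sketch. It decodes the EFER values and models how upstream KVM validates a guest EFER write against CPUID leaf 0x80000001. The function names are hypothetical. guest_efer_write_ok is a simplified model, assumed from upstream KVM behavior (SCE always accepted, LME/LMA gated on LM, NX gated on NX), and is not Volt code.

```rust
// EFER bit positions (architectural, per the Intel SDM / AMD APM).
const EFER_SCE: u64 = 1 << 0; // SYSCALL Enable
const EFER_LME: u64 = 1 << 8; // Long Mode Enable
const EFER_LMA: u64 = 1 << 10; // Long Mode Active
const EFER_NX: u64 = 1 << 11; // No-Execute Enable

// Feature bits in EDX of CPUID leaf 0x80000001.
const CPUID_EDX_SYSCALL: u32 = 1 << 11;
const CPUID_EDX_NX: u32 = 1 << 20;
const CPUID_EDX_LM: u32 = 1 << 29;

/// Names of the EFER bits set in `efer`, handy when reading crash dumps.
fn decode_efer(efer: u64) -> Vec<&'static str> {
    let bits = [(EFER_SCE, "SCE"), (EFER_LME, "LME"), (EFER_LMA, "LMA"), (EFER_NX, "NX")];
    bits.iter()
        .filter(|&&(bit, _)| (efer & bit) != 0)
        .map(|&(_, name)| name)
        .collect()
}

/// Simplified model (assumption) of KVM's check on a guest WRMSR to EFER.
/// `edx_80000001` is EDX of the guest's CPUID leaf 0x80000001 (0 when no
/// CPUID table was installed). Returns false where KVM would inject #GP.
fn guest_efer_write_ok(efer: u64, edx_80000001: u32) -> bool {
    // SCE is not gated on CPUID in this model.
    let needs_lm = (efer & (EFER_LME | EFER_LMA)) != 0;
    let needs_nx = (efer & EFER_NX) != 0;
    !(needs_lm && (edx_80000001 & CPUID_EDX_LM) == 0)
        && !(needs_nx && (edx_80000001 & CPUID_EDX_NX) == 0)
}
```

In this model the 0x501 write fails with an empty table (EDX = 0), because LME/LMA are set without LM. It succeeds once SYSCALL, NX and LM are advertised. A write of SCE alone passes either way.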
Why No CPUID Was a Problem
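Without KVM_SET_CPUID2, the vCPU's CPUID table is empty. The guest's CPUID instruction returns zeros for every leaf, and KVM's own feature checks treat every feature as absent. In particular, none of the following are advertised: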
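- SYSCALL (0x80000001 EDX bit 11): advertises SYSCALL/SYSRET, which the kernel enables through EFER.SCE
- NX/XD (0x80000001 EDX bit 20): required before the kernel sets EFER.NX and uses NX page-table entries
- Long Mode (0x80000001 EDX bit 29): required for 64-bit mode; KVM rejects guest EFER writes with LME/LMA set when this bit is absent
- Hypervisor (0x1 ECX bit 31): tells the kernel it is running in a VM, enabling paravirt optimizations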
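The effect of a missing table can be sketched in Rust. CpuidEntry is a hypothetical local stand-in for KVM's kvm_cpuid_entry2; real Rust code would typically use the kvm-bindings crate. lookup simplifies KVM's behavior by treating any absent leaf as all zeros, which is exactly what an empty table yields.

```rust
/// Hypothetical local stand-in for KVM's `kvm_cpuid_entry2` (only the fields used here).
#[derive(Clone, Copy, Debug, Default)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

/// Simplified guest CPUID lookup: a leaf absent from the table reads as all
/// zeros, which is exactly what an empty table (no KVM_SET_CPUID2) yields.
fn lookup(table: &[CpuidEntry], function: u32, index: u32) -> CpuidEntry {
    table
        .iter()
        .copied()
        .find(|e| e.function == function && e.index == index)
        .unwrap_or(CpuidEntry { function, index, ..Default::default() })
}

/// Boot-critical features the guest would NOT see with this table.
fn missing_boot_features(table: &[CpuidEntry]) -> Vec<&'static str> {
    let ext = lookup(table, 0x8000_0001, 0);
    let basic = lookup(table, 0x1, 0);
    let checks = [
        ("SYSCALL", (ext.edx & (1 << 11)) != 0),
        ("NX", (ext.edx & (1 << 20)) != 0),
        ("LM", (ext.edx & (1 << 29)) != 0),
        ("HYPERVISOR", (basic.ecx & (1 << 31)) != 0),
    ];
    checks
        .iter()
        .filter(|&&(_, present)| !present)
        .map(|&(name, _)| name)
        .collect()
}
```

With an empty slice every feature is reported missing. A table carrying leaf 0x1 with ECX bit 31 and leaf 0x80000001 with EDX bits 11, 20 and 29 reports none.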
Implementation
New Files
- vmm/src/kvm/cpuid.rs: complete CPUID filtering module
Modified Files
- vmm/src/kvm/mod.rs: added the cpuid module and its exports
- vmm/src/kvm/vm.rs: integrated CPUID into the VM/vCPU creation flow
- vmm/src/kvm/vcpu.rs: added boot MSR configuration
CPUID Filtering Details
The implementation follows Firecracker's approach:
- Get host-supported CPUID via KVM_GET_SUPPORTED_CPUID
- Filter/modify entries per leaf:
| Leaf | Action | Rationale |
|---|---|---|
| 0x0 | Pass through vendor | Changing vendor breaks CPU-specific kernel paths |
| 0x1 | Strip VMX/SMX/DTES64/MONITOR/DS_CPL, set HYPERVISOR bit | Security + paravirt |
| 0x4 | Adjust core topology | Match vCPU count |
| 0x6 | Clear all | Don't expose power management |
| 0x7 | Strip TSX (HLE/RTM), MPX, and RDT | Security, deprecated features |
| 0xA | Clear all | Disable PMU in guest |
| 0xB | Set APIC IDs per vCPU | Topology |
| 0x40000000 | Set KVM hypervisor signature | Enables KVM paravirt |
| 0x80000001 | Ensure SYSCALL, NX, LM bits | Critical fix |
| 0x80000007 | Only keep Invariant TSC | Clean power management |
- Apply to each vCPU via KVM_SET_CPUID2 before register setup
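The per-leaf rules can be sketched as a single pass over the entries returned by KVM_GET_SUPPORTED_CPUID. This is a minimal illustration, not the contents of cpuid.rs. CpuidEntry is a hypothetical stand-in for kvm_cpuid_entry2, and the bit positions are the architectural ones from the Intel SDM. Only a subset of the table is covered: leaf 0x0 is passed through untouched, and the 0x4 topology adjustment is omitted.

```rust
/// Hypothetical stand-in for `kvm_cpuid_entry2`; real code would use kvm-bindings.
#[derive(Clone, Copy, Debug, Default)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

const fn bit(n: u32) -> u32 {
    1 << n
}

/// Apply a subset of the per-leaf filtering rules for one vCPU, in place.
fn filter_cpuid(entries: &mut [CpuidEntry], vcpu_id: u32) {
    for e in entries.iter_mut() {
        match e.function {
            0x1 => {
                // Strip DTES64(2), MONITOR(3), DS_CPL(4), VMX(5), SMX(6); set HYPERVISOR(31).
                e.ecx &= !(bit(2) | bit(3) | bit(4) | bit(5) | bit(6));
                e.ecx |= bit(31);
            }
            0x6 | 0xA => {
                // Power management (0x6) and PMU (0xA): expose nothing.
                e.eax = 0;
                e.ebx = 0;
                e.ecx = 0;
                e.edx = 0;
            }
            0x7 if e.index == 0 => {
                // Strip HLE(4), RTM(11), RDT-M(12), MPX(14), RDT-A(15).
                e.ebx &= !(bit(4) | bit(11) | bit(12) | bit(14) | bit(15));
            }
            0xB => {
                // EDX reports this vCPU's x2APIC ID.
                e.edx = vcpu_id;
            }
            0x4000_0000 => {
                // "KVMKVMKVM\0\0\0" signature; EAX = highest hypervisor leaf.
                e.eax = 0x4000_0001;
                e.ebx = 0x4b4d_564b;
                e.ecx = 0x564b_4d56;
                e.edx = 0x0000_004d;
            }
            0x8000_0001 => {
                // Critical fix: always advertise SYSCALL(11), NX(20), LM(29).
                e.edx |= bit(11) | bit(20) | bit(29);
            }
            0x8000_0007 => {
                // Keep only Invariant TSC (EDX bit 8).
                e.eax = 0;
                e.ebx = 0;
                e.ecx = 0;
                e.edx &= bit(8);
            }
            _ => {}
        }
    }
}
```

After filtering, the vCPU-specific copy would be handed to KVM_SET_CPUID2 before the registers are set up. Filtering per vCPU is what lets leaf 0xB carry a distinct APIC ID for each one.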
Boot MSR Configuration
Added setup_boot_msrs() to vcpu.rs, matching Firecracker's create_boot_msr_entries():
| MSR | Value | Purpose |
|---|---|---|
| IA32_SYSENTER_CS/ESP/EIP | 0 | 32-bit syscall ABI (zeroed) |
| STAR, LSTAR, CSTAR, SYSCALL_MASK | 0 | 64-bit syscall ABI (kernel fills later) |
| KERNEL_GS_BASE | 0 | Per-CPU data (kernel fills later) |
| IA32_TSC | 0 | Time Stamp Counter |
| IA32_MISC_ENABLE | FAST_STRING (bit 0) | Enable fast string operations |
| MTRRdefType | (1 << 11) \| 6 = 0x806 | MTRR enabled, default write-back |
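The MSR table can also be written as (MSR index, value) pairs, using the architectural MSR numbers. The function name boot_msr_entries is hypothetical. In a real VMM these pairs would be converted to kvm_msr_entry structures and passed to the vCPU with KVM_SET_MSRS.

```rust
/// (MSR index, initial value) pairs for boot; indices are the architectural
/// MSR numbers from the Intel SDM. The function name is hypothetical.
fn boot_msr_entries() -> Vec<(u32, u64)> {
    const MSR_IA32_TSC: u32 = 0x10;
    const MSR_IA32_SYSENTER_CS: u32 = 0x174;
    const MSR_IA32_SYSENTER_ESP: u32 = 0x175;
    const MSR_IA32_SYSENTER_EIP: u32 = 0x176;
    const MSR_IA32_MISC_ENABLE: u32 = 0x1A0;
    const MSR_MTRR_DEF_TYPE: u32 = 0x2FF;
    const MSR_STAR: u32 = 0xC000_0081;
    const MSR_LSTAR: u32 = 0xC000_0082;
    const MSR_CSTAR: u32 = 0xC000_0083;
    const MSR_SYSCALL_MASK: u32 = 0xC000_0084;
    const MSR_KERNEL_GS_BASE: u32 = 0xC000_0102;

    const MISC_ENABLE_FAST_STRING: u64 = 1 << 0;
    const MTRR_ENABLE: u64 = 1 << 11; // MTRRdefType.E: MTRRs enabled
    const MTRR_MEM_TYPE_WB: u64 = 6; // default memory type: write-back

    vec![
        // 32-bit SYSENTER ABI: zeroed.
        (MSR_IA32_SYSENTER_CS, 0),
        (MSR_IA32_SYSENTER_ESP, 0),
        (MSR_IA32_SYSENTER_EIP, 0),
        // 64-bit SYSCALL ABI: the kernel fills these in later.
        (MSR_STAR, 0),
        (MSR_LSTAR, 0),
        (MSR_CSTAR, 0),
        (MSR_SYSCALL_MASK, 0),
        // Per-CPU data pointer: the kernel fills this in later.
        (MSR_KERNEL_GS_BASE, 0),
        (MSR_IA32_TSC, 0),
        (MSR_IA32_MISC_ENABLE, MISC_ENABLE_FAST_STRING),
        (MSR_MTRR_DEF_TYPE, MTRR_ENABLE | MTRR_MEM_TYPE_WB),
    ]
}
```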
Test Results
Linux 4.14.174 (vmlinux-firecracker-official.bin)
✅ Full boot to init (VFS panic expected — no rootfs provided)
- Kernel version detected
- KVM hypervisor detected
- kvm-clock configured
- NX protection active
- CPU mitigations (Spectre V1/V2, SSBD, TSX) detected
- All subsystems initialized (network, SCSI, serial, etc.)
- Boot time: ~1.4 seconds to init
Minimal Hello Kernel (minimal-hello.elf)
✅ Still works: "Hello from minimal kernel!" + "OK"
Architecture Notes
Why vmlinux ELF Works Now
The previous analysis (kernel-pagetable-analysis.md) identified that the kernel's __startup_64() builds its own page tables and switches CR3, abandoning the VMM's tables. This was thought to be the root cause.
It turns out that's not the issue. The kernel's early page tables are sufficient for the kernel's own needs. The actual problem was:
1. Kernel enters startup_64 at physical 0x1000000
2. __startup_64() builds page tables in kernel BSS (early_top_pgt at physical 0x1d08000)
3. CR3 switches to the kernel's tables
4. Kernel tries wrmsr EFER, 0x501 to enable SYSCALL; with no CPUID table installed → #GP → triple fault
With CPUID properly configured:

5. WRMSR succeeds (CPUID now advertises SYSCALL, NX and LM)
6. Kernel continues initialization
7. Kernel sets up its own IDT/GDT for exception handling
8. Early page fault handler manages any unmapped pages lazily
Key Insight
The vmlinux direct boot works because:
- The kernel's __startup_64 only needs the kernel text mapped, and it builds that mapping itself
- boot_params at 0x20000 is accessed early, but via %rsi and the identity mapping (before the CR3 switch)
- The kernel's early exception handler can resolve any subsequent page faults
- The crash was purely a CPUID/feature issue, not a page table issue
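The general-purpose register state implied at entry can be sketched as follows. BootRegs is a hypothetical struct standing in for kvm_regs. The addresses are the ones quoted in this document, and the %rsi convention comes from the Linux x86 64-bit boot protocol.

```rust
/// Hypothetical subset of `kvm_regs` relevant to entering a vmlinux ELF directly.
#[derive(Debug, Default, PartialEq)]
struct BootRegs {
    rip: u64,
    rsi: u64,
    rflags: u64,
}

/// Physical entry point (startup_64) quoted in this document.
const KERNEL_ENTRY: u64 = 0x0100_0000;
/// Physical address of boot_params quoted in this document.
const BOOT_PARAMS_ADDR: u64 = 0x0002_0000;

fn initial_regs(entry: u64, boot_params: u64) -> BootRegs {
    BootRegs {
        rip: entry,
        // Linux x86 64-bit boot protocol: %rsi holds the boot_params address.
        rsi: boot_params,
        // Only the always-one reserved bit 1 is set, so interrupts stay disabled.
        rflags: 0x2,
    }
}
```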