# CPUID Implementation for Volt VMM

**Date**: 2025-03-08
**Status**: ✅ **IMPLEMENTED AND WORKING**

## Summary

Implemented CPUID filtering and boot MSR configuration that enables Linux kernels to boot successfully in Volt VMM. The root cause of the previous triple-fault crash was missing CPUID configuration: specifically, the SYSCALL feature (CPUID 0x80000001, EDX bit 11) was not being advertised to the guest, causing a #GP fault when the kernel tried to enable it via WRMSR to EFER.

## Root Cause Analysis

### The Crash

```
vCPU 0 SHUTDOWN (triple fault?) at RIP=0xffffffff81000084
  RAX=0x501 RCX=0xc0000080 (EFER MSR)
  CR3=0x1d08000 (kernel's early_top_pgt)
  EFER=0x500 (LME|LMA, but NOT SCE)
```

The kernel was trying to write `0x501` (LME | LMA | SCE) to EFER MSR at 0xC0000080. The SCE (SYSCALL Enable) bit requires CPUID to advertise SYSCALL support. Without proper CPUID, KVM generates #GP on the WRMSR. With IDT limit=0 (set by VMM for clean boot), #GP cascades to a triple fault.

### Why No CPUID Was a Problem

Without `KVM_SET_CPUID2`, the vCPU presents a bare/default CPUID to the guest. This may not include:

- **SYSCALL** (0x80000001 EDX bit 11): Required for `wrmsr EFER.SCE`
- **NX/XD** (0x80000001 EDX bit 20): Required for NX page table entries
- **Long Mode** (0x80000001 EDX bit 29): Required for 64-bit
- **Hypervisor** (0x1 ECX bit 31): Tells kernel it's in a VM for paravirt optimizations

## Implementation

### New Files

- **`vmm/src/kvm/cpuid.rs`**: Complete CPUID filtering module

### Modified Files

- **`vmm/src/kvm/mod.rs`**: Added `cpuid` module and exports
- **`vmm/src/kvm/vm.rs`**: Integrated CPUID into VM/vCPU creation flow
- **`vmm/src/kvm/vcpu.rs`**: Added boot MSR configuration

### CPUID Filtering Details

The implementation follows Firecracker's approach:

1. **Get host-supported CPUID** via `KVM_GET_SUPPORTED_CPUID`
2. **Filter/modify entries** per leaf:

| Leaf | Action | Rationale |
|------|--------|-----------|
| 0x0 | Pass through vendor | Changing vendor breaks CPU-specific kernel paths |
| 0x1 | Strip VMX/SMX/DTES64/MONITOR/DS_CPL, set HYPERVISOR bit | Security + paravirt |
| 0x4 | Adjust core topology | Match vCPU count |
| 0x6 | Clear all | Don't expose power management |
| 0x7 | **Strip TSX (HLE/RTM)**, strip MPX, RDT | Security, deprecated features |
| 0xA | Clear all | Disable PMU in guest |
| 0xB | Set APIC IDs per vCPU | Topology |
| 0x40000000 | Set KVM hypervisor signature | Enables KVM paravirt |
| 0x80000001 | **Ensure SYSCALL, NX, LM bits** | **Critical fix** |
| 0x80000007 | Only keep Invariant TSC | Clean power management |

3. **Apply to each vCPU** via `KVM_SET_CPUID2` before register setup

### Boot MSR Configuration

Added `setup_boot_msrs()` to vcpu.rs, matching Firecracker's `create_boot_msr_entries()`:

| MSR | Value | Purpose |
|-----|-------|---------|
| IA32_SYSENTER_CS/ESP/EIP | 0 | 32-bit syscall ABI (zeroed) |
| STAR, LSTAR, CSTAR, SYSCALL_MASK | 0 | 64-bit syscall ABI (kernel fills later) |
| KERNEL_GS_BASE | 0 | Per-CPU data (kernel fills later) |
| IA32_TSC | 0 | Time Stamp Counter |
| IA32_MISC_ENABLE | FAST_STRING (bit 0) | Enable fast string operations |
| MTRRdefType | (1<<11) \| 6 | MTRR enabled, default write-back |

## Test Results

### Linux 4.14.174 (vmlinux-firecracker-official.bin)

```
✅ Full boot to init (VFS panic expected: no rootfs provided)
- Kernel version detected
- KVM hypervisor detected
- kvm-clock configured
- NX protection active
- CPU mitigations (Spectre V1/V2, SSBD, TSX) detected
- All subsystems initialized (network, SCSI, serial, etc.)
- Boot time: ~1.4 seconds to init
```

### Minimal Hello Kernel (minimal-hello.elf)

```
✅ Still works: "Hello from minimal kernel!" + "OK"
```

## Architecture Notes

### Why vmlinux ELF Works Now

The previous analysis (kernel-pagetable-analysis.md) identified that the kernel's `__startup_64()` builds its own page tables and switches CR3, abandoning the VMM's tables. This was thought to be the root cause.

**It turns out that's not the issue.** The kernel's early page tables are sufficient for the kernel's own needs. The actual problem was:

1. Kernel enters `startup_64` at physical 0x1000000
2. `__startup_64()` builds page tables in kernel BSS (`early_top_pgt` at physical 0x1d08000)
3. CR3 switches to kernel's tables
4. Kernel tries `wrmsr EFER, 0x501` to enable SYSCALL
5. **Without CPUID advertising SYSCALL support → #GP → triple fault**

With CPUID properly configured:

5. WRMSR succeeds (CPUID advertises SYSCALL)
6. Kernel continues initialization
7. Kernel sets up its own IDT/GDT for exception handling
8. Early page fault handler manages any unmapped pages lazily

### Key Insight

The vmlinux direct boot works because:

- The kernel's `__startup_64` only needs kernel text mapped (which it creates)
- boot_params at 0x20000 is accessed early but via `%rsi` and identity mapping (before CR3 switch)
- The kernel's early exception handler can resolve any subsequent page faults
- **The crash was purely a CPUID/feature issue, not a page table issue**

## References

- [Firecracker CPUID source](https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpu_config/x86_64/cpuid)
- [Firecracker boot MSRs](https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/arch/x86_64/msr.rs)
- [Linux kernel CPUID usage](https://elixir.bootlin.com/linux/v4.14/source/arch/x86/kernel/head_64.S)
- [Intel SDM Vol 2A: CPUID](https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2a-manual.html)
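
## Appendix: CPUID Filtering Sketch

The bit-masking rows of the table under CPUID Filtering Details can be sketched as a pure function over CPUID entries. This is an illustration only, not the contents of `vmm/src/kvm/cpuid.rs`: `CpuidEntry`, `filter_entry`, and `filter_cpuid` are hypothetical names, and `CpuidEntry` is a plain stand-in for the register fields of KVM's `kvm_cpuid_entry2` so that the example builds with the standard library alone. The topology leaves (0x4, 0xB) and the hypervisor signature leaf (0x40000000) are left out, and the bit positions are taken from the CPUID tables in the Intel SDM.

```rust
/// Hypothetical stand-in for the register fields of `kvm_cpuid_entry2`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuidEntry {
    pub function: u32, // CPUID leaf (EAX input)
    pub index: u32,    // CPUID subleaf (ECX input)
    pub eax: u32,
    pub ebx: u32,
    pub ecx: u32,
    pub edx: u32,
}

// Leaf 0x1, ECX feature bits.
const ECX_DTES64: u32 = 1 << 2;
const ECX_MONITOR: u32 = 1 << 3;
const ECX_DS_CPL: u32 = 1 << 4;
const ECX_VMX: u32 = 1 << 5;
const ECX_SMX: u32 = 1 << 6;
const ECX_HYPERVISOR: u32 = 1 << 31;

// Leaf 0x7 subleaf 0, EBX feature bits.
const EBX_HLE: u32 = 1 << 4;
const EBX_RTM: u32 = 1 << 11;
const EBX_RDT_M: u32 = 1 << 12;
const EBX_MPX: u32 = 1 << 14;
const EBX_RDT_A: u32 = 1 << 15;

// Leaf 0x80000001, EDX feature bits.
const EDX_SYSCALL: u32 = 1 << 11;
const EDX_NX: u32 = 1 << 20;
const EDX_LM: u32 = 1 << 29;

// Leaf 0x80000007, EDX feature bits.
const EDX_INVARIANT_TSC: u32 = 1 << 8;

/// Apply the per-leaf masking rules to a single entry.
pub fn filter_entry(e: &mut CpuidEntry) {
    match (e.function, e.index) {
        // Hide virtualization/debug features, announce a hypervisor.
        (0x1, _) => {
            e.ecx &= !(ECX_DTES64 | ECX_MONITOR | ECX_DS_CPL | ECX_VMX | ECX_SMX);
            e.ecx |= ECX_HYPERVISOR;
        }
        // Power management (0x6) and PMU (0xA): clear everything.
        (0x6, _) | (0xA, _) => {
            e.eax = 0;
            e.ebx = 0;
            e.ecx = 0;
            e.edx = 0;
        }
        // Strip TSX (HLE/RTM), MPX, and RDT.
        (0x7, 0) => {
            e.ebx &= !(EBX_HLE | EBX_RTM | EBX_RDT_M | EBX_MPX | EBX_RDT_A);
        }
        // The critical fix: make sure SYSCALL, NX, and long mode are advertised.
        (0x8000_0001, _) => {
            e.edx |= EDX_SYSCALL | EDX_NX | EDX_LM;
        }
        // Keep only Invariant TSC in EDX (assumption: other registers untouched).
        (0x8000_0007, _) => {
            e.edx &= EDX_INVARIANT_TSC;
        }
        // All other leaves pass through unchanged in this sketch.
        _ => {}
    }
}

/// Apply the rules to every entry in a host-supported CPUID list.
pub fn filter_cpuid(entries: &mut [CpuidEntry]) {
    for e in entries.iter_mut() {
        filter_entry(e);
    }
}
```

In a real VMM the input would be the list returned by `KVM_GET_SUPPORTED_CPUID`, and the filtered list would be handed to `KVM_SET_CPUID2` for each vCPU before register setup. Force-setting the SYSCALL, NX, and LM bits assumes an x86_64 host that actually supports them.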