volt-vmm/docs/cpuid-implementation.md
Karl Clinger 40ed108dd5 Volt VMM (Neutron Stardust): source-available under AGPSL v5.0
KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
2026-03-21 01:04:35 -05:00


CPUID Implementation for Volt VMM

Date: 2025-03-08
Status: IMPLEMENTED AND WORKING

Summary

Implemented CPUID filtering and boot MSR configuration that enable Linux kernels to boot successfully in Volt VMM. The root cause of the previous triple-fault crash was missing CPUID configuration. Specifically, the SYSCALL feature (CPUID 0x80000001, EDX bit 11) was not advertised to the guest, which caused a #GP fault when the kernel tried to enable SYSCALL by writing EFER with WRMSR.

Root Cause Analysis

The Crash

vCPU 0 SHUTDOWN (triple fault?) at RIP=0xffffffff81000084
RAX=0x501 RCX=0xc0000080 (EFER MSR)
CR3=0x1d08000 (kernel's early_top_pgt)
EFER=0x500 (LME|LMA, but NOT SCE)

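The kernel was trying to write 0x501 (LME | LMA | SCE) to the EFER MSR (0xC0000080). The SCE (SYSCALL Enable) bit requires CPUID to advertise SYSCALL support. Without a matching CPUID configuration, KVM injects #GP on the WRMSR. Because the VMM set the IDT limit to 0 for a clean boot, the #GP cannot be delivered: the delivery failure escalates to a double fault, which also cannot be delivered, so the vCPU triple-faults and shuts down.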
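To make the register dump concrete, the sketch below decodes the two EFER values using the architectural bit positions (SCE = bit 0, LME = bit 8, LMA = bit 10, NXE = bit 11). It is a standalone illustration; `decode_efer` is a hypothetical helper, not part of Volt VMM.

```rust
// Architectural EFER (MSR 0xC0000080) bits.
const EFER_SCE: u64 = 1 << 0; // SYSCALL enable
const EFER_LME: u64 = 1 << 8; // Long Mode enable
const EFER_LMA: u64 = 1 << 10; // Long Mode active
const EFER_NXE: u64 = 1 << 11; // No-execute enable

/// Hypothetical helper: names of the EFER bits set in `efer`, low bit first.
fn decode_efer(efer: u64) -> Vec<&'static str> {
    let bits = [
        (EFER_SCE, "SCE"),
        (EFER_LME, "LME"),
        (EFER_LMA, "LMA"),
        (EFER_NXE, "NXE"),
    ];
    bits.iter()
        .filter(|(mask, _)| efer & mask != 0)
        .map(|&(_, name)| name)
        .collect()
}

fn main() {
    let before: u64 = 0x500; // EFER in the crash dump
    let attempted: u64 = 0x501; // RAX at the faulting WRMSR
    println!("{:?}", decode_efer(before)); // ["LME", "LMA"]
    println!("{:?}", decode_efer(attempted)); // ["SCE", "LME", "LMA"]
    // Only the SCE bit is new in the attempted write.
    println!("{:?}", decode_efer(attempted & !before)); // ["SCE"]
}
```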

Why Missing CPUID Was a Problem

Without KVM_SET_CPUID2, the vCPU presents a bare/default CPUID to the guest. This may not include:

  • SYSCALL (0x80000001 EDX bit 11) — Required for wrmsr EFER.SCE
  • NX/XD (0x80000001 EDX bit 20) — Required for NX page table entries
  • Long Mode (0x80000001 EDX bit 29) — Required for 64-bit
  • Hypervisor (0x1 ECX bit 31) — Tells kernel it's in a VM for paravirt optimizations
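The features in this list can be checked mechanically. The following is a minimal sketch, assuming the caller already has ECX of leaf 0x1 and EDX of leaf 0x80000001 as the guest would see them; `missing_features` is a hypothetical helper, not Volt VMM's API.

```rust
// Feature bits from the list above.
const LEAF1_ECX_HYPERVISOR: u32 = 1 << 31; // CPUID 0x1, ECX bit 31
const EXT1_EDX_SYSCALL: u32 = 1 << 11; // CPUID 0x80000001, EDX bit 11
const EXT1_EDX_NX: u32 = 1 << 20; // CPUID 0x80000001, EDX bit 20
const EXT1_EDX_LM: u32 = 1 << 29; // CPUID 0x80000001, EDX bit 29

/// Hypothetical check: names of the listed features that are absent.
fn missing_features(leaf1_ecx: u32, ext1_edx: u32) -> Vec<&'static str> {
    let checks = [
        ((ext1_edx & EXT1_EDX_SYSCALL) != 0, "SYSCALL"),
        ((ext1_edx & EXT1_EDX_NX) != 0, "NX"),
        ((ext1_edx & EXT1_EDX_LM) != 0, "LM"),
        ((leaf1_ecx & LEAF1_ECX_HYPERVISOR) != 0, "HYPERVISOR"),
    ];
    checks
        .iter()
        .filter(|&&(present, _)| !present)
        .map(|&(_, name)| name)
        .collect()
}

fn main() {
    // Assumed input: all-zero registers, as for a guest with no CPUID set.
    println!("{:?}", missing_features(0, 0)); // ["SYSCALL", "NX", "LM", "HYPERVISOR"]
    // All four features advertised.
    let edx = EXT1_EDX_SYSCALL | EXT1_EDX_NX | EXT1_EDX_LM;
    println!("{:?}", missing_features(LEAF1_ECX_HYPERVISOR, edx)); // []
}
```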

Implementation

New Files

  • vmm/src/kvm/cpuid.rs — Complete CPUID filtering module

Modified Files

  • vmm/src/kvm/mod.rs — Added cpuid module and exports
  • vmm/src/kvm/vm.rs — Integrated CPUID into VM/vCPU creation flow
  • vmm/src/kvm/vcpu.rs — Added boot MSR configuration

CPUID Filtering Details

The implementation follows Firecracker's approach:

  1. Get host-supported CPUID via KVM_GET_SUPPORTED_CPUID
  2. Filter/modify entries per leaf:
     | Leaf       | Action                                                  | Rationale                                       |
     |------------|---------------------------------------------------------|-------------------------------------------------|
     | 0x0        | Pass through vendor                                     | Changing vendor breaks CPU-specific kernel paths |
     | 0x1        | Strip VMX/SMX/DTES64/MONITOR/DS_CPL, set HYPERVISOR bit | Security + paravirt                             |
     | 0x4        | Adjust core topology                                    | Match vCPU count                                |
     | 0x6        | Clear all                                               | Don't expose power management                   |
     | 0x7        | Strip TSX (HLE/RTM), MPX, and RDT                       | Security, deprecated features                   |
     | 0xA        | Clear all                                               | Disable PMU in guest                            |
     | 0xB        | Set APIC IDs per vCPU                                   | Topology                                        |
     | 0x40000000 | Set KVM hypervisor signature                            | Enables KVM paravirt                            |
     | 0x80000001 | Ensure SYSCALL, NX, LM bits                             | Critical fix                                    |
     | 0x80000007 | Only keep Invariant TSC                                 | Clean power management                          |

  3. Apply to each vCPU via KVM_SET_CPUID2 before register setup
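The per-leaf rules in the table can be sketched as a single match over the leaf number. This is an illustration under stated assumptions, not the contents of vmm/src/kvm/cpuid.rs: `CpuidEntry` is a hypothetical stand-in for the register fields of KVM's `kvm_cpuid_entry2`, the leaf 0x4 topology handling assumes one package with one thread per core, and the bit positions are the architectural ones from the Intel SDM.

```rust
/// Hypothetical mirror of the register fields of KVM's `kvm_cpuid_entry2`
/// (`flags` and `padding` omitted for brevity).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct CpuidEntry {
    function: u32,
    index: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

const fn bit(n: u32) -> u32 {
    1 << n
}

/// Applies the table's rules to one entry for vCPU `cpu_id` of `vcpu_count`.
/// Leaf 0x0 and unlisted leaves pass through unchanged.
fn filter_entry(mut e: CpuidEntry, cpu_id: u32, vcpu_count: u32) -> CpuidEntry {
    match e.function {
        0x1 => {
            // ECX: strip DTES64 (2), MONITOR (3), DS_CPL (4), VMX (5), SMX (6).
            e.ecx &= !(bit(2) | bit(3) | bit(4) | bit(5) | bit(6));
            // ECX bit 31: tell the guest it runs under a hypervisor.
            e.ecx |= bit(31);
        }
        0x4 => {
            // EAX[31:26] = addressable core IDs per package minus 1.
            // Assumption: one package, one thread per core.
            let cores_minus_1 = vcpu_count.saturating_sub(1) & 0x3f;
            e.eax = (e.eax & !(0x3f << 26)) | (cores_minus_1 << 26);
        }
        0x6 | 0xA => {
            // Hide power management (0x6) and the PMU (0xA) entirely.
            e.eax = 0;
            e.ebx = 0;
            e.ecx = 0;
            e.edx = 0;
        }
        0x7 if e.index == 0 => {
            // EBX: strip HLE (4), RTM (11), RDT-M (12), MPX (14), RDT-A (15).
            e.ebx &= !(bit(4) | bit(11) | bit(12) | bit(14) | bit(15));
        }
        0xB => {
            // EDX = x2APIC ID of the current logical processor.
            e.edx = cpu_id;
        }
        0x4000_0000 => {
            // "KVMKVMKVM\0\0\0" hypervisor signature across EBX, ECX, EDX.
            e.ebx = u32::from_le_bytes(*b"KVMK");
            e.ecx = u32::from_le_bytes(*b"VMKV");
            e.edx = u32::from_le_bytes(*b"M\0\0\0");
        }
        0x8000_0001 => {
            // EDX: force SYSCALL (11), NX (20), LM (29); assumes the host has them.
            e.edx |= bit(11) | bit(20) | bit(29);
        }
        0x8000_0007 => {
            // Keep only Invariant TSC (EDX bit 8).
            e.eax = 0;
            e.ebx = 0;
            e.ecx = 0;
            e.edx &= bit(8);
        }
        _ => {}
    }
    e
}

fn main() {
    // Stand-in for a KVM_GET_SUPPORTED_CPUID result.
    let supported = vec![
        CpuidEntry { function: 0x1, ecx: bit(0) | bit(5), ..Default::default() },
        CpuidEntry { function: 0xB, ..Default::default() },
        CpuidEntry { function: 0x8000_0001, ..Default::default() },
    ];
    let vcpu_count = 2;
    for cpu_id in 0..vcpu_count {
        // Each vCPU gets its own filtered copy (leaf 0xB differs per vCPU).
        let filtered: Vec<CpuidEntry> = supported
            .iter()
            .map(|&e| filter_entry(e, cpu_id, vcpu_count))
            .collect();
        println!("vCPU {cpu_id}: {filtered:x?}");
    }
}
```

In a real VMM the input would come from KVM_GET_SUPPORTED_CPUID and each filtered copy would be passed to KVM_SET_CPUID2 for its vCPU; with the `kvm-ioctls` crate (which Firecracker uses) those calls are `Kvm::get_supported_cpuid` and `VcpuFd::set_cpuid2`. Filtering a fresh copy per vCPU matters because leaf 0xB carries a per-vCPU APIC ID.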

Boot MSR Configuration

Added setup_boot_msrs() to vcpu.rs, matching Firecracker's create_boot_msr_entries():

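| MSR                              | Value                         | Purpose                                  |
|----------------------------------|-------------------------------|------------------------------------------|
| IA32_SYSENTER_CS/ESP/EIP         | 0                             | 32-bit syscall ABI (zeroed)              |
| STAR, LSTAR, CSTAR, SYSCALL_MASK | 0                             | 64-bit syscall ABI (kernel fills later)  |
| KERNEL_GS_BASE                   | 0                             | Per-CPU data (kernel fills later)        |
| IA32_TSC                         | 0                             | Time Stamp Counter                       |
| IA32_MISC_ENABLE                 | FAST_STRING (bit 0)           | Enable fast string operations            |
| MTRRdefType                      | 0x806 (`(1 << 11) \| 6`)      | MTRRs enabled, default type write-back   |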
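The MSR table translates directly into (index, value) pairs. The sketch below uses the architectural MSR numbers; `boot_msr_entries` is a hypothetical helper, not Volt VMM's `setup_boot_msrs()`. In the VMM these pairs would be packed into `kvm_msr_entry` structures and written with KVM_SET_MSRS, which this standalone example does not do.

```rust
// MSR indices, as listed in the Intel SDM (Vol. 4).
const MSR_IA32_TSC: u32 = 0x10;
const MSR_IA32_SYSENTER_CS: u32 = 0x174;
const MSR_IA32_SYSENTER_ESP: u32 = 0x175;
const MSR_IA32_SYSENTER_EIP: u32 = 0x176;
const MSR_IA32_MISC_ENABLE: u32 = 0x1a0;
const MSR_MTRR_DEF_TYPE: u32 = 0x2ff;
const MSR_STAR: u32 = 0xc000_0081;
const MSR_LSTAR: u32 = 0xc000_0082;
const MSR_CSTAR: u32 = 0xc000_0083;
const MSR_SYSCALL_MASK: u32 = 0xc000_0084;
const MSR_KERNEL_GS_BASE: u32 = 0xc000_0102;

// Value bits from the table.
const MISC_ENABLE_FAST_STRING: u64 = 1 << 0;
const MTRR_DEF_TYPE_ENABLE: u64 = 1 << 11;
const MTRR_TYPE_WRITE_BACK: u64 = 6;

/// Hypothetical helper: the boot MSRs from the table as (index, value) pairs.
fn boot_msr_entries() -> Vec<(u32, u64)> {
    vec![
        // 32-bit SYSENTER ABI, zeroed.
        (MSR_IA32_SYSENTER_CS, 0),
        (MSR_IA32_SYSENTER_ESP, 0),
        (MSR_IA32_SYSENTER_EIP, 0),
        // 64-bit SYSCALL ABI and per-CPU GS base; the kernel fills these later.
        (MSR_STAR, 0),
        (MSR_LSTAR, 0),
        (MSR_CSTAR, 0),
        (MSR_SYSCALL_MASK, 0),
        (MSR_KERNEL_GS_BASE, 0),
        (MSR_IA32_TSC, 0),
        (MSR_IA32_MISC_ENABLE, MISC_ENABLE_FAST_STRING),
        // MTRRs enabled, default memory type write-back (0x806).
        (MSR_MTRR_DEF_TYPE, MTRR_DEF_TYPE_ENABLE | MTRR_TYPE_WRITE_BACK),
    ]
}

fn main() {
    for (index, value) in boot_msr_entries() {
        println!("MSR {index:#010x} <- {value:#x}");
    }
}
```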

Test Results

Linux 4.14.174 (vmlinux-firecracker-official.bin)

✅ Full boot to init (VFS panic expected — no rootfs provided)
- Kernel version detected
- KVM hypervisor detected
- kvm-clock configured
- NX protection active
- CPU mitigations (Spectre V1/V2, SSBD, TSX) detected
- All subsystems initialized (network, SCSI, serial, etc.)
- Boot time: ~1.4 seconds to init

Minimal Hello Kernel (minimal-hello.elf)

✅ Still works: "Hello from minimal kernel!" + "OK"

Architecture Notes

Why vmlinux ELF Works Now

The previous analysis (kernel-pagetable-analysis.md) identified that the kernel's __startup_64() builds its own page tables and switches CR3, abandoning the VMM's tables. This was thought to be the root cause.

It turns out that's not the issue. The kernel's early page tables are sufficient for the kernel's own needs. The actual problem was:

  1. Kernel enters startup_64 at physical 0x1000000
  2. __startup_64() builds page tables in kernel BSS (early_top_pgt at physical 0x1d08000)
  3. CR3 switches to kernel's tables
  4. Kernel tries wrmsr EFER, 0x501 to enable SYSCALL
  5. Without CPUID advertising SYSCALL support → #GP → triple fault

With CPUID properly configured:

  5. WRMSR succeeds (CPUID advertises SYSCALL)
  6. Kernel continues initialization
  7. Kernel sets up its own IDT/GDT for exception handling
  8. Early page fault handler manages any unmapped pages lazily
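The specific value 0x501 follows from how the kernel computes the new EFER at step 4. As we read the EFER setup in `secondary_startup_64` (arch/x86/kernel/head_64.S), the kernel reads EFER, sets SCE unconditionally, and adds NXE only when CPUID 0x80000001 EDX bit 20 (NX) is set. The function below is a model of that logic under that reading, not kernel code; `kernel_efer_write` is a hypothetical name.

```rust
const EFER_SCE: u64 = 1 << 0; // SYSCALL enable
const EFER_NXE: u64 = 1 << 11; // No-execute enable
const CPUID_EXT1_EDX_NX: u32 = 1 << 20; // CPUID 0x80000001, EDX bit 20

/// Models the EFER value written at step 4: SCE always, NXE only if NX is advertised.
fn kernel_efer_write(current_efer: u64, ext1_edx: u32) -> u64 {
    let mut efer = current_efer | EFER_SCE;
    if ext1_edx & CPUID_EXT1_EDX_NX != 0 {
        efer |= EFER_NXE;
    }
    efer
}

fn main() {
    let efer = 0x500; // LME | LMA, as set up before the kernel runs
    println!("{:#x}", kernel_efer_write(efer, 0)); // 0x501 (no NX advertised)
    println!("{:#x}", kernel_efer_write(efer, CPUID_EXT1_EDX_NX)); // 0xd01
}
```

With no CPUID configured, NX reads as absent, which matches the RAX=0x501 in the crash dump; once NX is advertised, the same step would write 0xd01.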

Key Insight

The vmlinux direct boot works because:

  • The kernel's __startup_64 only needs kernel text mapped (which it creates)
  • boot_params at 0x20000 is accessed early, but through %rsi and the identity mapping (before the CR3 switch)
  • The kernel's early exception handler can resolve any subsequent page faults
  • The crash was purely a CPUID/feature issue, not a page table issue

References