Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
This commit is contained in:
Karl Clinger
2026-03-21 01:04:35 -05:00
commit 40ed108dd5
143 changed files with 50300 additions and 0 deletions

# Firecracker VMM Benchmark Results
**Date:** 2026-03-08
**Firecracker Version:** v1.14.2 (latest stable)
**Binary:** static-pie linked, x86_64, not stripped
**Test Host:** julius — Intel Xeon Silver 4210R @ 2.40GHz, 20 cores, Linux 6.1.0-42-amd64
**Kernel:** vmlinux-4.14.174 (Firecracker's official guest kernel, 21,441,304 bytes)
**Methodology:** No rootfs attached — kernel boots to VFS panic. Matches Volt test methodology.
---
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Binary Size](#2-binary-size)
3. [Cold Boot Time](#3-cold-boot-time)
4. [Startup Breakdown](#4-startup-breakdown)
5. [Memory Overhead](#5-memory-overhead)
6. [CPU Features (CPUID)](#6-cpu-features-cpuid)
7. [Thread Model](#7-thread-model)
8. [Comparison with Volt](#8-comparison-with-volt)
9. [Methodology Notes](#9-methodology-notes)
---
## 1. Executive Summary
| Metric | Firecracker v1.14.2 | Notes |
|--------|---------------------|-------|
| Binary size | 3.44 MB (3,436,512 bytes) | Static-pie, not stripped |
| Cold boot to kernel panic (wall) | **1,127ms median** | Includes ~500ms i8042 stall |
| Cold boot (no i8042 stall) | **353ms median** | With `i8042.noaux i8042.nokbd` |
| Kernel internal boot time | **912ms** / **138ms** | Default / no-i8042 |
| VMM overhead (startup→VM running) | **~80ms** | FC process + API + KVM setup |
| RSS at 128MB guest | **52 MB** | ~50MB VMM overhead |
| RSS at 256MB guest | **56 MB** | +4MB vs 128MB guest |
| RSS at 512MB guest | **60 MB** | +8MB vs 128MB guest |
| Threads during VM run | 3 | main + `fc_api` + `fc_vcpu 0` |
**Key Finding:** The ~912ms kernel "boot time" with the default Firecracker kernel (4.14.174) is dominated by a **~500ms i8042 keyboard controller timeout**. With the i8042 probe disabled, kernel initialization takes only ~138ms. This is a kernel issue, not a VMM issue.
---
## 2. Binary Size
```
-rwxr-xr-x 1 karl karl 3,436,512 Feb 26 11:32 firecracker-v1.14.2-x86_64
```
| Property | Value |
|----------|-------|
| Size | 3.44 MB (3,436,512 bytes) |
| Format | ELF 64-bit LSB pie executable, x86-64 |
| Linking | Static-pie (no shared library dependencies) |
| Stripped | No (includes symbol table) |
| Debug sections | 0 |
| Language | Rust |
### Related Binaries
| Binary | Size |
|--------|------|
| firecracker | 3.44 MB |
| jailer | 2.29 MB |
| cpu-template-helper | 2.58 MB |
| snapshot-editor | 1.23 MB |
| seccompiler-bin | 1.16 MB |
| rebase-snap | 0.52 MB |
---
## 3. Cold Boot Time
### Default Boot Args (`console=ttyS0 reboot=k panic=1 pci=off`)
10 iterations, 128MB guest RAM, 1 vCPU:
| Iteration | Wall Clock (ms) | Kernel Time (s) |
|-----------|-----------------|------------------|
| 1 | 1,130 | 0.9156 |
| 2 | 1,144 | 0.9097 |
| 3 | 1,132 | 0.9112 |
| 4 | 1,113 | 0.9138 |
| 5 | 1,126 | 0.9115 |
| 6 | 1,128 | 0.9130 |
| 7 | 1,143 | 0.9099 |
| 8 | 1,117 | 0.9119 |
| 9 | 1,123 | 0.9119 |
| 10 | 1,115 | 0.9169 |
| Statistic | Wall Clock (ms) | Kernel Time (ms) |
|-----------|-----------------|-------------------|
| **Min** | 1,113 | 910 |
| **Median** | 1,127 | 912 |
| **Max** | 1,144 | 917 |
| **Mean** | 1,127 | 913 |
| **Stddev** | ~10 | ~2 |
### Optimized Boot Args (`... i8042.noaux i8042.nokbd`)
Disabling the i8042 keyboard controller removes a ~500ms probe timeout:
| Iteration | Wall Clock (ms) | Kernel Time (s) |
|-----------|-----------------|------------------|
| 1 | 330 | 0.1418 |
| 2 | 347 | 0.1383 |
| 3 | 357 | 0.1391 |
| 4 | 358 | 0.1379 |
| 5 | 351 | 0.1367 |
| 6 | 371 | 0.1385 |
| 7 | 346 | 0.1376 |
| 8 | 378 | 0.1393 |
| 9 | 328 | 0.1382 |
| 10 | 355 | 0.1388 |
| Statistic | Wall Clock (ms) | Kernel Time (ms) |
|-----------|-----------------|-------------------|
| **Min** | 328 | 137 |
| **Median** | 353 | 138 |
| **Max** | 378 | 142 |
| **Mean** | 352 | 138 |
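
The summary rows in both tables of this section can be recomputed from the per-iteration wall-clock values. The sketch below uses Python's standard `statistics` module; the variable names and the `summarize` helper are illustrative and not part of the benchmark suite.

```python
import statistics

# Wall-clock samples (ms) copied from the two iteration tables above.
default_wall_ms = [1130, 1144, 1132, 1113, 1126, 1128, 1143, 1117, 1123, 1115]
no_i8042_wall_ms = [330, 347, 357, 358, 351, 371, 346, 378, 328, 355]


def summarize(samples):
    """Return min/median/max/mean/sample-stddev for a list of timings."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


for label, samples in (("default", default_wall_ms), ("no i8042", no_i8042_wall_ms)):
    s = summarize(samples)
    print(f"{label}: min={s['min']} median={s['median']} max={s['max']} "
          f"mean={s['mean']:.1f} stdev={s['stdev']:.1f}")
```

With an even number of samples, `statistics.median` averages the two middle values, which is how the 1,127ms and 353ms medians arise.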
### Wall Clock vs Kernel Time Gap Analysis
The ~200ms gap between wall clock and kernel internal time is:
- **~80ms** — Firecracker process startup + API configuration + KVM VM creation
- **~125ms** — Kernel time between panic message and process exit (reboot handling, serial flush)
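
As a quick check of the gap arithmetic, the sketch below subtracts the kernel-time medians from the wall-clock medians and compares the result with the two components listed above. All numbers come from this report; the variable names are illustrative.

```python
# Medians (ms) from the statistics tables in this section.
wall_median_ms = {"default": 1127, "no_i8042": 353}
kernel_median_ms = {"default": 912, "no_i8042": 138}

# Components of the gap as estimated above.
vmm_startup_ms = 80   # process start + API configuration + KVM VM creation
post_panic_ms = 125   # reboot handling and serial flush after the panic

gaps = {name: wall_median_ms[name] - kernel_median_ms[name] for name in wall_median_ms}
explained_ms = vmm_startup_ms + post_panic_ms

for name, gap in gaps.items():
    print(f"{name}: gap={gap}ms explained={explained_ms}ms residual={gap - explained_ms}ms")
```

The gap is the same in both configurations, which is consistent with the i8042 stall being spent entirely inside the kernel rather than in the VMM.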
---
## 4. Startup Breakdown
Measured with nanosecond wall-clock timing of each API call:
| Phase | Duration | Cumulative | Description |
|-------|----------|------------|-------------|
| **FC process start → socket ready** | 7-9 ms | 8 ms | Firecracker binary loads, creates API socket |
| **PUT /boot-source** | 12-16 ms | 22 ms | Loads + validates kernel ELF (21MB) |
| **PUT /machine-config** | 8-15 ms | 33 ms | Validates machine configuration |
| **PUT /actions (InstanceStart)** | 44-74 ms | 80 ms | Creates KVM VM, allocates guest memory, sets up vCPU, page tables, starts vCPU thread |
| **Kernel boot (with i8042)** | ~912 ms | 992 ms | Includes 500ms i8042 probe timeout |
| **Kernel boot (no i8042)** | ~138 ms | 218 ms | Pure kernel initialization |
| **Kernel panic → process exit** | ~125 ms | — | Reboot handling, serial flush |
### API Overhead Detail (5 runs)
| Run | Socket | Boot-src | Machine-cfg | InstanceStart | Total to VM |
|-----|--------|----------|-------------|---------------|-------------|
| 1 | 9ms | 11ms | 8ms | 48ms | 76ms |
| 2 | 9ms | 14ms | 14ms | 63ms | 101ms |
| 3 | 8ms | 12ms | 15ms | 65ms | 101ms |
| 4 | 9ms | 13ms | 8ms | 44ms | 75ms |
| 5 | 9ms | 14ms | 9ms | 74ms | 108ms |
| **Median** | **9ms** | **13ms** | **9ms** | **63ms** | **101ms** |
The InstanceStart phase is the most variable (44-74ms) because it does the heavy lifting: KVM_CREATE_VM, mmap guest memory, set up page tables, configure vCPU registers, create vCPU thread, and enter KVM_RUN.
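
The per-call timing can be approximated with a small client that issues the same three `PUT` requests over the API socket. This is a minimal sketch, not the harness used for the numbers above: the helper names, the socket path, and the kernel path are placeholders, and it assumes Firecracker is already running and listening on that socket.

```python
import http.client
import json
import socket
import time

DEFAULT_BOOT_ARGS = "console=ttyS0 reboot=k panic=1 pci=off"
NO_I8042_BOOT_ARGS = DEFAULT_BOOT_ARGS + " i8042.noaux i8042.nokbd"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 client that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def build_requests(kernel_path, boot_args=DEFAULT_BOOT_ARGS, vcpus=1, mem_mib=128):
    """Return the (path, body) pairs for the three phases timed above."""
    return [
        ("/boot-source", {"kernel_image_path": kernel_path, "boot_args": boot_args}),
        ("/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("/actions", {"action_type": "InstanceStart"}),
    ]


def timed_put(socket_path, path, body):
    """Send one PUT and return (HTTP status, elapsed milliseconds)."""
    start = time.perf_counter_ns()
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("PUT", path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        response.read()
        status = response.status
    finally:
        conn.close()
    return status, (time.perf_counter_ns() - start) / 1e6


def run_sequence(socket_path, kernel_path, boot_args=DEFAULT_BOOT_ARGS):
    """Time each API call in order; returns a list of (path, status, ms)."""
    return [(path, *timed_put(socket_path, path, body))
            for path, body in build_requests(kernel_path, boot_args)]
```

For example, `run_sequence("/tmp/firecracker.sock", "/path/to/vmlinux-4.14.174")` would return one timing per phase; both paths are placeholders.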
### Seccomp Impact
| Mode | Avg Wall Clock (5 runs) |
|------|------------------------|
| With seccomp | 8ms to exit |
| Without seccomp (`--no-seccomp`) | 8ms to exit |
Seccomp has no measurable impact on boot time (measured with `--no-api --config-file` mode).
---
## 5. Memory Overhead
### RSS by Guest Memory Size
Measured during active VM execution (kernel booted, pre-panic):
| Guest Memory | RSS (KB) | RSS (MB) | VSZ (KB) | VSZ (MB) | VMM Overhead |
|-------------|----------|----------|----------|----------|-------------|
| — (pre-boot) | 3,396 | 3 | — | — | Base process |
| 128 MB | 51,260-53,520 | 50-52 | 139,084 | 135 | ~50 MB |
| 256 MB | 57,616-57,972 | 56-57 | 270,156 | 263 | ~54 MB |
| 512 MB | 61,704-62,068 | 60-61 | 532,300 | 519 | ~58 MB |
### Memory Breakdown (128MB guest)
From `/proc/PID/smaps_rollup` and `/proc/PID/status`:
| Metric | Value |
|--------|-------|
| Pss (proportional) | 51,800 KB |
| Pss_Anon | 49,432 KB |
| Pss_File | 2,364 KB |
| AnonHugePages | 47,104 KB |
| VmData | 136,128 KB (132 MB) |
| VmExe | 2,380 KB (2.3 MB) |
| VmStk | 132 KB |
| VmLib | 8 KB |
| Memory regions | 29 |
| Threads | 3 |
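
The kB-valued fields in this table can be collected with a small parser over the two `/proc` files. The sketch below is illustrative (the function names are not from the benchmark suite) and assumes a Linux host where `/proc/PID/smaps_rollup` is available.

```python
def parse_kb_fields(text):
    """Extract 'Name:  <number> kB' lines into a {name: kilobytes} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the exact form "<key>: <digits> kB".
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def read_memory_stats(pid):
    """Merge the kB fields of /proc/<pid>/status and /proc/<pid>/smaps_rollup."""
    stats = {}
    for name in ("status", "smaps_rollup"):
        with open(f"/proc/{pid}/{name}") as handle:
            stats.update(parse_kb_fields(handle.read()))
    return stats
```

Non-kB lines such as `Threads:` or the `[rollup]` header are skipped, so the result contains only memory sizes, for example `stats["Pss"]` or `stats["VmData"]`.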
### Key Observations
1. **Guest memory is mmap'd but demand-paged**: VSZ scales linearly with guest size, but RSS only reflects touched pages
2. **VMM base overhead is ~3.4 MB** (pre-boot RSS)
3. **~50 MB RSS at 128MB guest**: The kernel touches ~47MB during boot (page tables, kernel code, data structures)
4. **AnonHugePages = 47MB**: THP (Transparent Huge Pages) is used for guest memory, reducing TLB pressure
5. **Scaling**: RSS grows by only ~4-6 MB at each step (128 → 256 → 512 MB), so it does not scale with guest size; guest pages are only touched on demand
### Pre-boot vs Post-boot Memory
| Phase | RSS |
|-------|-----|
| After FC process start | 3,396 KB (3.3 MB) |
| After boot-source + machine-config | 3,396 KB (3.3 MB) — no change |
| After InstanceStart (VM running) | 51,260+ KB (~50 MB) |
All guest memory allocation happens during InstanceStart. The API configuration phase adds no measurable RSS.
---
## 6. CPU Features (CPUID)
Firecracker v1.14.2 exposes the following CPU features to guests (as reported by kernel 4.14.174):
### XSAVE Features Exposed
| Feature | XSAVE Bit | Offset | Size |
|---------|-----------|--------|------|
| x87 FPU | 0x001 | — | — |
| SSE | 0x002 | — | — |
| AVX | 0x004 | 576 | 256 bytes |
| MPX bounds | 0x008 | 832 | 64 bytes |
| MPX CSR | 0x010 | 896 | 64 bytes |
| AVX-512 opmask | 0x020 | 960 | 64 bytes |
| AVX-512 Hi256 | 0x040 | 1024 | 512 bytes |
| AVX-512 ZMM_Hi256 | 0x080 | 1536 | 1024 bytes |
| PKU | 0x200 | 2560 | 8 bytes |
Total XSAVE context: 2,568 bytes (compacted format).
### CPU Identity (as seen by guest)
```
vendor_id: GenuineIntel
model name: Intel(R) Xeon(R) Processor @ 2.40GHz
family: 0x6
model: 0x55
stepping: 0x7
```
Firecracker masks the host CPU model name: the guest sees a generic "Intel(R) Xeon(R) Processor @ 2.40GHz", with the host's "Silver 4210R" removed.
### Security Mitigations Active in Guest
| Mitigation | Status |
|-----------|--------|
| NX (Execute Disable) | Active |
| Spectre V1 | usercopy/swapgs barriers |
| Spectre V2 | Enhanced IBRS |
| SpectreRSB | RSB filling on context switch |
| IBPB | Conditional on context switch |
| SSBD | Via prctl and seccomp |
| TAA | TSX disabled |
### Paravirt Features
| Feature | Present |
|---------|---------|
| KVM hypervisor detection | ✅ |
| kvm-clock | ✅ (MSRs 4b564d01/4b564d00) |
| KVM async PF | ✅ |
| KVM stealtime | ✅ |
| PV qspinlock | ✅ |
| x2apic | ✅ |
### Devices Visible to Guest
| Device | Type | Notes |
|--------|------|-------|
| Serial (ttyS0) | I/O 0x3f8 | 8250/16550 UART (U6_16550A) |
| i8042 keyboard | I/O 0x60, 0x64 | PS/2 controller |
| IOAPIC | MMIO 0xfec00000 | 24 GSIs |
| Local APIC | MMIO 0xfee00000 | x2apic mode |
| virtio-mmio | MMIO | Not probed (pci=off, no rootfs) |
---
## 7. Thread Model
Firecracker uses a minimal thread model:
| Thread | Name | Role |
|--------|------|------|
| Main | `firecracker-bin` | Event loop, serial I/O, device emulation |
| API | `fc_api` | HTTP API server on Unix socket |
| vCPU 0 | `fc_vcpu 0` | KVM_RUN loop for vCPU 0 |
With N vCPUs, there would be N+2 threads total.
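
One way to confirm the thread model on a running instance is to read each thread's name from `/proc/PID/task/*/comm`. This is a minimal sketch with illustrative function names; the `proc_root` parameter exists only so the helper can be pointed at a test directory.

```python
from pathlib import Path


def thread_names(pid, proc_root="/proc"):
    """Return the sorted thread names (comm) of a process."""
    task_dir = Path(proc_root) / str(pid) / "task"
    return sorted((task / "comm").read_text().strip() for task in task_dir.iterdir())


def expected_thread_count(vcpus):
    """Main thread + API thread + one thread per vCPU."""
    return vcpus + 2
```

Comparing `len(thread_names(pid))` with `expected_thread_count(vcpus)` gives a quick sanity check for multi-vCPU configurations, which were not measured in this report.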
### Process Details
| Property | Value |
|----------|-------|
| Seccomp | Mode 2 (filter) |
| NoNewPrivs | Yes |
| Capabilities | None (all dropped) |
| Seccomp filters | 1 |
| FD limit | 1,048,576 |
---
## 8. Comparison with Volt
### Binary Size
| VMM | Size | Linking |
|-----|------|---------|
| Firecracker v1.14.2 | 3.44 MB (3,436,512 bytes) | Static-pie, not stripped |
| Volt 0.1.0 | 3.26 MB (3,258,448 bytes) | Dynamic (release build) |
Volt is **5% smaller**, though Firecracker is statically linked (includes musl libc).
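
The percentage follows directly from the byte counts in the table; the variable names below are illustrative.

```python
firecracker_bytes = 3_436_512
volt_bytes = 3_258_448

saving_bytes = firecracker_bytes - volt_bytes
saving_pct = 100 * saving_bytes / firecracker_bytes

print(f"Volt is {saving_bytes:,} bytes ({saving_pct:.1f}%) smaller")
```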
### Boot Time Comparison
Both tested with the same kernel (vmlinux-4.14.174), same boot args, no rootfs:
| Metric | Firecracker | Volt | Delta |
|--------|-------------|-----------|-------|
| Wall clock (default boot) | 1,127ms median | TBD | — |
| Kernel internal time | 912ms | TBD | — |
| VMM startup overhead | ~80ms | TBD | — |
| Wall clock (no i8042) | 353ms median | TBD | — |
**Note:** Fill in Volt numbers from `benchmark-volt-vmm.md` for direct comparison.
### Memory Overhead
| Guest Size | Firecracker RSS | Volt RSS | Delta |
|-----------|-----------------|---------------|-------|
| Pre-boot (base) | 3.3 MB | TBD | — |
| 128 MB | 50-52 MB | TBD | — |
| 256 MB | 56-57 MB | TBD | — |
| 512 MB | 60-61 MB | TBD | — |
### Architecture Differences Affecting Performance
| Aspect | Firecracker | Volt |
|--------|-------------|-----------|
| API model | REST over Unix socket (always on) | Direct (no API server) |
| Thread model | main + api + N×vcpu | main + N×vcpu |
| Memory allocation | During InstanceStart | During VM setup |
| Kernel loading | Via API call (separate step) | At startup |
| Seccomp | BPF filter, ~50 syscalls | Planned |
| Guest memory | mmap + demand-paging + THP | TBD |
Firecracker's API-based architecture adds ~80ms overhead but enables runtime configuration. A direct-launch VMM like Volt can potentially start faster by eliminating the socket setup and HTTP parsing.
---
## 9. Methodology Notes
### Test Environment
- **Host OS:** Debian (Linux 6.1.0-42-amd64)
- **CPU:** Intel Xeon Silver 4210R @ 2.40GHz (Cascade Lake)
- **KVM:** `/dev/kvm` with user `karl` in group `kvm`
- **Firecracker:** Downloaded from GitHub releases, not jailed (bare process)
- **No jailer:** Tests run without the jailer for apples-to-apples VMM comparison
### What's Measured
- **Wall clock time:** `date +%s%N` before FC process start to detection of "Rebooting in" in serial output
- **Kernel internal time:** Extracted from kernel log timestamps (`[0.912xxx]` before "Rebooting in")
- **RSS:** `ps -p PID -o rss=` captured during VM execution
- **VMM overhead:** Time from process start to InstanceStart API return
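
The two timing measurements above can be sketched as follows. This is an approximation of the method, not the actual harness (which uses `date +%s%N`): it assumes the guest serial console is forwarded to the VMM's stdout, and the function names are illustrative.

```python
import re
import subprocess
import time

MARKER = "Rebooting in"
# Kernel log lines start with a timestamp such as "[    0.912345]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def kernel_time_ms(serial_log, marker=MARKER):
    """Kernel timestamp (ms) of the first log line containing the marker."""
    for line in serial_log.splitlines():
        if marker in line:
            match = TIMESTAMP.match(line)
            if match:
                return float(match.group(1)) * 1000.0
    return None


def wall_clock_ms(argv, marker=MARKER):
    """Start a process and return ms elapsed until the marker appears on stdout."""
    start = time.monotonic_ns()
    with subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                          text=True, errors="replace") as proc:
        try:
            for line in proc.stdout:
                if marker in line:
                    return (time.monotonic_ns() - start) / 1e6
            return None  # process exited without printing the marker
        finally:
            proc.kill()  # stop the VMM once the marker has been seen
```

`wall_clock_ms` includes process spawn time, so it is comparable to the wall-clock figures in this report only when the VMM boots directly from its command line rather than waiting for API calls.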
### Caveats
1. **No rootfs:** Kernel panics at VFS mount. This measures pure boot, not a complete VM startup with userspace.
2. **i8042 timeout:** The default kernel (4.14.174) spends ~500ms probing the PS/2 keyboard controller. This is a kernel config issue, not a VMM issue. A custom kernel with `CONFIG_SERIO_I8042=n` would eliminate this.
3. **Serial output buffering:** Firecracker's serial port occasionally hits `WouldBlock` errors, which may slightly affect kernel timing (serial I/O blocks the vCPU when the buffer fills).
4. **No huge page pre-allocation:** Tests use default THP (Transparent Huge Pages). Pre-allocating huge pages would reduce memory allocation latency.
5. **Both kernels identical:** The "official" Firecracker kernel and `vmlinux-4.14` symlink point to the same 21MB binary (vmlinux-4.14.174).
### Kernel Boot Timeline (annotated)
```
0ms FC process starts
8ms API socket ready
22ms Kernel loaded (PUT /boot-source)
33ms Machine configured (PUT /machine-config)
80ms VM running (PUT /actions InstanceStart)
┌─── Kernel execution begins ───┐
~84ms │ Memory init, e820 map │
~84ms │ KVM hypervisor detected │
~84ms │ kvm-clock initialized │
~88ms │ SMP init, CPU0 identified │
~113ms │ devtmpfs, clocksource │
~150ms │ Network stack init │
~176ms │ Serial driver registered │
~188ms │ i8042 probe begins │ ← 500ms stall
~464ms │ i8042 KBD port registered │
~976ms │ i8042 keyboard input created │ ← i8042 probe complete
~980ms │ VFS: Cannot open root device │
~985ms │ Kernel panic │
~993ms │ "Rebooting in 1 seconds.." │
└────────────────────────────────┘
~1130ms Serial output flushed, process exits
```
---
## Raw Data Files
All raw benchmark data is stored in `/tmp/fc-bench-results/`:
- `boot-times-official.txt` — 10 iterations of wall-clock + kernel times
- `precise-boot-times.txt` — 10 iterations with --no-api mode
- `memory-official.txt` — RSS/VSZ for 128/256/512 MB guest sizes
- `smaps-detail-{128,256,512}.txt` — Detailed memory maps
- `status-official-{128,256,512}.txt` — /proc/PID/status snapshots
- `kernel-output-official.txt` — Full kernel serial output
---
*Generated by automated benchmark suite, 2026-03-08*