Volt VMM (Neutron Stardust): source-available under AGPSL v5.0

KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
Commit 40ed108dd5 by Karl Clinger, 2026-03-21 01:04:35 -05:00
143 changed files with 50300 additions and 0 deletions

# systemd-networkd Enhanced virtio-net
## Overview
This design enhances Volt's virtio-net implementation by integrating with systemd-networkd for declarative, lifecycle-managed network configuration. Instead of Volt creating and configuring TAP devices imperatively, networkd manages them declaratively.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ systemd-networkd │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ volt-vmm-br0 │ │ vm-{uuid}.netdev │ │ vm-{uuid}.network│ │
│ │ (.netdev bridge) │ │ (TAP definition) │ │ (bridge attach) │ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └─────────────────────┼─────────────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ br0 │ ◄── Unified bridge │
│ │ (bridge) │ (VMs + Voltainer) │
│ └───────┬───────┘ │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ tap0 │ │ veth0 │ │ tap1 │ │
│ │ (VM-1) │ │ (cont.) │ │ (VM-2) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
└─────────────┼────────────────┼────────────────┼─────────────────────┘
              │                │                │
              ▼                ▼                ▼
         ┌─────────┐      ┌─────────┐      ┌─────────┐
         │  Volt   │      │Voltainer│      │  Volt   │
         │  VM-1   │      │Container│      │  VM-2   │
         └─────────┘      └─────────┘      └─────────┘
```
## Benefits
1. **Declarative Configuration**: Network topology defined in unit files, version-controllable
2. **Automatic Cleanup**: systemd removes TAP devices when VM exits
3. **Lifecycle Integration**: TAP created before VM starts, destroyed after
4. **Unified Networking**: VMs and Voltainer containers share the same bridge infrastructure
5. **vhost-net Acceleration**: Kernel-level packet processing bypasses userspace
6. **Predictable Naming**: TAP names derived from VM UUID
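Benefit 6 depends on a stable UUID-to-interface-name mapping; Linux caps interface names at 15 bytes (IFNAMSIZ minus the terminating NUL), so only a short UUID prefix can be used. A minimal sketch (the `tap_name` helper and the 8-character prefix are illustrative assumptions, not Volt's actual scheme):
```rust
/// Derive a predictable TAP interface name from a VM UUID.
/// Linux interface names are limited to 15 bytes, so "tap-" (4 bytes)
/// plus an 8-character hex prefix of the UUID stays well within bounds.
fn tap_name(vm_uuid: &str) -> String {
    let short: String = vm_uuid
        .chars()
        .filter(|c| c.is_ascii_hexdigit()) // drop hyphens
        .take(8)
        .collect();
    format!("tap-{}", short.to_lowercase())
}

fn main() {
    let name = tap_name("f47ac10b-58cc-4372-a567-0e02b2c3d479");
    assert!(name.len() <= 15);
    println!("{}", name); // tap-f47ac10b
}
```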
## Components
### 1. Bridge Infrastructure (One-time Setup)
```ini
# /etc/systemd/network/10-volt-vmm-br0.netdev
[NetDev]
Name=br0
Kind=bridge
MACAddress=52:54:00:00:00:01
[Bridge]
STP=false
ForwardDelaySec=0
```
```ini
# /etc/systemd/network/10-volt-vmm-br0.network
[Match]
Name=br0
[Network]
Address=10.42.0.1/24
IPForward=yes
IPMasquerade=both
ConfigureWithoutCarrier=yes
```
### 2. Per-VM TAP Template
Volt generates these dynamically:
```ini
# /run/systemd/network/50-vm-{uuid}.netdev
[NetDev]
Name=tap-{short_uuid}
Kind=tap
MACAddress=none
[Tap]
User=root
Group=root
VNetHeader=true
MultiQueue=true
PacketInfo=false
```
```ini
# /run/systemd/network/50-vm-{uuid}.network
[Match]
Name=tap-{short_uuid}
[Network]
Bridge=br0
ConfigureWithoutCarrier=yes
```
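The generation step is straightforward templating over the unit files above; `render_netdev`/`render_network` are illustrative helper names, not the shipped code, and in Volt the output would be written to `/run/systemd/network/` before `networkctl reload`:
```rust
/// Render the per-VM .netdev unit described above as a string.
fn render_netdev(short_uuid: &str) -> String {
    format!(
        "[NetDev]\nName=tap-{id}\nKind=tap\nMACAddress=none\n\n\
         [Tap]\nUser=root\nGroup=root\nVNetHeader=true\nMultiQueue=true\nPacketInfo=false\n",
        id = short_uuid
    )
}

/// Render the matching .network unit that attaches the TAP to the bridge.
fn render_network(short_uuid: &str, bridge: &str) -> String {
    format!(
        "[Match]\nName=tap-{id}\n\n[Network]\nBridge={br}\nConfigureWithoutCarrier=yes\n",
        id = short_uuid,
        br = bridge
    )
}

fn main() {
    let netdev = render_netdev("f47ac10b");
    let network = render_network("f47ac10b", "br0");
    assert!(netdev.contains("Kind=tap"));
    assert!(network.contains("Bridge=br0"));
}
```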
### 3. vhost-net Acceleration
vhost-net offloads packet processing to the kernel:
```
┌─────────────────────────────────────────────────┐
│ Guest VM │
│ ┌─────────────────────────────────────────┐ │
│ │ virtio-net driver │ │
│ └─────────────────┬───────────────────────┘ │
└───────────────────┬┼────────────────────────────┘
││
┌──────────┘│
│ │ KVM Exit (rare)
▼ ▼
┌────────────────────────────────────────────────┐
│ vhost-net (kernel) │
│ │
│ - Processes virtqueue directly in kernel │
│ - Zero-copy between TAP and guest memory │
│ - Avoids userspace context switches │
│ - ~30-50% throughput improvement │
└────────────────────┬───────────────────────────┘
                     │
              ┌──────┴──────┐
              │ TAP device  │
              └─────────────┘
```
**Without vhost-net:**
```
Guest → KVM exit → QEMU/Volt userspace → syscall → TAP → kernel → network
```
**With vhost-net:**
```
Guest → vhost-net (kernel) → TAP → network
```
## Integration with Voltainer
Both Volt VMs and Voltainer containers connect to the same bridge:
### Voltainer Network Zone
```yaml
# /etc/voltainer/network/zone-default.yaml
kind: NetworkZone
name: default
bridge: br0
subnet: 10.42.0.0/24
gateway: 10.42.0.1
dhcp:
enabled: true
range: 10.42.0.100-10.42.0.254
```
### Volt VM Allocation
VMs get static IPs from a reserved range (10.42.0.2-10.42.0.99):
```yaml
network:
- zone: default
mac: "52:54:00:ab:cd:ef"
ipv4: "10.42.0.10/24"
```
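Allocation from the reserved range can be a simple first-free scan; `alloc_vm_ip` and the in-memory `used` set are illustrative assumptions, not the actual allocator:
```rust
use std::collections::HashSet;

/// Pick the first free host address in the VM-reserved range
/// 10.42.0.2 - 10.42.0.99 (.1 is the bridge gateway, .100+ is the
/// Voltainer DHCP range).
fn alloc_vm_ip(used: &HashSet<u8>) -> Option<String> {
    (2u8..=99)
        .find(|h| !used.contains(h))
        .map(|h| format!("10.42.0.{}/24", h))
}

fn main() {
    let mut used = HashSet::new();
    used.insert(2);
    used.insert(3);
    // .2 and .3 are taken, so the next VM gets .4
    assert_eq!(alloc_vm_ip(&used).as_deref(), Some("10.42.0.4/24"));
}
```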
## File Locations
| File Type | Location | Persistence |
|-----------|----------|-------------|
| Bridge .netdev/.network | `/etc/systemd/network/` | Permanent |
| VM TAP .netdev/.network | `/run/systemd/network/` | Runtime only |
| Voltainer zone config | `/etc/voltainer/network/` | Permanent |
| vhost-net module | Kernel built-in | N/A |
## Lifecycle
### VM Start
1. Volt generates `.netdev` and `.network` in `/run/systemd/network/`
2. `networkctl reload` triggers networkd to create TAP
3. Wait for TAP interface to appear (`networkctl status tap-XXX`)
4. Open TAP fd with O_RDWR
5. Enable vhost-net via `/dev/vhost-net` ioctl
6. Boot VM with virtio-net using the TAP fd
### VM Stop
1. Close vhost-net and TAP file descriptors
2. Delete `.netdev` and `.network` from `/run/systemd/network/`
3. `networkctl reload` triggers cleanup
4. TAP interface automatically removed
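The wait in start step 3 can be sketched with std APIs; polling sysfs for the interface is a lightweight stand-in for parsing `networkctl status`, and the helper names are assumptions:
```rust
use std::path::{Path, PathBuf};

/// Sysfs entry that appears once the kernel has created the interface.
fn tap_sysfs_path(ifname: &str) -> PathBuf {
    Path::new("/sys/class/net").join(ifname)
}

/// Poll until the TAP device exists or the timeout expires.
fn wait_for_tap(ifname: &str, timeout: std::time::Duration) -> bool {
    let deadline = std::time::Instant::now() + timeout;
    while std::time::Instant::now() < deadline {
        if tap_sysfs_path(ifname).exists() {
            return true;
        }
        std::thread::sleep(std::time::Duration::from_millis(10));
    }
    false
}

fn main() {
    assert_eq!(
        tap_sysfs_path("tap-f47ac10b"),
        PathBuf::from("/sys/class/net/tap-f47ac10b")
    );
    // wait_for_tap("tap-f47ac10b", ...) would run right after
    // `networkctl reload`; not exercised here since no TAP exists.
}
```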
## vhost-net Setup Sequence
```c
// (Headers needed; error handling omitted -- every ioctl below can fail.)
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

// 1. Open the vhost-net device
int vhost_fd = open("/dev/vhost-net", O_RDWR);
// 2. Set owner (binds this vhost instance to the calling process's memory)
ioctl(vhost_fd, VHOST_SET_OWNER);
// 3. Set memory region table so the kernel can translate guest addresses
struct vhost_memory *mem = ...; // Guest memory regions
ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
// 4. Set vring info (repeat steps 4-6 with .index = 1 for the TX queue)
struct vhost_vring_state state = { .index = 0, .num = queue_size };
ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state);
struct vhost_vring_addr addr = {
    .index = 0,
    .desc_user_addr = desc_addr,
    .used_user_addr = used_addr,
    .avail_user_addr = avail_addr,
};
ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);
// 5. Set kick/call eventfds (guest->host notification, host->guest interrupt)
struct vhost_vring_file kick = { .index = 0, .fd = kick_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
struct vhost_vring_file call = { .index = 0, .fd = call_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);
// 6. Associate with the TAP backend (starts kernel-side processing)
struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);
```
## Performance Comparison
| Metric | userspace virtio-net | vhost-net |
|--------|---------------------|-----------|
| Throughput (1500 MTU) | ~5 Gbps | ~8 Gbps |
| Throughput (Jumbo 9000) | ~8 Gbps | ~15 Gbps |
| Latency (ping) | ~200 µs | ~80 µs |
| CPU usage | Higher | 30-50% lower |
| Context switches | Many | Minimal |
## Configuration Examples
### Minimal VM with Networking
```json
{
"vcpus": 2,
"memory_mib": 512,
"kernel": "vmlinux",
"network": [{
"id": "eth0",
"mode": "networkd",
"bridge": "br0",
"mac": "52:54:00:12:34:56",
"vhost": true
}]
}
```
### Multi-NIC VM
```json
{
"network": [
{
"id": "mgmt",
"bridge": "br-mgmt",
"vhost": true
},
{
"id": "data",
"bridge": "br-data",
"mtu": 9000,
"vhost": true,
"multiqueue": 4
}
]
}
```
## Error Handling
| Error | Cause | Recovery |
|-------|-------|----------|
| TAP creation timeout | networkd slow/unresponsive | Retry with backoff, fall back to direct creation |
| vhost-net open fails | Module not loaded | Fall back to userspace virtio-net |
| Bridge not found | Infrastructure not set up | Create bridge or fail with clear error |
| MAC conflict | Duplicate MAC on bridge | Auto-regenerate MAC |
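The MAC-conflict recovery can regenerate a locally administered address deterministically from the VM UUID plus an attempt counter; the `52:54:00` prefix matches the examples above, while `regen_mac` and the hash-based derivation are illustrative assumptions:
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Regenerate a MAC in the 52:54:00 (QEMU/KVM locally administered)
/// prefix from the VM UUID plus an attempt counter, so a retry after
/// a conflict yields a different but still deterministic address.
fn regen_mac(vm_uuid: &str, attempt: u32) -> String {
    let mut h = DefaultHasher::new();
    vm_uuid.hash(&mut h);
    attempt.hash(&mut h);
    let v = h.finish();
    format!(
        "52:54:00:{:02x}:{:02x}:{:02x}",
        (v >> 16) as u8,
        (v >> 8) as u8,
        v as u8
    )
}

fn main() {
    let mac = regen_mac("f47ac10b-58cc-4372-a567-0e02b2c3d479", 0);
    assert!(mac.starts_with("52:54:00:"));
    assert_eq!(mac.len(), 17);
}
```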
## Future Enhancements
1. **SR-IOV Passthrough**: Direct VF assignment for bare-metal performance
2. **DPDK Backend**: Alternative to TAP for ultra-low-latency
3. **virtio-vhost-user**: Offload to separate process for isolation
4. **Network Namespace Integration**: Per-VM network namespaces for isolation

---
# Stellarium: Unified Storage Architecture for Volt
> *"Every byte has a home. Every home is shared. Nothing is stored twice."*
## 1. Vision Statement
**Stellarium** is a revolutionary storage architecture that treats storage not as isolated volumes, but as a **unified content-addressed stellar cloud** where every unique byte exists exactly once, and every VM draws from the same constellation of data.
### What Makes This Revolutionary
Traditional VM storage operates on a fundamental lie: that each VM has its own dedicated disk. This creates:
- **Massive redundancy** — 1000 Debian VMs = 1000 copies of libc
- **Slow boots** — Each VM reads its own copy of boot files
- **Wasted IOPS** — Page cache misses everywhere
- **Memory bloat** — Same data cached N times
**Stellarium inverts this model.** Instead of VMs owning storage, **storage serves VMs through a unified content mesh**. The result:
| Metric | Traditional | Stellarium | Improvement |
|--------|-------------|------------|-------------|
| Storage per 1000 Debian VMs | 10 TB | 12 GB + deltas | **833x** |
| Cold boot time | 2-5s | <50ms | **40-100x** |
| Memory efficiency | 1 GB/VM | ~50 MB shared core | **20x** |
| IOPS for identical reads | N | 1 | **Nx** |
---
## 2. Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ STELLARIUM LAYERS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Volt │ │ Volt │ │ Volt │ VM Layer │
│ │ microVM │ │ microVM │ │ microVM │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ┌──────┴────────────────┴────────────────┴──────┐ │
│ │ STELLARIUM VirtIO Driver │ Driver │
│ │ (Memory-Mapped CAS Interface) │ Layer │
│ └──────────────────────┬────────────────────────┘ │
│ │ │
│ ┌──────────────────────┴────────────────────────┐ │
│ │ NOVA-STORE │ Store │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ Layer │
│ │ │ TinyVol │ │ShareVol │ │ DeltaVol│ │ │
│ │ │ Manager │ │ Manager │ │ Manager │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ └───────────┴───────────┘ │ │
│ │ │ │ │
│ │ ┌────────────────┴────────────────┐ │ │
│ │ │ PHOTON (Content Router) │ │ │
│ │ │ Hot→Memory Warm→NVMe Cold→S3 │ │ │
│ │ └────────────────┬────────────────┘ │ │
│ └───────────────────┼──────────────────────────┘ │
│ │ │
│ ┌───────────────────┴──────────────────────────┐ │
│ │ NEBULA (CAS Core) │ Foundation │
│ │ │ Layer │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │ │
│ │ │ Chunk │ │ Block │ │ Distributed │ │ │
│ │ │ Packer │ │ Dedup │ │ Hash Index │ │ │
│ │ └─────────┘ └─────────┘ └─────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │
│ │ │ COSMIC MESH (Distributed CAS) │ │ │
│ │ │ Local NVMe ←→ Cluster ←→ Object Store │ │ │
│ │ └─────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
### Core Components
#### NEBULA: Content-Addressable Storage Core
The foundation layer. Every piece of data is:
- **Chunked** using content-defined chunking (CDC) with FastCDC algorithm
- **Hashed** with BLAKE3 (256-bit, hardware-accelerated)
- **Deduplicated** at write time via hash lookup
- **Stored once** regardless of how many VMs reference it
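The insert-or-reference write path can be sketched with a map keyed by content hash. The real design keys on BLAKE3; a std hasher stands in here to keep the sketch dependency-free, and `Nebula`/`put` are illustrative names:
```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy content-addressed store: each unique chunk is stored once;
/// a duplicate write only bumps the reference count.
struct Nebula {
    chunks: HashMap<u64, (Vec<u8>, u32)>, // hash -> (data, refcount)
}

impl Nebula {
    fn new() -> Self {
        Nebula { chunks: HashMap::new() }
    }

    /// Insert-or-reference: returns the content hash either way.
    fn put(&mut self, data: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        data.hash(&mut h);
        let key = h.finish();
        match self.chunks.entry(key) {
            Entry::Occupied(mut e) => e.get_mut().1 += 1, // dedup hit
            Entry::Vacant(e) => {
                e.insert((data.to_vec(), 1)); // first and only copy
            }
        }
        key
    }

    fn refcount(&self, key: u64) -> u32 {
        self.chunks.get(&key).map_or(0, |c| c.1)
    }
}

fn main() {
    let mut store = Nebula::new();
    let k1 = store.put(b"libc chunk");
    let k2 = store.put(b"libc chunk"); // identical write: no new storage
    assert_eq!(k1, k2);
    assert_eq!(store.refcount(k1), 2);
}
```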
#### PHOTON: Intelligent Content Router
Manages data placement across the storage hierarchy:
- **L1 (Hot)**: Memory-mapped, instant access, boot-critical data
- **L2 (Warm)**: NVMe, sub-millisecond, working set
- **L3 (Cool)**: SSD, single-digit ms, recent data
- **L4 (Cold)**: Object storage (S3/R2), archival
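Placement can start as a simple heat score over access frequency and recency; the thresholds and `Tier`/`classify` names below are illustrative assumptions, and a real policy would also weigh boot-criticality hints:
```rust
#[derive(Debug, PartialEq)]
enum Tier {
    Hot,  // L1: memory-mapped
    Warm, // L2: NVMe
    Cool, // L3: SSD
    Cold, // L4: object storage
}

/// Classify a chunk by how often and how recently it was accessed.
fn classify(accesses_per_hour: u32, secs_since_access: u64) -> Tier {
    match (accesses_per_hour, secs_since_access) {
        (a, s) if a >= 100 && s < 60 => Tier::Hot,
        (a, _) if a >= 10 => Tier::Warm,
        (_, s) if s < 86_400 => Tier::Cool, // touched within a day
        _ => Tier::Cold,
    }
}

fn main() {
    assert_eq!(classify(500, 5), Tier::Hot);
    assert_eq!(classify(20, 3_600), Tier::Warm);
    assert_eq!(classify(1, 600), Tier::Cool);
    assert_eq!(classify(0, 604_800), Tier::Cold);
}
```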
#### NOVA-STORE: Volume Abstraction Layer
Presents traditional block/file interfaces to VMs while backed by CAS:
- **TinyVol**: Ultra-lightweight volumes with minimal metadata
- **ShareVol**: Copy-on-write shared volumes
- **DeltaVol**: Delta-encoded writable layers
---
## 3. Key Innovations
### 3.1 Stellar Deduplication
**Innovation**: Inline deduplication with zero write amplification.
Traditional dedup:
```
Write → Buffer → Hash → Lookup → Decide → Store
(copy) (wait) (maybe copy again)
```
Stellar dedup:
```
Write → Hash-while-streaming → CAS Insert (atomic)
(no buffer needed) (single write or reference)
```
**Implementation**:
```rust
struct StellarChunk {
hash: Blake3Hash, // 32 bytes
    size: u16,              // 2 bytes (stores length - 1, so 64 KiB chunks fit)
refs: AtomicU32, // 4 bytes - reference count
tier: AtomicU8, // 1 byte - storage tier
flags: u8, // 1 byte - compression, encryption
// Total: 40 bytes metadata per chunk
}
// Hash table: 40 bytes × 1B chunks = 40GB of index; at a ~40KB average
// chunk size that covers ~40TB of unique data. Fits in memory on modern servers.
```
### 3.2 TinyVol: Minimal Volume Overhead
**Innovation**: Volumes as tiny manifest files, not pre-allocated space.
```
Traditional qcow2: Header (512B) + L1 Table + L2 Tables + Refcount...
Minimum overhead: ~512KB even for empty volume
TinyVol: Just a manifest pointing to chunks
Overhead: 64 bytes base + 48 bytes per modified chunk
Empty 10GB volume: 64 bytes
1GB modified: 64B + (1GB/64KB × 48B) = ~768KB
```
**Structure**:
```rust
struct TinyVol {
magic: [u8; 8], // "TINYVOL\0"
version: u32,
flags: u32,
base_image: Blake3Hash, // Optional parent
size_bytes: u64,
chunk_map: BTreeMap<ChunkIndex, ChunkRef>,
}
struct ChunkRef {
hash: Blake3Hash, // 32 bytes
    offset_in_vol: [u8; 6], // 6 bytes (48-bit offset; Rust has no u48)
len: u16, // 2 bytes
flags: u64, // 8 bytes (CoW, compressed, etc.)
}
```
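The overhead figures quoted above follow directly from these sizes (64-byte header, 48-byte ChunkRef, 64 KiB chunks); a quick check:
```rust
/// Manifest overhead for a TinyVol: a fixed 64-byte header plus one
/// 48-byte ChunkRef per modified 64 KiB chunk (sizes from the struct
/// definitions above).
fn tinyvol_overhead(modified_bytes: u64) -> u64 {
    const HEADER: u64 = 64;
    const CHUNK: u64 = 64 * 1024;
    const REF: u64 = 48;
    HEADER + ((modified_bytes + CHUNK - 1) / CHUNK) * REF // ceil-divide
}

fn main() {
    // Empty 10 GB volume: just the header.
    assert_eq!(tinyvol_overhead(0), 64);
    // 1 GiB modified: 64 + 16384 * 48 = 786_496 bytes, i.e. ~768 KiB.
    assert_eq!(tinyvol_overhead(1 << 30), 786_496);
}
```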
### 3.3 ShareVol: Zero-Copy Shared Volumes
**Innovation**: Multiple VMs share read paths, with instant copy-on-write.
```
Traditional Shared Storage:
VM1 reads /lib/libc.so → Disk read → VM1 memory
VM2 reads /lib/libc.so → Disk read → VM2 memory
(Same data read twice, stored twice in RAM)
ShareVol:
VM1 reads /lib/libc.so → Shared mapping (already in memory)
VM2 reads /lib/libc.so → Same shared mapping
(Single read, single memory location, N consumers)
```
**Memory-Mapped CAS**:
```rust
// Shared content is memory-mapped once
struct SharedMapping {
hash: Blake3Hash,
mmap_addr: *const u8,
mmap_len: usize,
vm_refs: AtomicU32, // How many VMs reference this
last_access: AtomicU64, // For eviction
}
// VMs get read-only mappings to shared content
// Write attempts trigger CoW into TinyVol delta layer
```
### 3.4 Cosmic Packing: Small File Optimization
**Innovation**: Pack small files into larger chunks without losing addressability.
Problem: Millions of small files (< 4KB) waste space at chunk boundaries.
Solution: **Cosmic Packs** — aggregated storage with inline index:
```
┌─────────────────────────────────────────────────┐
│ COSMIC PACK (64KB) │
├─────────────────────────────────────────────────┤
│ Header (64B) │
│ - magic, version, entry_count │
├─────────────────────────────────────────────────┤
│ Index (variable, ~100B per entry) │
│ - [hash, offset, len, flags] × N │
├─────────────────────────────────────────────────┤
│ Data (remaining space) │
│ - Packed file contents │
└─────────────────────────────────────────────────┘
```
**Benefit**: 1000 × 100-byte files is only ~100 KB of raw data, but addressing each file as its own chunk would cost more in index metadata than in content. Packed, the same files occupy a couple of 64 KB Cosmic Packs, and every file remains individually addressable by hash.
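A pack can be built as one byte buffer with an inline index mapping hash to (offset, len). `CosmicPack` and its std-hash keys are illustrative; the design above uses BLAKE3, a 64-byte binary header, and a 64 KiB size cap:
```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy Cosmic Pack: small files appended to one buffer, individually
/// addressable through an inline index.
struct CosmicPack {
    data: Vec<u8>,
    index: HashMap<u64, (usize, usize)>, // hash -> (offset, len)
}

impl CosmicPack {
    fn new() -> Self {
        CosmicPack { data: Vec::new(), index: HashMap::new() }
    }

    /// Append a small file and record where it landed.
    fn add(&mut self, file: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        file.hash(&mut h);
        let key = h.finish();
        let offset = self.data.len();
        self.data.extend_from_slice(file);
        self.index.insert(key, (offset, file.len()));
        key
    }

    /// Addressability is retained: any packed file is fetched by hash.
    fn get(&self, key: u64) -> Option<&[u8]> {
        self.index.get(&key).map(|&(off, len)| &self.data[off..off + len])
    }
}

fn main() {
    let mut pack = CosmicPack::new();
    let k = pack.add(b"tiny config file");
    pack.add(b"another small file");
    assert_eq!(pack.get(k), Some(&b"tiny config file"[..]));
}
```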
### 3.5 Stellar Boot: Sub-50ms VM Start
**Innovation**: Boot data is pre-staged in memory before VM starts.
```
Boot Sequence Comparison:
Traditional:
t=0ms VMM starts
t=5ms BIOS loads
t=50ms Kernel requested
t=100ms Kernel loaded from disk
t=200ms initrd loaded
t=500ms Root FS mounted
t=2000ms Boot complete
Stellar Boot:
t=-50ms Boot manifest analyzed (during scheduling)
t=-25ms Hot chunks pre-faulted to memory
t=0ms VMM starts with memory-mapped boot data
t=5ms Kernel executes (already in memory)
t=15ms initrd processed (already in memory)
t=40ms Root FS ready (ShareVol, pre-mapped)
t=50ms Boot complete
```
**Boot Manifest**:
```rust
struct BootManifest {
kernel: Blake3Hash,
initrd: Option<Blake3Hash>,
root_vol: TinyVolRef,
// Predicted hot chunks for first 100ms
prefetch_set: Vec<Blake3Hash>,
// Memory layout hints
kernel_load_addr: u64,
initrd_load_addr: Option<u64>,
}
```
### 3.6 CDN-Native Distribution: Voltainer Integration
**Innovation**: Images distributed via CDN, layers indexed directly in NEBULA.
```
Traditional (Registry-based):
Registry API → Pull manifest → Pull layers → Extract → Overlay FS
(Complex protocol, copies data, registry infrastructure required)
Stellarium + CDN:
HTTPS GET manifest → HTTPS GET missing chunks → Mount
(Simple HTTP, zero extraction, CDN handles global distribution)
```
**CDN-Native Architecture**:
```
┌─────────────────────────────────────────────────────────────────┐
│ CDN-NATIVE DISTRIBUTION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ cdn.armoredgate.com/ │
│ ├── manifests/ │
│ │ └── {blake3-hash}.json ← Image/layer manifests │
│ └── blobs/ │
│ └── {blake3-hash} ← Raw content chunks │
│ │
│ Benefits: │
│ ✓ No registry daemon to run │
│ ✓ No registry protocol complexity │
│ ✓ Global edge caching built-in │
│ ✓ Simple HTTPS GET (curl-debuggable) │
│ ✓ Content-addressed = perfect cache keys │
│ ✓ Dedup at CDN level (same hash = same edge cache) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Implementation**:
```rust
struct CdnDistribution {
    base_url: String, // "https://cdn.armoredgate.com"
}

impl CdnDistribution {
    async fn fetch_manifest(&self, hash: &Blake3Hash) -> Result<ImageManifest> {
        let url = format!("{}/manifests/{}.json", self.base_url, hash);
        let resp = reqwest::get(&url).await?;
        Ok(resp.json().await?)
    }
    async fn fetch_chunk(&self, hash: &Blake3Hash) -> Result<Vec<u8>> {
        let url = format!("{}/blobs/{}", self.base_url, hash);
        let resp = reqwest::get(&url).await?;
        // Verify content hash matches (integrity check)
        let data = resp.bytes().await?;
        assert_eq!(blake3::hash(&data), *hash);
        Ok(data.to_vec())
    }
    async fn fetch_missing(&self, needed: &[Blake3Hash], local: &Nebula) -> Result<()> {
        // Only fetch chunks we don't have locally
        let missing: Vec<_> = needed.iter()
            .filter(|h| !local.exists(h))
            .collect();
        // Parallel fetch from CDN
        futures::future::join_all(
            missing.iter().map(|h| self.fetch_and_store(h, local))
        ).await;
        Ok(())
    }
}
struct VoltainerImage {
manifest_hash: Blake3Hash,
layers: Vec<LayerRef>,
}
struct LayerRef {
hash: Blake3Hash, // Content hash (CDN path)
stellar_manifest: TinyVol, // Direct mapping to Stellar chunks
}
// Voltainer pull = simple CDN fetch
async fn voltainer_pull(image: &str, cdn: &CdnDistribution, nebula: &Nebula) -> Result<VoltainerImage> {
// 1. Resolve image name to manifest hash (local index or CDN lookup)
let manifest_hash = resolve_image_hash(image).await?;
// 2. Fetch manifest from CDN
let manifest = cdn.fetch_manifest(&manifest_hash).await?;
// 3. Fetch only missing chunks (dedup-aware)
let needed_chunks = manifest.all_chunk_hashes();
cdn.fetch_missing(&needed_chunks, nebula).await?;
// 4. Image is ready - no extraction, layers ARE the storage
Ok(VoltainerImage::from_manifest(manifest))
}
```
**Voltainer Integration**:
```rust
// Voltainer (systemd-nspawn based) uses Stellarium directly
impl VoltainerRuntime {
async fn create_container(&self, image: &VoltainerImage) -> Result<Container> {
// Layers are already in NEBULA, just create overlay view
let rootfs = self.stellarium.create_overlay_view(&image.layers)?;
// systemd-nspawn mounts the Stellarium-backed rootfs
let container = systemd_nspawn::Container::new()
.directory(&rootfs)
.private_network(true)
.boot(false)
.spawn()?;
Ok(container)
}
}
```
### 3.7 Memory-Storage Convergence
**Innovation**: Memory and storage share the same backing, eliminating double-buffering.
```
Traditional:
Storage: [Block Device] → [Page Cache] → [VM Memory]
(data copied twice)
Stellarium:
Unified: [CAS Memory Map] ←──────────→ [VM Memory View]
(single location, two views)
```
**DAX-Style Direct Access**:
```rust
// VM sees storage as memory-mapped region
struct StellarBlockDevice {
    volumes: Vec<TinyVol>,
}

impl StellarBlockDevice {
    fn handle_read(&self, offset: u64, len: u32) -> &[u8] {
        let chunk = self.volumes[0].chunk_at(offset);
        let mapping = photon.get_or_map(chunk.hash);
        &mapping[chunk.local_offset..][..len as usize]
    }
    // Writes go to delta layer
    fn handle_write(&mut self, offset: u64, data: &[u8]) {
        self.volumes[0].write_delta(offset, data);
    }
}
```
---
## 4. Density Targets
### Storage Efficiency
| Scenario | Traditional | Stellarium | Target |
|----------|-------------|------------|--------|
| 1000 Ubuntu 22.04 VMs | 2.5 TB | 2.8 GB shared + 10 MB/VM avg delta | **99.6% reduction** |
| 10000 Python app VMs (same base) | 25 TB | 2.8 GB + 5 MB/VM | **99.8% reduction** |
| Mixed workload (100 unique bases) | 2.5 TB | 50 GB shared + 20 MB/VM avg | **94% reduction** |
### Memory Efficiency
| Component | Traditional | Stellarium | Target |
|-----------|-------------|------------|--------|
| Kernel (per VM) | 8-15 MB | Shared (~0 marginal) | **99%+ reduction** |
| libc (per VM) | 2 MB | Shared | **99%+ reduction** |
| Page cache duplication | High | Zero | **100% reduction** |
| Effective RAM per VM | 512 MB - 1 GB | 50-100 MB unique | **5-10x improvement** |
### Performance
| Metric | Traditional | Stellarium Target |
|--------|-------------|-------------------|
| Cold boot (minimal VM) | 500ms - 2s | < 50ms |
| Warm boot (pre-cached) | 100-500ms | < 20ms |
| Clone time (full copy) | 10-60s | < 1ms (CoW instant) |
| Dedup ratio (homogeneous) | N/A | 50:1 to 1000:1 |
| IOPS (deduplicated reads) | N | 1 |
### Density Goals
| Scenario | Traditional (64GB RAM host) | Stellarium Target |
|----------|------------------------------|-------------------|
| Minimal VMs (32MB each) | ~1000 | 5000-10000 |
| Small VMs (128MB each) | ~400 | 2000-4000 |
| Medium VMs (512MB each) | ~100 | 500-1000 |
| Storage per 10K VMs | 10-50 TB | 10-50 GB |
---
## 5. Integration with Volt VMM
### Boot Path Integration
```rust
// Volt VMM integration
impl VoltVmm {
fn boot_with_stellarium(&mut self, manifest: BootManifest) -> Result<()> {
// 1. Pre-fault boot chunks to L1 (memory)
let prefetch_handle = stellarium.prefetch(&manifest.prefetch_set);
// 2. Set up memory-mapped kernel
let kernel_mapping = stellarium.map_readonly(&manifest.kernel);
self.load_kernel_direct(kernel_mapping);
// 3. Set up memory-mapped initrd (if present)
if let Some(initrd) = &manifest.initrd {
let initrd_mapping = stellarium.map_readonly(initrd);
self.load_initrd_direct(initrd_mapping);
}
// 4. Configure VirtIO-Stellar device
self.add_stellar_blk(manifest.root_vol)?;
// 5. Ensure prefetch complete
prefetch_handle.wait();
// 6. Boot
self.start()
}
}
```
### VirtIO-Stellar Driver
Custom VirtIO block device that speaks Stellarium natively:
```rust
struct VirtioStellarConfig {
// Standard virtio-blk compatible
capacity: u64,
size_max: u32,
seg_max: u32,
// Stellarium extensions
stellar_features: u64, // STELLAR_F_SHAREVOL, STELLAR_F_DEDUP, etc.
vol_hash: Blake3Hash, // Volume identity
shared_regions: u32, // Number of pre-shared regions
}
// Request types (extends standard virtio-blk)
enum StellarRequest {
Read { sector: u64, len: u32 },
Write { sector: u64, data: Vec<u8> },
// Stellarium extensions
MapShared { hash: Blake3Hash }, // Map shared chunk directly
QueryDedup { sector: u64 }, // Check if sector is deduplicated
Prefetch { sectors: Vec<u64> }, // Hint upcoming reads
}
```
### Snapshot and Restore
```rust
// Instant snapshots via TinyVol CoW
fn snapshot_vm(vm: &VoltVm) -> VmSnapshot {
VmSnapshot {
// Memory as Stellar chunks
memory_chunks: stellarium.chunk_memory(vm.memory_region()),
// Volume is already CoW - just reference
root_vol: vm.root_vol.clone_manifest(),
// CPU state is tiny
cpu_state: vm.save_cpu_state(),
}
}
// Restore from snapshot
fn restore_vm(snapshot: &VmSnapshot) -> VoltVm {
let mut vm = VoltVm::new();
// Memory is mapped directly from Stellar chunks
vm.map_memory_from_stellar(&snapshot.memory_chunks);
// Volume manifest is loaded (no data copy)
vm.attach_vol(snapshot.root_vol.clone());
// Restore CPU state
vm.restore_cpu_state(&snapshot.cpu_state);
vm
}
```
### Live Migration with Dedup
```rust
// Only transfer unique chunks during migration
async fn migrate_vm(vm: &VoltVm, target: &NodeAddr) -> Result<()> {
// 1. Get list of chunks VM references
let vm_chunks = vm.collect_chunk_refs();
// 2. Query target for chunks it already has
let target_has = target.query_chunks(&vm_chunks).await?;
// 3. Transfer only missing chunks
let missing = vm_chunks.difference(&target_has);
target.receive_chunks(&missing).await?;
// 4. Transfer tiny metadata
target.receive_manifest(&vm.root_vol).await?;
target.receive_memory_manifest(&vm.memory_chunks).await?;
// 5. Final state sync and switchover
vm.pause();
target.receive_final_state(vm.cpu_state()).await?;
target.resume().await?;
Ok(())
}
```
---
## 6. Implementation Priorities
### Phase 1: Foundation (Month 1-2)
**Goal**: Core CAS and basic volume support
1. **NEBULA Core**
- BLAKE3 hashing with SIMD acceleration
- In-memory hash table (robin hood hashing)
- Basic chunk storage (local NVMe)
- Reference counting
2. **TinyVol v1**
- Manifest format
- Read-only volume mounting
- Basic CoW writes
3. **VirtIO-Stellar Driver**
- Basic block interface
- Integration with Volt
**Deliverable**: Boot a VM from Stellarium storage
### Phase 2: Deduplication (Month 2-3)
**Goal**: Inline dedup with zero performance regression
1. **Inline Deduplication**
- Write path with hash-first
- Atomic insert-or-reference
- Dedup metrics/reporting
2. **Content-Defined Chunking**
- FastCDC implementation
- Tuned for VM workloads
3. **Base Image Sharing**
- ShareVol implementation
- Multiple VMs sharing base
**Deliverable**: 10:1+ dedup ratio for homogeneous VMs
### Phase 3: Performance (Month 3-4)
**Goal**: Sub-50ms boot, memory convergence
1. **PHOTON Tiering**
- Hot/warm/cold classification
- Automatic promotion/demotion
- Memory-mapped hot tier
2. **Boot Optimization**
- Boot manifest analysis
- Prefetch implementation
- Zero-copy kernel loading
3. **Memory-Storage Convergence**
- DAX-style direct access
- Shared page elimination
**Deliverable**: <50ms cold boot, memory sharing active
### Phase 4: Density (Month 4-5)
**Goal**: 10000+ VMs per host achievable
1. **Small File Packing**
- Cosmic Pack implementation
- Inline file storage
2. **Aggressive Sharing**
- Cross-VM page dedup
- Kernel/library sharing
3. **Memory Pressure Handling**
- Intelligent eviction
- Graceful degradation
**Deliverable**: 5000+ density on 64GB host
### Phase 5: Distribution (Month 5-6)
**Goal**: Multi-node Stellarium cluster
1. **Cosmic Mesh**
- Distributed hash index
- Cross-node chunk routing
- Consistent hashing for placement
2. **Migration Optimization**
- Chunk pre-staging
- Delta transfers
3. **Object Storage Backend**
- S3/R2 cold tier
- Async writeback
**Deliverable**: Seamless multi-node storage
### Phase 6: Voltainer + CDN Native (Month 6-7)
**Goal**: Voltainer containers as first-class citizens, CDN-native distribution
1. **CDN Distribution Layer**
- Manifest/chunk fetch from ArmoredGate CDN
- Parallel chunk retrieval
- Edge cache warming strategies
2. **Voltainer Integration**
- Direct Stellarium mount for systemd-nspawn
- Shared layers between Voltainer containers and Volt VMs
- Unified storage for both runtimes
3. **Layer Mapping**
- Direct layer registration in NEBULA
- No extraction needed
- Content-addressed = perfect CDN cache keys
**Deliverable**: Voltainer containers boot in <100ms, unified with VM storage
---
## 7. Name: **Stellarium**
### Why Stellarium?
Continuing the cosmic theme of **Stardust** (cluster) and **Volt** (VMM):
- **Stellar** = Star-like, exceptional, relating to stars
- **-arium** = A place for (like aquarium, planetarium)
- **Stellarium** = "A place for stars" — where all your VMs' data lives
### Component Names (Cosmic Theme)
| Component | Name | Meaning |
|-----------|------|---------|
| CAS Core | **NEBULA** | Birthplace of stars, cloud of shared matter |
| Content Router | **PHOTON** | Light-speed data movement |
| Chunk Packer | **Cosmic Pack** | Aggregating cosmic dust |
| Volume Manager | **Nova-Store** | Connects to Volt |
| Distributed Mesh | **Cosmic Mesh** | Interconnected universe |
| Boot Optimizer | **Stellar Boot** | Star-like speed |
| Small File Pack | **Cosmic Dust** | Tiny particles aggregated |
### Taglines
- *"Every byte a star. Every star shared."*
- *"The storage that makes density possible."*
- *"Where VMs find their data, instantly."*
---
## 8. Summary
**Stellarium** transforms storage from a per-VM liability into a shared asset. By treating all data as content-addressed chunks in a unified namespace:
1. **Deduplication becomes free** — No extra work, it's the storage model
2. **Sharing becomes default** — VMs reference, not copy
3. **Boot becomes instant** — Data is pre-positioned
4. **Density becomes extreme** — 10-100x more VMs per host
5. **Migration becomes trivial** — Only ship unique data
Combined with Volt's minimal VMM overhead, Stellarium enables the original ArmoredContainers vision: **VM isolation at container density, with VM security guarantees**.
### The Stellarium Promise
> On a 64GB host with 2TB NVMe:
> - **10,000+ microVMs** running simultaneously
> - **50GB total storage** for 10,000 Debian-based workloads
> - **<50ms** boot time for any VM
> - **Instant** cloning and snapshots
> - **Seamless** live migration
This isn't incremental improvement. This is a **new storage paradigm** for the microVM era.
---
*Stellarium: The stellar storage for stellar density.*