KVM-based microVMM for the Volt platform:

- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
systemd-networkd Enhanced virtio-net
Overview
This design enhances Volt's virtio-net implementation by integrating with systemd-networkd for declarative, lifecycle-managed network configuration. Instead of Volt manually creating/configuring TAP devices, networkd manages them declaratively.
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                          systemd-networkd                           │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │   volt-vmm-br0   │  │ vm-{uuid}.netdev │  │ vm-{uuid}.network│   │
│  │ (.netdev bridge) │  │ (TAP definition) │  │ (bridge attach)  │   │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘   │
│           │                     │                     │             │
│           └─────────────────────┼─────────────────────┘             │
│                                 ▼                                   │
│                         ┌───────────────┐                           │
│                         │      br0      │ ◄── Unified bridge        │
│                         │   (bridge)    │     (VMs + Voltainer)     │
│                         └───────┬───────┘                           │
│                                 │                                   │
│                ┌────────────────┼────────────────┐                  │
│                ▼                ▼                ▼                  │
│           ┌─────────┐      ┌─────────┐      ┌─────────┐             │
│           │  tap0   │      │  veth0  │      │  tap1   │             │
│           │ (VM-1)  │      │ (cont.) │      │ (VM-2)  │             │
│           └────┬────┘      └────┬────┘      └────┬────┘             │
└────────────────┼────────────────┼────────────────┼──────────────────┘
                 │                │                │
                 ▼                ▼                ▼
            ┌─────────┐      ┌─────────┐      ┌─────────┐
            │  Volt   │      │Voltainer│      │  Volt   │
            │  VM-1   │      │Container│      │  VM-2   │
            └─────────┘      └─────────┘      └─────────┘
Benefits
- Declarative Configuration: Network topology defined in unit files, version-controllable
- Automatic Cleanup: systemd removes TAP devices when VM exits
- Lifecycle Integration: TAP created before VM starts, destroyed after
- Unified Networking: VMs and Voltainer containers share the same bridge infrastructure
- vhost-net Acceleration: Kernel-level packet processing bypasses userspace
- Predictable Naming: TAP names derived from VM UUID
Components
1. Bridge Infrastructure (One-time Setup)
# /etc/systemd/network/10-volt-vmm-br0.netdev
[NetDev]
Name=br0
Kind=bridge
MACAddress=52:54:00:00:00:01
[Bridge]
STP=false
ForwardDelaySec=0
# /etc/systemd/network/10-volt-vmm-br0.network
[Match]
Name=br0
[Network]
Address=10.42.0.1/24
IPForward=yes
IPMasquerade=both
ConfigureWithoutCarrier=yes
2. Per-VM TAP Template
Volt generates these dynamically:
# /run/systemd/network/50-vm-{uuid}.netdev
[NetDev]
Name=tap-{short_uuid}
Kind=tap
MACAddress=none
[Tap]
User=root
Group=root
VNetHeader=true
MultiQueue=true
PacketInfo=false
# /run/systemd/network/50-vm-{uuid}.network
[Match]
Name=tap-{short_uuid}
[Network]
Bridge=br0
ConfigureWithoutCarrier=yes
3. vhost-net Acceleration
vhost-net offloads packet processing to the kernel:
┌─────────────────────────────────────────────────┐
│                    Guest VM                     │
│  ┌─────────────────────────────────────────┐    │
│  │            virtio-net driver            │    │
│  └─────────────────┬───────────────────────┘    │
└───────────────────┬┼────────────────────────────┘
                    ││
         ┌──────────┘│
         │           │ KVM Exit (rare)
         ▼           ▼
┌────────────────────────────────────────────────┐
│               vhost-net (kernel)               │
│                                                │
│  - Processes virtqueue directly in kernel      │
│  - Zero-copy between TAP and guest memory      │
│  - Avoids userspace context switches           │
│  - ~30-50% throughput improvement              │
└────────────────────┬───────────────────────────┘
                     │
                     ▼
              ┌─────────────┐
              │ TAP device  │
              └─────────────┘
Without vhost-net:
Guest → KVM exit → QEMU/Volt userspace → syscall → TAP → kernel → network
With vhost-net:
Guest → vhost-net (kernel) → TAP → network
Integration with Voltainer
Both Volt VMs and Voltainer containers connect to the same bridge:
Voltainer Network Zone
# /etc/voltainer/network/zone-default.yaml
kind: NetworkZone
name: default
bridge: br0
subnet: 10.42.0.0/24
gateway: 10.42.0.1
dhcp:
  enabled: true
  range: 10.42.0.100-10.42.0.254
Volt VM Allocation
VMs get static IPs from a reserved range (10.42.0.2-10.42.0.99):
network:
  - zone: default
    mac: "52:54:00:ab:cd:ef"
    ipv4: "10.42.0.10/24"
File Locations
| File Type | Location | Persistence |
|---|---|---|
| Bridge .netdev/.network | /etc/systemd/network/ | Permanent |
| VM TAP .netdev/.network | /run/systemd/network/ | Runtime only |
| Voltainer zone config | /etc/voltainer/network/ | Permanent |
| vhost-net module | Kernel built-in | N/A |
Lifecycle
VM Start
- Volt generates `.netdev` and `.network` in `/run/systemd/network/`
- `networkctl reload` triggers networkd to create TAP
- Wait for TAP interface to appear (`networkctl status tap-XXX`)
- Open TAP fd with O_RDWR
- Enable vhost-net via `/dev/vhost-net` ioctl
- Boot VM with virtio-net using the TAP fd
VM Stop
- Close vhost-net and TAP file descriptors
- Delete `.netdev` and `.network` from `/run/systemd/network/`
- `networkctl reload` triggers cleanup
- TAP interface automatically removed
vhost-net Setup Sequence
// Headers needed: <fcntl.h>, <sys/ioctl.h>, <linux/vhost.h>
// 1. Open vhost-net device
int vhost_fd = open("/dev/vhost-net", O_RDWR);
// 2. Become the owner of this vhost fd (required before any other ioctl)
ioctl(vhost_fd, VHOST_SET_OWNER);
// 3. Negotiate features (a subset of VHOST_GET_FEATURES), then set
//    the guest memory region table
ioctl(vhost_fd, VHOST_SET_FEATURES, &features);
struct vhost_memory *mem = ...; // Guest memory regions
ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
// 4. Set vring size, base index, and addresses for each queue (RX and TX)
struct vhost_vring_state state = { .index = 0, .num = queue_size };
ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state);
struct vhost_vring_state base = { .index = 0, .num = 0 };
ioctl(vhost_fd, VHOST_SET_VRING_BASE, &base);
struct vhost_vring_addr addr = {
    .index = 0,
    .desc_user_addr = desc_addr,
    .used_user_addr = used_addr,
    .avail_user_addr = avail_addr,
};
ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);
// 5. Set kick/call eventfds
struct vhost_vring_file kick = { .index = 0, .fd = kick_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
struct vhost_vring_file call = { .index = 0, .fd = call_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);
// 6. Associate with TAP backend (this enables the ring)
struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);
Performance Comparison
| Metric | userspace virtio-net | vhost-net |
|---|---|---|
| Throughput (1500 MTU) | ~5 Gbps | ~8 Gbps |
| Throughput (Jumbo 9000) | ~8 Gbps | ~15 Gbps |
| Latency (ping) | ~200 µs | ~80 µs |
| CPU usage | Higher | 30-50% lower |
| Context switches | Many | Minimal |
Configuration Examples
Minimal VM with Networking
{
"vcpus": 2,
"memory_mib": 512,
"kernel": "vmlinux",
"network": [{
"id": "eth0",
"mode": "networkd",
"bridge": "br0",
"mac": "52:54:00:12:34:56",
"vhost": true
}]
}
Multi-NIC VM
{
"network": [
{
"id": "mgmt",
"bridge": "br-mgmt",
"vhost": true
},
{
"id": "data",
"bridge": "br-data",
"mtu": 9000,
"vhost": true,
"multiqueue": 4
}
]
}
Error Handling
| Error | Cause | Recovery |
|---|---|---|
| TAP creation timeout | networkd slow/unresponsive | Retry with backoff, fall back to direct creation |
| vhost-net open fails | Module not loaded | Fall back to userspace virtio-net |
| Bridge not found | Infrastructure not set up | Create bridge or fail with clear error |
| MAC conflict | Duplicate MAC on bridge | Auto-regenerate MAC |
Future Enhancements
- SR-IOV Passthrough: Direct VF assignment for bare-metal performance
- DPDK Backend: Alternative to TAP for ultra-low-latency
- virtio-vhost-user: Offload to separate process for isolation
- Network Namespace Integration: Per-VM network namespaces for isolation