volt-vmm/designs/networkd-virtio-net.md
Karl Clinger 40ed108dd5 Volt VMM (Neutron Stardust): source-available under AGPSL v5.0
KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
2026-03-21 01:04:35 -05:00


# systemd-networkd Enhanced virtio-net
## Overview
This design enhances Volt's virtio-net implementation by integrating with systemd-networkd for declarative, lifecycle-managed network configuration. Instead of Volt manually creating/configuring TAP devices, networkd manages them declaratively.
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│                          systemd-networkd                           │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │ volt-vmm-br0     │  │ vm-{uuid}.netdev │  │ vm-{uuid}.network│   │
│  │ (.netdev bridge) │  │ (TAP definition) │  │ (bridge attach)  │   │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘   │
│           │                     │                     │             │
│           └─────────────────────┼─────────────────────┘             │
│                                 ▼                                   │
│                         ┌───────────────┐                           │
│                         │      br0      │ ◄── Unified bridge        │
│                         │   (bridge)    │     (VMs + Voltainer)     │
│                         └───────┬───────┘                           │
│                                 │                                   │
│               ┌─────────────────┼─────────────────┐                 │
│               ▼                 ▼                 ▼                 │
│          ┌─────────┐       ┌─────────┐       ┌─────────┐            │
│          │  tap0   │       │  veth0  │       │  tap1   │            │
│          │ (VM-1)  │       │ (cont.) │       │ (VM-2)  │            │
│          └────┬────┘       └────┬────┘       └────┬────┘            │
└───────────────┼─────────────────┼─────────────────┼─────────────────┘
                │                 │                 │
                ▼                 ▼                 ▼
           ┌─────────┐       ┌─────────┐       ┌─────────┐
           │  Volt   │       │Voltainer│       │  Volt   │
           │  VM-1   │       │Container│       │  VM-2   │
           └─────────┘       └─────────┘       └─────────┘
```
## Benefits
1. **Declarative Configuration**: Network topology defined in unit files, version-controllable
2. **Automatic Cleanup**: On VM exit, Volt removes the per-VM unit files and the TAP interface through networkd; runtime units under `/run` never survive a reboot
3. **Lifecycle Integration**: TAP created before VM starts, destroyed after
4. **Unified Networking**: VMs and Voltainer containers share the same bridge infrastructure
5. **vhost-net Acceleration**: Kernel-level packet processing bypasses userspace
6. **Predictable Naming**: TAP names derived from VM UUID, shortened to fit the kernel's 15-character interface name limit
## Components
### 1. Bridge Infrastructure (One-time Setup)
```ini
# /etc/systemd/network/10-volt-vmm-br0.netdev
[NetDev]
Name=br0
Kind=bridge
MACAddress=52:54:00:00:00:01
[Bridge]
STP=false
ForwardDelaySec=0
```
```ini
# /etc/systemd/network/10-volt-vmm-br0.network
[Match]
Name=br0
[Network]
Address=10.42.0.1/24
IPForward=yes
IPMasquerade=both
ConfigureWithoutCarrier=yes
```
### 2. Per-VM TAP Template
Volt generates these dynamically:
```ini
# /run/systemd/network/50-vm-{uuid}.netdev
[NetDev]
Name=tap-{short_uuid}
Kind=tap
MACAddress=none
[Tap]
User=root
Group=root
VNetHeader=true
MultiQueue=true
PacketInfo=false
```
```ini
# /run/systemd/network/50-vm-{uuid}.network
[Match]
Name=tap-{short_uuid}
[Network]
Bridge=br0
ConfigureWithoutCarrier=yes
```
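A minimal sketch of how Volt might render these files is shown below. The helper names `tap_name_from_uuid` and `render_tap_netdev` are hypothetical, and taking the first 8 characters of the UUID as `{short_uuid}` is an assumption, since this document does not define how the short form is derived. The length check is there because Linux interface names must fit in `IF_NAMESIZE` bytes including the terminating NUL.

```c
#include <net/if.h>   /* IF_NAMESIZE (16 on Linux, including the NUL) */
#include <stdio.h>
#include <string.h>

/* Hypothetical: "tap-" + first 8 characters of the VM UUID.
 * Returns 0 on success, -1 if the UUID is too short or the name does not fit. */
int tap_name_from_uuid(const char *uuid, char out[IF_NAMESIZE])
{
    if (strlen(uuid) < 8)
        return -1;
    int n = snprintf(out, IF_NAMESIZE, "tap-%.8s", uuid);
    return (n > 0 && n < IF_NAMESIZE) ? 0 : -1;
}

/* Render the per-VM .netdev shown above into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
int render_tap_netdev(const char *ifname, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "[NetDev]\nName=%s\nKind=tap\nMACAddress=none\n\n"
                     "[Tap]\nUser=root\nGroup=root\nVNetHeader=true\n"
                     "MultiQueue=true\nPacketInfo=false\n",
                     ifname);
    return (n > 0 && (size_t)n < len) ? n : -1;
}
```

The rendered text would then be written to `/run/systemd/network/50-vm-{uuid}.netdev`; the matching `.network` file can be produced the same way.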
### 3. vhost-net Acceleration
vhost-net offloads packet processing to the kernel:
```
┌─────────────────────────────────────────────────┐
│                    Guest VM                     │
│  ┌─────────────────────────────────────────┐    │
│  │            virtio-net driver            │    │
│  └─────────────────┬───────────────────────┘    │
└───────────────────┬┼────────────────────────────┘
                    ││
         ┌──────────┘│
         │           │ KVM Exit (rare)
         ▼           ▼
┌─────────────────────────────────────────────────┐
│               vhost-net (kernel)                │
│                                                 │
│  - Processes virtqueue directly in kernel       │
│  - Zero-copy between TAP and guest memory       │
│  - Avoids userspace context switches            │
│  - ~30-50% throughput improvement               │
└────────────────────┬────────────────────────────┘
              ┌─────────────┐
              │ TAP device  │
              └─────────────┘
```
**Without vhost-net:**
```
Guest → KVM exit → QEMU/Volt userspace → syscall → TAP → kernel → network
```
**With vhost-net:**
```
Guest → vhost-net (kernel) → TAP → network
```
## Integration with Voltainer
Both Volt VMs and Voltainer containers connect to the same bridge:
### Voltainer Network Zone
```yaml
# /etc/voltainer/network/zone-default.yaml
kind: NetworkZone
name: default
bridge: br0
subnet: 10.42.0.0/24
gateway: 10.42.0.1
dhcp:
  enabled: true
  range: 10.42.0.100-10.42.0.254
```
### Volt VM Allocation
VMs get static IPs from a reserved range (10.42.0.2-10.42.0.99):
```yaml
network:
  - zone: default
    mac: "52:54:00:ab:cd:ef"
    ipv4: "10.42.0.10/24"
```
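Because the DHCP pool and the static VM range share one subnet, it may be worth validating a configured address before starting the VM. The following is a sketch only: `is_vm_static_ip` is a hypothetical helper, and it expects the bare address with the `/24` suffix already stripped.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is `ip` inside the reserved VM range
 * 10.42.0.2 - 10.42.0.99 (and therefore outside the DHCP pool)? */
bool is_vm_static_ip(const char *ip)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return false;                       /* not a valid dotted quad */

    uint32_t host = ntohl(addr.s_addr);     /* compare in host byte order */
    const uint32_t lo = (10u << 24) | (42u << 16) | 2u;   /* 10.42.0.2  */
    const uint32_t hi = (10u << 24) | (42u << 16) | 99u;  /* 10.42.0.99 */
    return host >= lo && host <= hi;
}
```

With the example above, `10.42.0.10` passes, while `10.42.0.100` is rejected because it belongs to the Voltainer DHCP range.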
## File Locations
| File Type | Location | Persistence |
|-----------|----------|-------------|
| Bridge .netdev/.network | `/etc/systemd/network/` | Permanent |
| VM TAP .netdev/.network | `/run/systemd/network/` | Runtime only |
| Voltainer zone config | `/etc/voltainer/network/` | Permanent |
| vhost-net | Kernel (`vhost_net`, built-in or loadable module), exposed as `/dev/vhost-net` | N/A |
## Lifecycle
### VM Start
1. Volt generates `.netdev` and `.network` in `/run/systemd/network/`
2. `networkctl reload` triggers networkd to create TAP
3. Wait for TAP interface to appear (`networkctl status tap-XXX`)
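4. Open `/dev/net/tun` with `O_RDWR` and attach to the TAP by name with the `TUNSETIFF` ioctl, using flags that match the `.netdev` (`IFF_TAP | IFF_NO_PI | IFF_VNET_HDR | IFF_MULTI_QUEUE`)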
5. Enable vhost-net via `/dev/vhost-net` ioctl
6. Boot VM with virtio-net using the TAP fd
### VM Stop
1. Close vhost-net and TAP file descriptors
2. Delete `.netdev` and `.network` from `/run/systemd/network/`
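3. `networkctl reload` so networkd drops the per-VM configuration
4. Remove the TAP interface with `networkctl delete tap-XXX`; a reload alone does not delete netdevs that networkd has already created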
## vhost-net Setup Sequence
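```c
#include <fcntl.h>
#include <linux/vhost.h>
#include <stdint.h>
#include <sys/ioctl.h>

// Every call below can fail; real code must check each return value.

// 1. Open vhost-net device
int vhost_fd = open("/dev/vhost-net", O_RDWR);

// 2. Set owner (binds this vhost instance to the calling process;
//    the TAP is attached later, in step 7)
ioctl(vhost_fd, VHOST_SET_OWNER);

// 3. Pass the negotiated feature bits. acked_features is a placeholder for
//    the set the guest driver acknowledged, masked by VHOST_GET_FEATURES.
uint64_t features = acked_features;
ioctl(vhost_fd, VHOST_SET_FEATURES, &features);

// 4. Set memory region table
struct vhost_memory *mem = ...; // Guest memory regions
ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);

// 5. Set vring info for each queue (index 0 = RX, index 1 = TX)
struct vhost_vring_state state = { .index = 0, .num = queue_size };
ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state);

// The *_user_addr fields are host virtual addresses, not guest physical ones
struct vhost_vring_addr addr = {
    .index = 0,
    .desc_user_addr = desc_addr,
    .used_user_addr = used_addr,
    .avail_user_addr = avail_addr,
};
ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);

// 6. Set kick/call eventfds
struct vhost_vring_file kick = { .index = 0, .fd = kick_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);
struct vhost_vring_file call = { .index = 0, .fd = call_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);

// 7. Associate with TAP backend (repeat steps 5-7 with .index = 1 for TX)
struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);
```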
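The memory table passed to `VHOST_SET_MEM_TABLE` is elided in the sequence above. As an illustration only, the sketch below builds the simplest possible table: one contiguous guest RAM region. `single_region_table` is a hypothetical helper, and a real VMM with several memory slots would fill one entry per slot.

```c
#include <linux/vhost.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: describe one contiguous guest RAM region.
 * guest_phys is the guest-physical base, host_va the address at which the
 * VMM has that memory mapped. The caller frees the result with free(). */
struct vhost_memory *single_region_table(uint64_t guest_phys,
                                         uint64_t size,
                                         void *host_va)
{
    /* struct vhost_memory ends in a variable-length regions[] array,
     * so allocate the header plus exactly one region entry. */
    struct vhost_memory *mem =
        calloc(1, sizeof *mem + sizeof(struct vhost_memory_region));
    if (mem == NULL)
        return NULL;

    mem->nregions = 1;
    mem->regions[0].guest_phys_addr = guest_phys;
    mem->regions[0].memory_size = size;
    mem->regions[0].userspace_addr = (uint64_t)(uintptr_t)host_va;
    return mem;
}
```

The kernel uses this table to translate the guest-physical addresses found in descriptors into addresses in the VMM's own address space.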
## Performance Comparison
| Metric | userspace virtio-net | vhost-net |
|--------|---------------------|-----------|
| Throughput (1500 MTU) | ~5 Gbps | ~8 Gbps |
| Throughput (Jumbo 9000) | ~8 Gbps | ~15 Gbps |
| Latency (ping) | ~200 µs | ~80 µs |
| CPU usage | Higher | 30-50% lower |
| Context switches | Many | Minimal |
## Configuration Examples
### Minimal VM with Networking
```json
{
  "vcpus": 2,
  "memory_mib": 512,
  "kernel": "vmlinux",
  "network": [{
    "id": "eth0",
    "mode": "networkd",
    "bridge": "br0",
    "mac": "52:54:00:12:34:56",
    "vhost": true
  }]
}
```
### Multi-NIC VM
```json
{
  "network": [
    {
      "id": "mgmt",
      "bridge": "br-mgmt",
      "vhost": true
    },
    {
      "id": "data",
      "bridge": "br-data",
      "mtu": 9000,
      "vhost": true,
      "multiqueue": 4
    }
  ]
}
```
## Error Handling
| Error | Cause | Recovery |
|-------|-------|----------|
| TAP creation timeout | networkd slow/unresponsive | Retry with backoff, fall back to direct creation |
| vhost-net open fails | Module not loaded | Fall back to userspace virtio-net |
| Bridge not found | Infrastructure not set up | Create bridge or fail with clear error |
| MAC conflict | Duplicate MAC on bridge | Auto-regenerate MAC |
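
The "retry with backoff" recovery for a TAP creation timeout might look like the sketch below. `wait_for_iface` is a hypothetical helper, and the doubling delay is an assumption rather than a policy stated in this design. It only checks that the interface exists; whether networkd has finished attaching it to the bridge would still need a separate check.

```c
#include <net/if.h>
#include <time.h>

/* Hypothetical helper: poll until `ifname` exists, doubling the delay
 * after each miss. Returns 0 once the interface is present, -1 after
 * `attempts` unsuccessful checks. */
int wait_for_iface(const char *ifname, int attempts, long base_ms)
{
    long delay_ms = base_ms;

    for (int i = 0; i < attempts; i++) {
        if (if_nametoindex(ifname) != 0)
            return 0;                       /* interface has appeared */

        if (i + 1 < attempts) {             /* no sleep after the last try */
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
            delay_ms *= 2;                  /* exponential backoff */
        }
    }
    return -1;
}
```

A return value of -1 is the point at which the caller would fall back to direct TAP creation, as listed in the table.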
## Future Enhancements
1. **SR-IOV Passthrough**: Direct VF assignment for bare-metal performance
2. **DPDK Backend**: Alternative to TAP for ultra-low-latency
3. **vhost-user**: Offload the data path to a separate userspace process for isolation
4. **Network Namespace Integration**: Per-VM network namespaces for isolation