Volt VMM (Neutron Stardust): source-available under AGPSL v5.0
KVM-based microVMM for the Volt platform:
- Sub-second VM boot times
- Minimal memory footprint
- Landlock LSM + seccomp security
- Virtio device support
- Custom kernel management

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
designs/networkd-virtio-net.md

# systemd-networkd Enhanced virtio-net

## Overview

This design enhances Volt's virtio-net implementation by integrating with systemd-networkd for declarative, lifecycle-managed network configuration. Instead of Volt manually creating/configuring TAP devices, networkd manages them declaratively.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          systemd-networkd                           │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │
│  │   volt-vmm-br0   │  │ vm-{uuid}.netdev │  │ vm-{uuid}.network│   │
│  │ (.netdev bridge) │  │ (TAP definition) │  │ (bridge attach)  │   │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘   │
│           │                     │                     │             │
│           └─────────────────────┼─────────────────────┘             │
│                                 ▼                                   │
│                         ┌───────────────┐                           │
│                         │      br0      │ ◄── Unified bridge        │
│                         │   (bridge)    │     (VMs + Voltainer)     │
│                         └───────┬───────┘                           │
│                                 │                                   │
│               ┌─────────────────┼─────────────────┐                 │
│               ▼                 ▼                 ▼                 │
│          ┌─────────┐       ┌─────────┐       ┌─────────┐            │
│          │  tap0   │       │  veth0  │       │  tap1   │            │
│          │ (VM-1)  │       │ (cont.) │       │ (VM-2)  │            │
│          └────┬────┘       └────┬────┘       └────┬────┘            │
└───────────────┼─────────────────┼─────────────────┼─────────────────┘
                │                 │                 │
                ▼                 ▼                 ▼
           ┌─────────┐       ┌─────────┐       ┌─────────┐
           │  Volt   │       │Voltainer│       │  Volt   │
           │  VM-1   │       │Container│       │  VM-2   │
           └─────────┘       └─────────┘       └─────────┘
```

## Benefits

1. **Declarative Configuration**: Network topology is defined in `.netdev` and `.network` files, which can be version-controlled
2. **Automatic Cleanup**: systemd-networkd removes TAP devices when the VM exits
3. **Lifecycle Integration**: The TAP device is created before the VM starts and destroyed after it stops
4. **Unified Networking**: VMs and Voltainer containers share the same bridge infrastructure
5. **vhost-net Acceleration**: Packets are processed in the kernel, bypassing the VMM's userspace data path
6. **Predictable Naming**: TAP names are derived from the VM UUID

## Components

### 1. Bridge Infrastructure (One-time Setup)

```ini
# /etc/systemd/network/10-volt-vmm-br0.netdev
[NetDev]
Name=br0
Kind=bridge
MACAddress=52:54:00:00:00:01

[Bridge]
STP=false
ForwardDelaySec=0
```

```ini
# /etc/systemd/network/10-volt-vmm-br0.network
[Match]
Name=br0

[Network]
Address=10.42.0.1/24
IPForward=yes
IPMasquerade=both
ConfigureWithoutCarrier=yes
```

### 2. Per-VM TAP Template

Volt generates these dynamically:

```ini
# /run/systemd/network/50-vm-{uuid}.netdev
[NetDev]
Name=tap-{short_uuid}
Kind=tap
MACAddress=none

[Tap]
User=root
Group=root
VNetHeader=true
MultiQueue=true
PacketInfo=false
```

```ini
# /run/systemd/network/50-vm-{uuid}.network
[Match]
Name=tap-{short_uuid}

[Network]
Bridge=br0
ConfigureWithoutCarrier=yes
```

### 3. vhost-net Acceleration

vhost-net offloads packet processing to the kernel:

```
┌─────────────────────────────────────────────────┐
│                    Guest VM                     │
│  ┌─────────────────────────────────────────┐    │
│  │            virtio-net driver            │    │
│  └─────────────────┬───────────────────────┘    │
└───────────────────┬┼────────────────────────────┘
                    ││
         ┌──────────┘│
         │           │ KVM Exit (rare)
         ▼           ▼
┌────────────────────────────────────────────────┐
│               vhost-net (kernel)               │
│                                                │
│  - Processes virtqueue directly in kernel      │
│  - Zero-copy between TAP and guest memory      │
│  - Avoids userspace context switches           │
│  - ~30-50% throughput improvement              │
└────────────────────┬───────────────────────────┘
                     │
                     ▼
              ┌─────────────┐
              │ TAP device  │
              └─────────────┘
```

**Without vhost-net:**
```
Guest → KVM exit → QEMU/Volt userspace → syscall → TAP → kernel → network
```

**With vhost-net:**
```
Guest → vhost-net (kernel) → TAP → network
```

## Integration with Voltainer

Both Volt VMs and Voltainer containers connect to the same bridge:

### Voltainer Network Zone

```yaml
# /etc/voltainer/network/zone-default.yaml
kind: NetworkZone
name: default
bridge: br0
subnet: 10.42.0.0/24
gateway: 10.42.0.1
dhcp:
  enabled: true
  range: 10.42.0.100-10.42.0.254
```

### Volt VM Allocation

VMs get static IPs from a reserved range (10.42.0.2-10.42.0.99):

```yaml
network:
  - zone: default
    mac: "52:54:00:ab:cd:ef"
    ipv4: "10.42.0.10/24"
```
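
Because the static VM range (10.42.0.2-10.42.0.99) and the DHCP range (10.42.0.100-10.42.0.254) share one subnet, it can help to validate a requested address before writing it into a VM config. A minimal sketch of such a check follows; the function name `in_vm_static_range` is hypothetical, and the bounds are hard-coded from the ranges stated above.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical check: is addr inside the reserved static VM range
 * 10.42.0.2-10.42.0.99? Returns 1 if yes, 0 if no, and -1 if addr
 * is not a valid dotted-quad IPv4 address. */
int in_vm_static_range(const char *addr)
{
    struct in_addr ip;
    if (inet_pton(AF_INET, addr, &ip) != 1)
        return -1;

    /* Compare in host byte order so the range test is a plain integer test. */
    uint32_t host = ntohl(ip.s_addr);
    const uint32_t first = (10u << 24) | (42u << 16) | 2u;   /* 10.42.0.2  */
    const uint32_t last  = (10u << 24) | (42u << 16) | 99u;  /* 10.42.0.99 */
    return host >= first && host <= last;
}
```

The function expects a bare address, so a value like `10.42.0.10/24` from the config would need its prefix length stripped first.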

## File Locations

| File Type | Location | Persistence |
|-----------|----------|-------------|
| Bridge .netdev/.network | `/etc/systemd/network/` | Permanent |
| VM TAP .netdev/.network | `/run/systemd/network/` | Runtime only |
| Voltainer zone config | `/etc/voltainer/network/` | Permanent |
| vhost-net | Kernel (`vhost_net`, built-in or loadable module) | N/A |

## Lifecycle

### VM Start

1. Volt generates `.netdev` and `.network` in `/run/systemd/network/`
2. `networkctl reload` triggers networkd to create the TAP device
3. Volt waits for the TAP interface to appear (`networkctl status tap-XXX`)
4. Volt opens `/dev/net/tun` with `O_RDWR` and attaches to the TAP by name (`TUNSETIFF`)
5. Volt enables vhost-net via ioctls on `/dev/vhost-net`
6. Volt boots the VM with virtio-net using the TAP fd

### VM Stop

1. Close vhost-net and TAP file descriptors
2. Delete `.netdev` and `.network` from `/run/systemd/network/`
3. `networkctl reload` triggers cleanup
4. TAP interface automatically removed

## vhost-net Setup Sequence

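```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

// Per-queue inputs from the VMM (illustrative struct, not a kernel type).
// Ring addresses are addresses in the VMM's own address space.
struct vq_cfg {
    unsigned int num;  // queue size
    uint64_t desc_addr, avail_addr, used_addr;
    int kick_eventfd, call_eventfd;
};

// Returns the vhost fd, or -1 on error. q[0] is RX, q[1] is TX.
// Feature negotiation (VHOST_GET_FEATURES / VHOST_SET_FEATURES) is omitted.
int vhost_net_setup(int tap_fd, struct vhost_memory *mem, const struct vq_cfg q[2])
{
    // 1. Open vhost-net device
    int vhost_fd = open("/dev/vhost-net", O_RDWR);
    if (vhost_fd < 0)
        return -1;

    // 2. Set owner (binds this vhost fd to the calling process)
    if (ioctl(vhost_fd, VHOST_SET_OWNER, 0) < 0)
        goto fail;

    // 3. Set memory region table (guest memory regions)
    if (ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem) < 0)
        goto fail;

    for (unsigned int i = 0; i < 2; i++) {
        // 4. Vring size, starting index, and ring addresses for this queue
        struct vhost_vring_state num = { .index = i, .num = q[i].num };
        struct vhost_vring_state base = { .index = i, .num = 0 };
        struct vhost_vring_addr addr = {
            .index = i,
            .desc_user_addr = q[i].desc_addr,
            .used_user_addr = q[i].used_addr,
            .avail_user_addr = q[i].avail_addr,
        };
        // 5. Kick/call eventfds
        struct vhost_vring_file kick = { .index = i, .fd = q[i].kick_eventfd };
        struct vhost_vring_file call = { .index = i, .fd = q[i].call_eventfd };
        // 6. TAP backend for this queue
        struct vhost_vring_file backend = { .index = i, .fd = tap_fd };

        if (ioctl(vhost_fd, VHOST_SET_VRING_NUM, &num) < 0 ||
            ioctl(vhost_fd, VHOST_SET_VRING_BASE, &base) < 0 ||
            ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr) < 0 ||
            ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick) < 0 ||
            ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call) < 0 ||
            ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0)
            goto fail;
    }
    return vhost_fd;

fail:
    close(vhost_fd);
    return -1;
}
```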
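
The setup sequence passes a `struct vhost_memory` table to `VHOST_SET_MEM_TABLE` without showing how it is built. The table is a header followed by a variable-length array of regions, each mapping a guest-physical range to an address in the VMM's address space. A minimal sketch, assuming a single contiguous guest RAM region starting at guest-physical address 0; the helper name `vhost_mem_single` is hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>
#include <linux/vhost.h>

/* Hypothetical helper: build a one-region memory table that maps
 * guest-physical [0, size) onto the VMM mapping at host_va.
 * Returns NULL on allocation failure; the caller frees the result. */
struct vhost_memory *vhost_mem_single(void *host_va, uint64_t size)
{
    /* Header plus room for exactly one trailing region entry. */
    struct vhost_memory *mem =
        calloc(1, sizeof(*mem) + sizeof(struct vhost_memory_region));
    if (!mem)
        return NULL;

    mem->nregions = 1;
    mem->regions[0].guest_phys_addr = 0;
    mem->regions[0].memory_size = size;
    mem->regions[0].userspace_addr = (uint64_t)(uintptr_t)host_va;
    return mem;
}
```

A VMM with several guest memory regions (for example, RAM split around an MMIO hole) would allocate one entry per region and set `nregions` accordingly.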

## Performance Comparison

| Metric | Userspace virtio-net | vhost-net |
|--------|----------------------|-----------|
| Throughput (1500 MTU) | ~5 Gbps | ~8 Gbps |
| Throughput (Jumbo 9000) | ~8 Gbps | ~15 Gbps |
| Latency (ping) | ~200 µs | ~80 µs |
| CPU usage | Higher | 30-50% lower |
| Context switches | Many | Minimal |

## Configuration Examples

### Minimal VM with Networking

```json
{
  "vcpus": 2,
  "memory_mib": 512,
  "kernel": "vmlinux",
  "network": [{
    "id": "eth0",
    "mode": "networkd",
    "bridge": "br0",
    "mac": "52:54:00:12:34:56",
    "vhost": true
  }]
}
```

### Multi-NIC VM

```json
{
  "network": [
    {
      "id": "mgmt",
      "bridge": "br-mgmt",
      "vhost": true
    },
    {
      "id": "data",
      "bridge": "br-data",
      "mtu": 9000,
      "vhost": true,
      "multiqueue": 4
    }
  ]
}
```

## Error Handling

| Error | Cause | Recovery |
|-------|-------|----------|
| TAP creation timeout | networkd slow/unresponsive | Retry with backoff, fall back to direct creation |
| vhost-net open fails | Module not loaded | Fall back to userspace virtio-net |
| Bridge not found | Infrastructure not set up | Create bridge or fail with clear error |
| MAC conflict | Duplicate MAC on bridge | Auto-regenerate MAC |
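
Two of the recovery paths in the table are simple enough to sketch. The code below is illustrative only: `pick_backend` and `regenerate_mac` are hypothetical names, the device path is passed in so the fallback can be exercised without `/dev/vhost-net`, and the `52:54:00` prefix is assumed from the MAC addresses used elsewhere in this document.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

enum net_backend { BACKEND_VHOST, BACKEND_USERSPACE };

/* Try to open the vhost-net device. On success, *fd_out holds the open
 * fd and BACKEND_VHOST is returned. On failure (for example, module not
 * loaded), *fd_out is -1 and the caller should use userspace virtio-net. */
enum net_backend pick_backend(const char *vhost_path, int *fd_out)
{
    int fd = open(vhost_path, O_RDWR);
    if (fd < 0) {
        *fd_out = -1;
        return BACKEND_USERSPACE;
    }
    *fd_out = fd;
    return BACKEND_VHOST;
}

/* Write a random MAC with the 52:54:00 prefix into out (17 chars + NUL).
 * Returns 0 on success, -1 if random bytes could not be obtained. */
int regenerate_mac(char out[18])
{
    unsigned char r[3];
    if (getrandom(r, sizeof r, 0) != (ssize_t)sizeof r)
        return -1;
    snprintf(out, 18, "52:54:00:%02x:%02x:%02x", r[0], r[1], r[2]);
    return 0;
}
```

After regenerating, a caller would still need to check the new MAC against the addresses already present on the bridge, since three random bytes do not rule out a second collision.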

## Future Enhancements

1. **SR-IOV Passthrough**: Direct VF assignment for bare-metal performance
2. **DPDK Backend**: Alternative to TAP for ultra-low-latency
3. **virtio-vhost-user**: Offload to separate process for isolation
4. **Network Namespace Integration**: Per-VM network namespaces for isolation