# systemd-networkd Enhanced virtio-net

## Overview

This design enhances Volt's virtio-net implementation by integrating with systemd-networkd for declarative, lifecycle-managed network configuration. Instead of Volt manually creating/configuring TAP devices, networkd manages them declaratively.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          systemd-networkd                           │
│  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐ │
│  │   volt-vmm-br0   │   │ vm-{uuid}.netdev │   │ vm-{uuid}.network│ │
│  │ (.netdev bridge) │   │ (TAP definition) │   │ (bridge attach)  │ │
│  └────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘ │
│           │                      │                      │           │
│           └──────────────────────┼──────────────────────┘           │
│                                  ▼                                  │
│                          ┌───────────────┐                          │
│                          │      br0      │ ◄── Unified bridge       │
│                          │   (bridge)    │     (VMs + Voltainer)    │
│                          └───────┬───────┘                          │
│                                  │                                  │
│                ┌─────────────────┼─────────────────┐                │
│                ▼                 ▼                 ▼                │
│           ┌─────────┐       ┌─────────┐       ┌─────────┐           │
│           │  tap0   │       │  veth0  │       │  tap1   │           │
│           │ (VM-1)  │       │ (cont.) │       │ (VM-2)  │           │
│           └────┬────┘       └────┬────┘       └────┬────┘           │
└────────────────┼─────────────────┼─────────────────┼────────────────┘
                 │                 │                 │
                 ▼                 ▼                 ▼
            ┌─────────┐       ┌─────────┐       ┌─────────┐
            │  Volt   │       │Voltainer│       │  Volt   │
            │  VM-1   │       │Container│       │  VM-2   │
            └─────────┘       └─────────┘       └─────────┘
```

## Benefits

1. **Declarative Configuration**: Network topology defined in unit files, version-controllable
2. **Automatic Cleanup**: systemd removes TAP devices when the VM exits
3. **Lifecycle Integration**: TAP created before the VM starts, destroyed after it stops
4. **Unified Networking**: VMs and Voltainer containers share the same bridge infrastructure
5. **vhost-net Acceleration**: Kernel-level packet processing bypasses userspace
6. **Predictable Naming**: TAP names derived from the VM UUID

## Components
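Before walking through the components, the predictable-naming scheme from the benefits list can be sketched. `tap_name_from_uuid` is a hypothetical helper (not Volt's actual code) that derives `tap-{short_uuid}` from a VM UUID:

```c
#include <stdio.h>
#include <stddef.h>

/* Derive the predictable TAP name "tap-{short_uuid}" from a VM UUID.
 * Linux interface names are capped at IFNAMSIZ (16 bytes including the
 * terminating NUL), so only the first 8 hex characters of the UUID are
 * used. Hypothetical helper; Volt's actual naming code may differ. */
void tap_name_from_uuid(const char *uuid, char *out, size_t outlen)
{
    char short_uuid[9] = {0};
    size_t n = 0;

    /* Take the first 8 hex characters, skipping dashes. */
    for (const char *p = uuid; *p != '\0' && n < 8; p++) {
        if (*p != '-')
            short_uuid[n++] = *p;
    }
    snprintf(out, outlen, "tap-%s", short_uuid);
}
```

For example, `tap_name_from_uuid("123e4567-e89b-12d3-a456-426614174000", buf, sizeof buf)` yields `tap-123e4567`, which fits within IFNAMSIZ.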
### 1. Bridge Infrastructure (One-time Setup)

```ini
# /etc/systemd/network/10-volt-vmm-br0.netdev
[NetDev]
Name=br0
Kind=bridge
MACAddress=52:54:00:00:00:01

[Bridge]
STP=false
ForwardDelaySec=0
```

```ini
# /etc/systemd/network/10-volt-vmm-br0.network
[Match]
Name=br0

[Network]
Address=10.42.0.1/24
IPForward=yes
IPMasquerade=both
ConfigureWithoutCarrier=yes
```

### 2. Per-VM TAP Template

Volt generates these dynamically:

```ini
# /run/systemd/network/50-vm-{uuid}.netdev
[NetDev]
Name=tap-{short_uuid}
Kind=tap
MACAddress=none

[Tap]
User=root
Group=root
VNetHeader=true
MultiQueue=true
PacketInfo=false
```

```ini
# /run/systemd/network/50-vm-{uuid}.network
[Match]
Name=tap-{short_uuid}

[Network]
Bridge=br0
ConfigureWithoutCarrier=yes
```

### 3. vhost-net Acceleration

vhost-net offloads packet processing to the kernel:

```
┌─────────────────────────────────────────────────┐
│                    Guest VM                     │
│  ┌─────────────────────────────────────────┐    │
│  │            virtio-net driver            │    │
│  └─────────────────┬───────────────────────┘    │
└────────────────────┼┼───────────────────────────┘
                     ││
          ┌──────────┘│
          │           │  KVM Exit (rare)
          ▼           ▼
┌────────────────────────────────────────────────┐
│              vhost-net (kernel)                │
│                                                │
│  - Processes virtqueue directly in kernel      │
│  - Zero-copy between TAP and guest memory      │
│  - Avoids userspace context switches           │
│  - ~30-50% throughput improvement              │
└────────────────────┬───────────────────────────┘
                     │
                     ▼
              ┌─────────────┐
              │ TAP device  │
              └─────────────┘
```

**Without vhost-net:**

```
Guest → KVM exit → QEMU/Volt userspace → syscall → TAP → kernel → network
```

**With vhost-net:**

```
Guest → vhost-net (kernel) → TAP → network
```

## Integration with Voltainer

Both Volt VMs and Voltainer containers connect to the same bridge:

### Voltainer Network Zone

```yaml
# /etc/voltainer/network/zone-default.yaml
kind: NetworkZone
name: default
bridge: br0
subnet: 10.42.0.0/24
gateway: 10.42.0.1
dhcp:
  enabled: true
  range: 10.42.0.100-10.42.0.254
```

### Volt VM Allocation

VMs get static IPs from a
reserved range (10.42.0.2-10.42.0.99):

```yaml
network:
  - zone: default
    mac: "52:54:00:ab:cd:ef"
    ipv4: "10.42.0.10/24"
```

## File Locations

| File Type | Location | Persistence |
|-----------|----------|-------------|
| Bridge .netdev/.network | `/etc/systemd/network/` | Permanent |
| VM TAP .netdev/.network | `/run/systemd/network/` | Runtime only |
| Voltainer zone config | `/etc/voltainer/network/` | Permanent |
| vhost-net module | Kernel built-in | N/A |

## Lifecycle

### VM Start

1. Volt generates `.netdev` and `.network` in `/run/systemd/network/`
2. `networkctl reload` triggers networkd to create the TAP
3. Wait for the TAP interface to appear (`networkctl status tap-XXX`)
4. Open the TAP fd with O_RDWR
5. Enable vhost-net via `/dev/vhost-net` ioctls
6. Boot the VM with virtio-net using the TAP fd

### VM Stop

1. Close the vhost-net and TAP file descriptors
2. Delete `.netdev` and `.network` from `/run/systemd/network/`
3. `networkctl reload` triggers cleanup
4. The TAP interface is automatically removed

## vhost-net Setup Sequence

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

// 1. Open the vhost-net device
int vhost_fd = open("/dev/vhost-net", O_RDWR);

// 2. Set owner (binds this vhost fd to the calling process's memory)
ioctl(vhost_fd, VHOST_SET_OWNER);

// 3. Set the memory region table
struct vhost_memory *mem = ...;  // Guest memory regions
ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);

// 4. Set vring info (repeat for each queue index: 0 = RX, 1 = TX)
struct vhost_vring_state state = { .index = 0, .num = queue_size };
ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state);

struct vhost_vring_addr addr = {
    .index = 0,
    .desc_user_addr = desc_addr,
    .used_user_addr = used_addr,
    .avail_user_addr = avail_addr,
};
ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);

// 5. Set kick/call eventfds
struct vhost_vring_file kick = { .index = 0, .fd = kick_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);

struct vhost_vring_file call = { .index = 0, .fd = call_eventfd };
ioctl(vhost_fd, VHOST_SET_VRING_CALL, &call);

// 6. Associate the vring with the TAP backend
struct vhost_vring_file backend = { .index = 0, .fd = tap_fd };
ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);
```

## Performance Comparison

| Metric | userspace virtio-net | vhost-net |
|--------|----------------------|-----------|
| Throughput (1500 MTU) | ~5 Gbps | ~8 Gbps |
| Throughput (9000 jumbo MTU) | ~8 Gbps | ~15 Gbps |
| Latency (ping) | ~200 µs | ~80 µs |
| CPU usage | Higher | 30-50% lower |
| Context switches | Many | Minimal |

## Configuration Examples

### Minimal VM with Networking

```json
{
  "vcpus": 2,
  "memory_mib": 512,
  "kernel": "vmlinux",
  "network": [{
    "id": "eth0",
    "mode": "networkd",
    "bridge": "br0",
    "mac": "52:54:00:12:34:56",
    "vhost": true
  }]
}
```

### Multi-NIC VM

```json
{
  "network": [
    { "id": "mgmt", "bridge": "br-mgmt", "vhost": true },
    { "id": "data", "bridge": "br-data", "mtu": 9000, "vhost": true, "multiqueue": 4 }
  ]
}
```

## Error Handling

| Error | Cause | Recovery |
|-------|-------|----------|
| TAP creation timeout | networkd slow or unresponsive | Retry with backoff; fall back to direct creation |
| vhost-net open fails | Module not loaded | Fall back to userspace virtio-net |
| Bridge not found | Infrastructure not set up | Create the bridge or fail with a clear error |
| MAC conflict | Duplicate MAC on the bridge | Auto-regenerate the MAC |

## Future Enhancements

1. **SR-IOV Passthrough**: Direct VF assignment for near bare-metal performance
2. **DPDK Backend**: Alternative to TAP for ultra-low latency
3. **virtio-vhost-user**: Offload to a separate process for isolation
4. **Network Namespace Integration**: Per-VM network namespaces for isolation
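Steps 1-2 of the VM-start lifecycle (generate per-VM units, then have networkd pick them up) can be sketched as follows. `write_vm_units` is a hypothetical helper, not Volt's actual API; the unit contents mirror the Per-VM TAP Template section, and `dir` is parameterized so the sketch is testable outside `/run/systemd/network`:

```c
#include <stdio.h>

/* Sketch of VM-start steps 1-2: write the per-VM .netdev/.network units,
 * after which `networkctl reload` would make networkd create the TAP.
 * Volt would pass dir = "/run/systemd/network". Returns 0 on success. */
int write_vm_units(const char *dir, const char *uuid, const char *short_uuid)
{
    char path[512];
    FILE *f;

    /* TAP device definition (mirrors 50-vm-{uuid}.netdev). */
    snprintf(path, sizeof(path), "%s/50-vm-%s.netdev", dir, uuid);
    if (!(f = fopen(path, "w")))
        return -1;
    fprintf(f,
            "[NetDev]\nName=tap-%s\nKind=tap\n\n"
            "[Tap]\nUser=root\nGroup=root\n"
            "VNetHeader=true\nMultiQueue=true\nPacketInfo=false\n",
            short_uuid);
    fclose(f);

    /* Bridge attachment (mirrors 50-vm-{uuid}.network). */
    snprintf(path, sizeof(path), "%s/50-vm-%s.network", dir, uuid);
    if (!(f = fopen(path, "w")))
        return -1;
    fprintf(f,
            "[Match]\nName=tap-%s\n\n"
            "[Network]\nBridge=br0\nConfigureWithoutCarrier=yes\n",
            short_uuid);
    fclose(f);

    /* Step 2 would follow here: system("networkctl reload"), then poll
     * `networkctl status tap-{short_uuid}` until the TAP appears. */
    return 0;
}
```

On VM stop, the inverse applies: unlink both files and run `networkctl reload` again so networkd removes the TAP.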