Complete infrastructure platform CLI: - Container runtime (systemd-nspawn) - VoltVisor VMs (Neutron Stardust / QEMU) - Stellarium CAS (content-addressed storage) - ORAS Registry - GitOps integration - Landlock LSM security - Compose orchestration - Mesh networking Copyright (c) Armored Gates LLC. All rights reserved. Licensed under AGPSL v5.0
# Troubleshooting

Common issues and solutions for the Volt Platform.

## Quick Diagnostics

Run these first to understand the state of your system:

```bash
# Platform health check
volt system health

# Platform info
volt system info

# What's running?
volt ps --all

# Daemon status
volt daemon status

# Network status
volt net status
```

---

## Container Issues

### Container Won't Start

**Symptom**: `volt container start <name>` fails or returns an error.

**Check the logs first**:
```bash
volt container logs <name>
volt logs <name>
```

**Common causes**:

1. **Image not found**
   ```
   Error: image "ubuntu:24.04" not found
   ```
   Pull the image first:
   ```bash
   sudo volt image pull ubuntu:24.04
   volt image list
   ```

2. **Name conflict**
   ```
   Error: container "web" already exists
   ```
   Delete the existing container or use a different name:
   ```bash
   volt container delete web
   ```

3. **systemd-nspawn not installed**
   ```
   Error: systemd-nspawn not found
   ```
   Install the systemd-container package:
   ```bash
   # Debian/Ubuntu
   sudo apt install systemd-container

   # Fedora/Rocky
   sudo dnf install systemd-container
   ```

4. **Rootfs directory missing or corrupt**
   ```bash
   ls -la /var/lib/volt/containers/<name>/rootfs/
   ```
   If empty or missing, recreate the container:
   ```bash
   volt container delete <name>
   volt container create --name <name> --image <image> --start
   ```

5. **Resource limits too restrictive**
   Try creating without limits, then add them:
   ```bash
   volt container create --name test --image ubuntu:24.04 --start
   volt container update test --memory 512M
   ```

### Container Starts But Process Exits Immediately

**Check the main process**:
```bash
volt container logs <name>
volt container inspect <name>
```

Common cause: the container has no init process, or the specified command doesn't exist in the image.

```bash
# Try an interactive shell to debug
volt container shell <name>
```

### Can't Exec Into Container

**Symptom**: `volt container exec` fails.

1. **Container not running**:
   ```bash
   volt ps --all | grep <name>
   volt container start <name>
   ```

2. **Shell not available in image**:
   The default shell (`/bin/sh`) might not exist in minimal images. Try another shell explicitly:
   ```bash
   volt container exec <name> -- /bin/bash
   volt container exec <name> -- /bin/busybox sh
   ```

### Container Resource Limits Not Working

Verify cgroup v2 is enabled:
```bash
mount | grep cgroup2
# Should show: cgroup2 on /sys/fs/cgroup type cgroup2
```

Check the cgroup settings:
```bash
volt container inspect <name> -o json | grep -i memory
cat /sys/fs/cgroup/system.slice/volt-container@<name>.service/memory.max
```
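The value in `memory.max` is either the literal `max` (no limit) or a byte count, which is easy to misread. The following is a minimal sketch of a helper that prints it in readable form; the function name `describe_memory_max` is hypothetical and not part of Volt, and the path in the usage comment assumes a container named `web`.

```bash
# describe_memory_max FILE
# Hypothetical helper: prints a readable form of a cgroup v2 memory.max file.
describe_memory_max() {
    local file=$1 value
    # Fail early if the file is missing (unit not running, or a cgroup v1 host).
    [ -r "$file" ] || { echo "unreadable: $file" >&2; return 1; }
    value=$(cat "$file")
    if [ "$value" = "max" ]; then
        echo "unlimited"
    else
        # The kernel reports the limit in bytes; convert to MiB (integer division).
        echo "$(( value / 1024 / 1024 )) MiB"
    fi
}

# Usage:
#   describe_memory_max "/sys/fs/cgroup/system.slice/volt-container@web.service/memory.max"
```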

---

## VM Issues

### VM Won't Start

**Check prerequisites**:
```bash
# KVM available?
ls -la /dev/kvm

# QEMU installed?
which qemu-system-x86_64

# Kernel modules loaded?
lsmod | grep kvm
```

**If `/dev/kvm` doesn't exist**:
```bash
# Load KVM modules
sudo modprobe kvm
sudo modprobe kvm_intel   # or kvm_amd

# Check BIOS: virtualization must be enabled (VT-x / AMD-V)
dmesg | grep -i kvm
```
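Which vendor module to load depends on the CPU: Linux lists the `vmx` flag in `/proc/cpuinfo` for Intel VT-x and `svm` for AMD-V. As a rough sketch, a helper could pick the module from those flags; the name `kvm_module_for_cpu` is hypothetical, and a missing flag can also mean the command is running inside a VM without nested virtualization.

```bash
# kvm_module_for_cpu [CPUINFO_FILE]
# Hypothetical helper: prints the vendor KVM module that matches the CPU's
# virtualization flag, or fails if neither flag is listed.
kvm_module_for_cpu() {
    local cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo"; then
        echo kvm_intel      # Intel VT-x
    elif grep -qw svm "$cpuinfo"; then
        echo kvm_amd        # AMD-V
    else
        echo "no vmx/svm flag found in $cpuinfo" >&2
        return 1
    fi
}

# Usage:
#   sudo modprobe "$(kvm_module_for_cpu)"
```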

**If permission denied on `/dev/kvm`**:
```bash
# Add user to kvm group
sudo usermod -aG kvm "$USER"
# Log out and back in for the group change to take effect

# Or check group ownership
ls -la /dev/kvm
# Should be: crw-rw---- 1 root kvm
```

### VM Starts But No SSH Access

1. **VM might still be booting**. Wait 30-60 seconds for first boot.

2. **Check VM has an IP**:
   ```bash
   volt vm list -o wide
   ```

3. **SSH might not be installed/running in the VM**:
   ```bash
   volt vm exec <name> -- systemctl status sshd
   ```

4. **Network connectivity**:
   ```bash
   # From host, ping the VM's IP
   ping <vm-ip>
   ```

### VM Performance Issues

Apply a tuning profile:
```bash
volt tune profile apply <vm-name> --profile database
```

Or tune individually:
```bash
# Pin CPUs
volt tune cpu pin <vm-name> --cpus 4,5,6,7

# Enable hugepages
volt tune memory hugepages --enable --size 2M --count 4096

# Set I/O scheduler
volt tune io scheduler /dev/sda --scheduler none
```

---

## Service Issues

### Service Won't Start

```bash
# Check status
volt service status <name>

# View logs
volt service logs <name>

# View the unit file for issues
volt service show <name>
```

Common causes:

1. **ExecStart path doesn't exist**:
   ```bash
   which <binary-path>
   ```

2. **User/group doesn't exist**:
   ```bash
   id <service-user>
   # Create if missing
   sudo useradd -r -s /bin/false <service-user>
   ```

3. **Working directory doesn't exist**:
   ```bash
   ls -la <workdir-path>
   sudo mkdir -p <workdir-path>
   ```

4. **Port already in use**:
   ```bash
   ss -tlnp | grep <port>
   ```

### Service Keeps Restarting

Check the restart loop:
```bash
volt service status <name>
volt service logs <name> --tail 50
```

If the service fails immediately on start, systemd may hit the start rate limit. Check:
```bash
# View full systemd status
systemctl status <name>.service
```

Temporarily adjust restart behavior:
```bash
volt service edit <name> --inline "RestartSec=10"
```

### Can't Delete a Service

```bash
# If it says "refusing to delete system unit":
# Volt protects system services. Only user-created services can be deleted.

# If deletion is stuck, stop and disable the service first:
volt service stop <name>
volt service disable <name>
volt service delete <name>
```

---

## Networking Issues

### No Network Connectivity from Container

1. **Check bridge exists**:
   ```bash
   volt net bridge list
   ```
   If `volt0` is missing:
   ```bash
   sudo volt net bridge create volt0 --subnet 10.0.0.0/24
   ```

2. **Check IP forwarding**:
   ```bash
   volt tune sysctl get net.ipv4.ip_forward
   # Should be 1. If not:
   sudo volt tune sysctl set net.ipv4.ip_forward 1 --persist
   ```

3. **Check NAT/masquerade rules**:
   ```bash
   sudo nft list ruleset | grep masquerade
   ```

4. **Check container has an IP**:
   ```bash
   volt container inspect <name>
   ```
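The IP forwarding check in step 2 can also be scripted without Volt, because the kernel exposes the `net.ipv4.ip_forward` sysctl as the file `/proc/sys/net/ipv4/ip_forward`. A small sketch follows; the function name `ip_forward_enabled` is hypothetical.

```bash
# ip_forward_enabled [FILE]
# Hypothetical helper: succeeds only when the sysctl file contains "1".
# FILE defaults to the kernel's path for net.ipv4.ip_forward.
ip_forward_enabled() {
    local file=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# Usage:
#   if ! ip_forward_enabled; then
#       sudo volt tune sysctl set net.ipv4.ip_forward 1 --persist
#   fi
```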

### Workloads Can't Resolve Names

1. **Check internal DNS**:
   ```bash
   volt net dns list
   ```

2. **Flush DNS cache**:
   ```bash
   volt net dns flush
   ```

3. **Check upstream DNS in config**:
   ```bash
   volt config get network.dns.upstream
   ```

### Port Forward Not Working

1. **Verify the forward exists**:
   ```bash
   volt net port list
   ```

2. **Check the target is running and listening**:
   ```bash
   volt ps | grep <target>
   volt container exec <target> -- ss -tlnp
   ```

3. **Check firewall rules**:
   ```bash
   volt net firewall list
   ```

4. **Check for host-level firewall conflicts**:
   ```bash
   sudo nft list ruleset
   sudo iptables -L -n  # if iptables is also in use
   ```

### Firewall Rule Not Taking Effect

1. **List current rules**:
   ```bash
   volt net firewall list
   ```

2. **Rule ordering matters**. More specific rules should come first. If a broad `deny` rule precedes your `accept` rule, traffic will be blocked.

3. **Flush and recreate if confused**:
   ```bash
   volt net firewall flush
   # Re-add rules in the correct order
   ```
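To see why ordering matters, the toy simulation below evaluates rules the way step 2 describes, assuming first-match semantics: the first rule that matches decides and later rules are never consulted. The `ACTION:PORT` rule format and the function name `first_match_verdict` are hypothetical and are not Volt's rule syntax.

```bash
# first_match_verdict PORT RULE...
# Each RULE is "ACTION:PORT", where PORT may be "any" (toy format for
# illustration only). Prints the action of the first matching rule.
first_match_verdict() {
    local port=$1 rule action match
    shift
    for rule in "$@"; do
        action=${rule%%:*}
        match=${rule#*:}
        if [ "$match" = "any" ] || [ "$match" = "$port" ]; then
            echo "$action"      # first matching rule decides; the rest are ignored
            return 0
        fi
    done
    echo "no-match"
}

first_match_verdict 443 deny:any accept:443   # prints: deny
first_match_verdict 443 accept:443 deny:any   # prints: accept
```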

---

## Daemon Issues

### Daemon Not Running

```bash
volt daemon status
# If not running:
sudo volt daemon start
```

Check systemd:
```bash
systemctl status volt.service
journalctl -u volt.service --no-pager -n 50
```

### Daemon Won't Start

1. **Socket in use**:
   ```bash
   ls -la /var/run/volt/volt.sock
   # Remove the socket only if it is stale (no daemon is running)
   sudo rm /var/run/volt/volt.sock
   sudo volt daemon start
   ```

2. **Config file invalid**:
   ```bash
   volt config validate
   ```

3. **Missing directories**:
   ```bash
   sudo mkdir -p /var/lib/volt /var/run/volt /var/log/volt /var/cache/volt /etc/volt
   ```

4. **PID file stale**:
   ```bash
   cat /var/run/volt/volt.pid
   # Check if that PID exists
   ps -p "$(cat /var/run/volt/volt.pid)"
   # If no process, remove it
   sudo rm /var/run/volt/volt.pid
   sudo volt daemon start
   ```
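The stale PID file check in step 4 can be wrapped in a function so that the file is removed only when no process owns that PID. This is a minimal sketch that relies on Linux exposing every live process as `/proc/<pid>`; the function name `pid_file_is_stale` is hypothetical. It cannot tell whether a live PID has been recycled by an unrelated process.

```bash
# pid_file_is_stale FILE
# Hypothetical helper. Returns 0 (stale) when FILE holds a PID with no live
# process, 1 when the process exists, 2 when FILE is missing or not a number.
pid_file_is_stale() {
    local file=$1 pid
    [ -r "$file" ] || return 2
    pid=$(cat "$file")
    case $pid in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric content
    esac
    # On Linux every live process has a /proc/<pid> directory.
    [ ! -d "/proc/$pid" ]
}

# Usage, with the PID file path from step 4:
#   if pid_file_is_stale /var/run/volt/volt.pid; then
#       sudo rm /var/run/volt/volt.pid && sudo volt daemon start
#   fi
```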

### Commands Time Out

```bash
# Increase timeout
volt --timeout 120 <command>

# Or check if daemon is overloaded
volt daemon status
volt top
```

---

## Permission Issues

### "Permission denied" Errors

Most state-changing operations require root or `volt` group membership:

```bash
# Add user to volt group
sudo usermod -aG volt "$USER"
# Log out and back in for group change to take effect

# Or use sudo
sudo volt container create --name web --image ubuntu:24.04 --start
```

### Read-Only Operations Work, Write Operations Fail

This is expected for users who are neither root nor members of the `volt` group. These read-only commands work without extra privileges:

```bash
volt ps              # Read-only
volt top             # Read-only
volt logs <name>     # Read-only
volt service list    # Read-only
volt config show     # Read-only
```

These require privileges:

```bash
volt container create    # Needs root/volt group
volt service create      # Needs root
volt net firewall add    # Needs root
volt tune sysctl set     # Needs root
```

---

## Storage Issues

### Disk Space Full

```bash
# Check disk usage
volt system info

# Clean up unused images
volt image list
volt image delete <unused-image>

# Clean CAS garbage
volt cas gc --dry-run
volt cas gc

# Clear cache (safe to delete)
sudo rm -rf /var/cache/volt/*

# Check container sizes
du -sh /var/lib/volt/containers/*/
```

### CAS Integrity Errors

```bash
# Verify CAS store
volt cas verify

# If corrupted objects are found, re-pull affected images
volt image delete <affected-image>
volt image pull <image>
```

### Volume Won't Attach

1. **Volume exists?**
   ```bash
   volt volume list
   ```

2. **Already attached?**
   ```bash
   volt volume inspect <name>
   ```

3. **Target workload running?**
   Volumes can typically only be attached to running workloads.

---

## Compose Issues

### `volt compose up` Fails

1. **Validate the compose file**:
   ```bash
   volt compose config
   ```

2. **Missing images**:
   ```bash
   volt compose pull
   ```

3. **Dependency issues**: Check that `depends_on` targets exist in the file and their conditions can be met.

4. **Network conflicts**: If subnets overlap with existing networks:
   ```bash
   volt net list
   ```

### Environment Variables Not Resolving

```bash
# Check .env file exists in same directory as compose file
cat .env

# Variables must be set in the host environment or .env file
export DB_PASSWORD=mysecret
volt compose up
```

Undefined variables with no default cause an error. Use default syntax:
```yaml
environment:
  DB_PASSWORD: "${DB_PASSWORD:-defaultpass}"
```
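The `${VAR:-default}` form comes from POSIX shell parameter expansion, so its behavior can be tried directly in a shell. The sketch below assumes Volt's compose interpolation follows the same semantics, which this document does not confirm; the function name `resolve_db_password` is hypothetical.

```bash
# resolve_db_password: prints DB_PASSWORD, or "defaultpass" when the variable
# is unset or empty (the colon in ":-" is what covers the empty case).
resolve_db_password() {
    echo "${DB_PASSWORD:-defaultpass}"
}

unset DB_PASSWORD
resolve_db_password                        # prints: defaultpass
DB_PASSWORD=mysecret resolve_db_password   # prints: mysecret
DB_PASSWORD= resolve_db_password           # prints: defaultpass
```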

---

## Exit Codes

Use exit codes in scripts for error handling:

| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 2 | Bad arguments | Fix command syntax |
| 3 | Not found | Resource doesn't exist |
| 4 | Already exists | Resource name taken |
| 5 | Permission denied | Use sudo or join `volt` group |
| 6 | Daemon down | `sudo volt daemon start` |
| 7 | Timeout | Retry with `--timeout` |
| 9 | Conflict | Resource in wrong state |

```bash
volt container start web
rc=$?   # capture immediately; any later command overwrites $?
case $rc in
  0) echo "Started" ;;
  3) echo "Container not found" ;;
  5) echo "Permission denied: try sudo" ;;
  6) echo "Daemon not running: sudo volt daemon start" ;;
  9) echo "Already running" ;;
  *) echo "Error: $rc" ;;
esac
```
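Because code 7 marks a timeout, a script can retry only that case and pass every other code straight through. The wrapper below is one possible sketch; the name `retry_on_timeout`, the attempt limit, and the one-second pause are assumptions, not Volt features.

```bash
# retry_on_timeout MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical wrapper: re-runs COMMAND while it exits with 7 (Timeout in the
# table above). Any other exit code, including 0, is returned immediately.
retry_on_timeout() {
    local max=$1 attempt=1 rc
    shift
    while :; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -ne 7 ] || [ "$attempt" -ge "$max" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep 1   # fixed pause between attempts; tune as needed
    done
}

# Usage:
#   retry_on_timeout 3 volt --timeout 120 container start web
```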

---

## Collecting Debug Info

When reporting issues, gather:

```bash
# Version
volt --version

# System info
volt system info -o json

# Health check
volt system health

# Daemon logs
journalctl -u volt.service --no-pager -n 100

# Run the failing command with debug
volt --debug <failing-command>

# Audit log
tail -50 /var/log/volt/audit.log
```
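To attach all of this to a report in one go, each command's output can be written to its own file, with failures recorded rather than aborting the run. The helper below is a sketch under that assumption; `collect_output` and the file names are hypothetical, and the collected files should be reviewed for sensitive data before sharing.

```bash
# collect_output OUTDIR NAME COMMAND [ARGS...]
# Hypothetical helper: saves COMMAND's stdout and stderr to OUTDIR/NAME.txt
# and appends its exit code, so a failing command does not stop collection.
collect_output() {
    local outdir=$1 name=$2 rc=0
    shift 2
    mkdir -p "$outdir" || return 1
    "$@" >"$outdir/$name.txt" 2>&1 || rc=$?
    echo "exit code: $rc" >>"$outdir/$name.txt"
    return 0
}

# Usage, with commands from the list above:
#   out=$(mktemp -d)
#   collect_output "$out" version volt --version
#   collect_output "$out" health  volt system health
#   collect_output "$out" journal journalctl -u volt.service --no-pager -n 100
#   tar -czf volt-debug.tar.gz -C "$out" .
```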

## Factory Reset

If all else fails, reset Volt to defaults. **This is destructive**: it stops all workloads and removes all configuration.

```bash
sudo volt system reset --confirm
```

After reset, reinitialize:
```bash
sudo volt daemon start
volt system health
```