volt/docs/troubleshooting.md
Karl Clinger 0ebe75b2ca Volt CLI: source-available under AGPSL v5.0
Complete infrastructure platform CLI:
- Container runtime (systemd-nspawn)
- VoltVisor VMs (Neutron Stardust / QEMU)
- Stellarium CAS (content-addressed storage)
- ORAS Registry
- GitOps integration
- Landlock LSM security
- Compose orchestration
- Mesh networking

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
2026-03-21 02:08:15 -05:00


# Troubleshooting
Common issues and solutions for the Volt Platform.
## Quick Diagnostics
Run these first to understand the state of your system:
```bash
# Platform health check
volt system health
# Platform info
volt system info
# What's running?
volt ps --all
# Daemon status
volt daemon status
# Network status
volt net status
```
---
## Container Issues
### Container Won't Start
**Symptom**: `volt container start <name>` fails or returns an error.
**Check the logs first**:
```bash
volt container logs <name>
volt logs <name>
```
**Common causes**:
1. **Image not found**
```
Error: image "ubuntu:24.04" not found
```
Pull the image first:
```bash
sudo volt image pull ubuntu:24.04
volt image list
```
2. **Name conflict**
```
Error: container "web" already exists
```
Delete the existing container or use a different name:
```bash
volt container delete web
```
3. **systemd-nspawn not installed**
```
Error: systemd-nspawn not found
```
Install the systemd-container package:
```bash
# Debian/Ubuntu
sudo apt install systemd-container
# Fedora/Rocky
sudo dnf install systemd-container
```
4. **Rootfs directory missing or corrupt**
```bash
ls -la /var/lib/volt/containers/<name>/rootfs/
```
If the directory is empty or missing, recreate the container:
```bash
volt container delete <name>
volt container create --name <name> --image <image> --start
```
5. **Resource limits too restrictive**
Try creating without limits, then add them:
```bash
volt container create --name test --image ubuntu:24.04 --start
volt container update test --memory 512M
```
### Container Starts But Process Exits Immediately
**Check the main process**:
```bash
volt container logs <name>
volt container inspect <name>
```
Common cause: the container has no init process or the specified command doesn't exist in the image.
```bash
# Try interactive shell to debug
volt container shell <name>
```
### Can't Exec Into Container
**Symptom**: `volt container exec` fails.
1. **Container not running**:
```bash
volt ps --all | grep <name>
volt container start <name>
```
2. **Shell not available in image**:
The default shell (`/bin/sh`) might not exist in minimal images. Try another shell explicitly:
```bash
volt container exec <name> -- /bin/bash
volt container exec <name> -- /bin/busybox sh
```
### Container Resource Limits Not Working
Verify cgroup v2 is enabled:
```bash
mount | grep cgroup2
# Should show: cgroup2 on /sys/fs/cgroup type cgroup2
```
Check the cgroup settings:
```bash
volt container inspect <name> -o json | grep -i memory
cat /sys/fs/cgroup/system.slice/volt-container@<name>.service/memory.max
```
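The two checks above can be wrapped in small helpers. This is a sketch, not part of Volt: `cgroup_version` and `describe_memory_max` are hypothetical names. It assumes GNU `stat`, which reports `cgroup2fs` for a unified cgroup v2 mount, and relies on `memory.max` containing either the literal `max` (no limit) or a byte count.

```bash
# Report which cgroup hierarchy is mounted at a path (default: /sys/fs/cgroup).
cgroup_version() {
    local fstype
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || fstype=unknown
    case "$fstype" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1 or hybrid" ;;   # v1 mounts controllers under a tmpfs
        *)         echo "unknown ($fstype)" ;;
    esac
}

# Turn the contents of a memory.max file into a readable line.
describe_memory_max() {
    local value=$1
    case "$value" in
        max)          echo "no memory limit set" ;;
        ''|*[!0-9]*)  echo "unexpected value: $value" ;;
        *)            echo "limit: $(( value / 1024 / 1024 )) MiB" ;;
    esac
}
```

For example, `describe_memory_max "$(cat <path-to>/memory.max)"` prints `no memory limit set` when the limit was never applied, which usually means the problem is in how the limit was requested rather than in the kernel.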
---
## VM Issues
### VM Won't Start
**Check prerequisites**:
```bash
# KVM available?
ls -la /dev/kvm
# QEMU installed?
which qemu-system-x86_64
# Kernel modules loaded?
lsmod | grep kvm
```
**If `/dev/kvm` doesn't exist**:
```bash
# Load KVM modules
sudo modprobe kvm
sudo modprobe kvm_intel # or kvm_amd
# Check BIOS: virtualization must be enabled (VT-x / AMD-V)
dmesg | grep -i kvm
```
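Which vendor module to load depends on the CPU's virtualization flag: `vmx` (Intel VT-x) pairs with `kvm_intel`, and `svm` (AMD-V) pairs with `kvm_amd`. The helper below is a sketch that reads the flag from `/proc/cpuinfo`; the function name `kvm_module_for_cpu` is hypothetical.

```bash
# Print the KVM module that matches the CPU's virtualization flag.
# Takes an optional cpuinfo path so it can be tried against a saved copy.
kvm_module_for_cpu() {
    local cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo"; then
        echo kvm_intel
    elif grep -qw svm "$cpuinfo"; then
        echo kvm_amd
    else
        echo none
        return 1
    fi
}
```

If it prints `none`, the CPU does not advertise hardware virtualization (inside a VM this usually means nested virtualization is off). A flag being present does not prove the firmware has it enabled, so the `dmesg | grep -i kvm` check still applies.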
**If permission denied on `/dev/kvm`**:
```bash
# Add user to kvm group
sudo usermod -aG kvm $USER
# Log out and back in
# Or check group ownership
ls -la /dev/kvm
# Should be: crw-rw---- 1 root kvm
```
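To tell a missing device apart from a permissions problem in a script, a check along these lines may help. It is a minimal sketch; `check_kvm_access` is a hypothetical helper, not a Volt command.

```bash
# Classify access to the KVM device for the current user.
# Prints "missing", "no-access", or "ok"; non-zero exit on the first two.
check_kvm_access() {
    local dev=${1:-/dev/kvm}
    if [ ! -e "$dev" ]; then
        echo "missing"      # modules not loaded or virtualization disabled
        return 1
    elif [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok"
    else
        echo "no-access"    # device exists but this session cannot open it
        return 2
    fi
}
```

A result of `no-access` right after running `usermod` typically means the current login session has not picked up the new group yet.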
### VM Starts But No SSH Access
1. **VM might still be booting**. Wait 30-60 seconds for first boot.
2. **Check VM has an IP**:
```bash
volt vm list -o wide
```
3. **SSH might not be installed/running in the VM**:
```bash
volt vm exec <name> -- systemctl status sshd
```
4. **Network connectivity**:
```bash
# From host, ping the VM's IP
ping <vm-ip>
```
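Rather than guessing how long first boot takes, a script can poll the SSH port until it answers. The sketch below assumes `bash` and the coreutils `timeout` command are available on the host; `wait_for_port` is a hypothetical helper name.

```bash
# Poll host:port until a TCP connection succeeds or the deadline passes.
# Usage: wait_for_port <host> <port> [timeout-seconds]
wait_for_port() {
    local host=$1 port=$2 limit=${3:-60}
    local deadline=$(( $(date +%s) + limit ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # bash's /dev/tcp pseudo-device opens a TCP connection; `timeout`
        # bounds each attempt so a filtered port cannot hang the loop.
        if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
            return 0
        fi
        sleep 2
    done
    return 1
}
```

For example, `wait_for_port <vm-ip> 22 90 && ssh <user>@<vm-ip>` connects as soon as the port opens. A successful connection only shows that something is listening on the port, not that login will succeed.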
### VM Performance Issues
Apply a tuning profile:
```bash
volt tune profile apply <vm-name> --profile database
```
Or tune individually:
```bash
# Pin CPUs
volt tune cpu pin <vm-name> --cpus 4,5,6,7
# Enable hugepages
volt tune memory hugepages --enable --size 2M --count 4096
# Set I/O scheduler
volt tune io scheduler /dev/sda --scheduler none
```
---
## Service Issues
### Service Won't Start
```bash
# Check status
volt service status <name>
# View logs
volt service logs <name>
# View the unit file for issues
volt service show <name>
```
Common causes:
1. **ExecStart path doesn't exist**:
```bash
which <binary-path>
```
2. **User/group doesn't exist**:
```bash
id <service-user>
# Create if missing
sudo useradd -r -s /bin/false <service-user>
```
3. **Working directory doesn't exist**:
```bash
ls -la <workdir-path>
sudo mkdir -p <workdir-path>
```
4. **Port already in use**:
```bash
ss -tlnp | grep <port>
```
### Service Keeps Restarting
Check the restart loop:
```bash
volt service status <name>
volt service logs <name> --tail 50
```
If the service fails immediately on start, systemd may hit the start rate limit. Check:
```bash
# View full systemd status
systemctl status <name>.service
```
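When the rate limit is hit, systemd records the unit's result as `start-limit-hit` and refuses further starts until the counter is cleared. The sketch below detects that state from `systemctl show` output; `hit_start_limit` is a hypothetical helper, and `web.service` is a placeholder unit name.

```bash
# Read `systemctl show` output on stdin; succeed if the unit hit the start limit.
hit_start_limit() {
    grep -qx 'Result=start-limit-hit'
}

# Usage (unit name is a placeholder):
#   if systemctl show web.service -p Result -p NRestarts | hit_start_limit; then
#       sudo systemctl reset-failed web.service   # clears the rate-limit counter
#   fi
```

Clearing the counter only allows another start attempt. The underlying failure still has to be fixed, or the unit will hit the limit again.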
Temporarily adjust restart behavior:
```bash
volt service edit <name> --inline "RestartSec=10"
```
### Can't Delete a Service
```bash
# "refusing to delete system unit" means Volt is protecting a system service.
# Only user-created services can be deleted.
# If a user-created service is stuck, stop and disable it before deleting:
volt service stop <name>
volt service disable <name>
volt service delete <name>
```
---
## Networking Issues
### No Network Connectivity from Container
1. **Check bridge exists**:
```bash
volt net bridge list
```
If `volt0` is missing:
```bash
sudo volt net bridge create volt0 --subnet 10.0.0.0/24
```
2. **Check IP forwarding**:
```bash
volt tune sysctl get net.ipv4.ip_forward
# Should be 1. If not:
sudo volt tune sysctl set net.ipv4.ip_forward 1 --persist
```
3. **Check NAT/masquerade rules**:
```bash
sudo nft list ruleset | grep masquerade
```
4. **Check container has an IP**:
```bash
volt container inspect <name>
```
### Workloads Can't Resolve Names
1. **Check internal DNS**:
```bash
volt net dns list
```
2. **Flush DNS cache**:
```bash
volt net dns flush
```
3. **Check upstream DNS in config**:
```bash
volt config get network.dns.upstream
```
### Port Forward Not Working
1. **Verify the forward exists**:
```bash
volt net port list
```
2. **Check the target is running and listening**:
```bash
volt ps | grep <target>
volt container exec <target> -- ss -tlnp
```
3. **Check firewall rules**:
```bash
volt net firewall list
```
4. **Check for host-level firewall conflicts**:
```bash
sudo nft list ruleset
sudo iptables -L -n # if iptables is also in use
```
### Firewall Rule Not Taking Effect
1. **List current rules**:
```bash
volt net firewall list
```
2. **Rule ordering matters**. More specific rules should come first. If a broad `deny` rule precedes your `accept` rule, traffic will be blocked.
3. **Flush and recreate if confused**:
```bash
volt net firewall flush
# Re-add rules in the correct order
```
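The ordering rule is easier to see in a toy model. The sketch below is not Volt's rule engine: it assumes rules are evaluated top to bottom and that the first match wins, which is what the advice above implies. The rule format (`<action> <port>`) and the name `first_match` are invented for illustration.

```bash
# Toy model of first-match rule evaluation (not Volt's actual engine).
# Each rule is "<action> <port>"; the port "any" matches every port.
# Usage: first_match <port> <rule>...
first_match() {
    local port=$1 rule action match
    shift
    for rule in "$@"; do
        action=${rule% *}    # text before the space
        match=${rule#* }     # text after the space
        if [ "$match" = "any" ] || [ "$match" = "$port" ]; then
            echo "$action"
            return 0
        fi
    done
    echo "no-match"
}

first_match 443 "deny any" "accept 443"   # prints: deny
first_match 443 "accept 443" "deny any"   # prints: accept
```

The two calls use the same rules in a different order and produce opposite results, which is why the specific `accept` has to come before the broad `deny`.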
---
## Daemon Issues
### Daemon Not Running
```bash
volt daemon status
# If not running:
sudo volt daemon start
```
Check systemd:
```bash
systemctl status volt.service
journalctl -u volt.service --no-pager -n 50
```
### Daemon Won't Start
1. **Socket in use**:
```bash
ls -la /var/run/volt/volt.sock
# Remove the socket only if it is stale (no daemon process is running)
sudo rm /var/run/volt/volt.sock
sudo volt daemon start
```
2. **Config file invalid**:
```bash
volt config validate
```
3. **Missing directories**:
```bash
sudo mkdir -p /var/lib/volt /var/run/volt /var/log/volt /var/cache/volt /etc/volt
```
4. **PID file stale**:
```bash
cat /var/run/volt/volt.pid
# Check if that PID exists
ps -p $(cat /var/run/volt/volt.pid)
# If no process, remove it
sudo rm /var/run/volt/volt.pid
sudo volt daemon start
```
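The stale-PID check can be scripted so the file is removed only when no process owns it. This is a Linux-specific sketch that looks for `/proc/<pid>`; `pid_file_is_stale` is a hypothetical helper name.

```bash
# Succeed if the PID file exists but names no live process.
pid_file_is_stale() {
    local file=$1 pid
    [ -f "$file" ] || return 1          # no PID file, nothing to clean up
    pid=$(tr -cd '0-9' < "$file")       # keep digits only
    [ -n "$pid" ] || return 0           # empty or garbage file counts as stale
    # /proc/<pid> exists for every live process, regardless of its owner.
    [ ! -d "/proc/$pid" ]
}
```

For example: `pid_file_is_stale /var/run/volt/volt.pid && sudo rm /var/run/volt/volt.pid`. The check cannot tell whether a live PID really belongs to the Volt daemon, since PIDs are reused, so confirm with `ps -p` when in doubt.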
### Commands Time Out
```bash
# Increase timeout
volt --timeout 120 <command>
# Or check if daemon is overloaded
volt daemon status
volt top
```
---
## Permission Issues
### "Permission denied" Errors
Most state-changing operations require root or `volt` group membership:
```bash
# Add user to volt group
sudo usermod -aG volt $USER
# Log out and back in for group change to take effect
# Or use sudo
sudo volt container create --name web --image ubuntu:24.04 --start
```
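A common trap is that `usermod` has succeeded but the current session still lacks the group. Running `id` without a user name reports the groups of the running session, while `id <user>` reads the account database, so comparing the two shows whether a re-login is pending. The helper names below are hypothetical.

```bash
# Succeed if the current login session already carries the group.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Succeed if the account database lists the user in the group.
account_has_group() {
    id -nG "${2:-$(id -un)}" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$1"
}

# Usage: group_status <group> [user]
group_status() {
    if session_has_group "$1"; then
        echo "active"
    elif account_has_group "$1" "$2"; then
        echo "pending-relogin"      # added, but this session predates the change
    else
        echo "not-a-member"
    fi
}
```

`group_status volt` printing `pending-relogin` means the group was added correctly and only a new login session is missing.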
### Read-Only Operations Work, Write Operations Fail
This is expected for users who are neither root nor members of the `volt` group. These read-only commands work for any user:
```bash
volt ps # Read-only
volt top # Read-only
volt logs <name> # Read-only
volt service list # Read-only
volt config show # Read-only
```
These require privileges:
```bash
volt container create # Needs root/volt group
volt service create # Needs root
volt net firewall add # Needs root
volt tune sysctl set # Needs root
```
---
## Storage Issues
### Disk Space Full
```bash
# Check disk usage
volt system info
# Clean up unused images
volt image list
volt image delete <unused-image>
# Clean CAS garbage
volt cas gc --dry-run
volt cas gc
# Clear cache (safe to delete)
sudo rm -rf /var/cache/volt/*
# Check container sizes
du -sh /var/lib/volt/containers/*/
```
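To catch a filling disk before it causes failures, the usage check can be scripted. The sketch below uses POSIX `df -P` output; the function names and the 90% default threshold are illustrative choices, not Volt settings.

```bash
# Print the used-space percentage (integer, no % sign) of the filesystem holding a path.
fs_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold (default 90%).
# Returns 0 when below, 1 when at or above, 2 if usage could not be read.
warn_if_full() {
    local path=$1 threshold=${2:-90} used
    used=$(fs_usage_percent "$path")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, `warn_if_full /var/lib/volt 85` could run from a cron job or a monitoring hook.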
### CAS Integrity Errors
```bash
# Verify CAS store
volt cas verify
# If corrupted objects are found, re-pull affected images
volt image delete <affected-image>
volt image pull <image>
```
### Volume Won't Attach
1. **Volume exists?**
```bash
volt volume list
```
2. **Already attached?**
```bash
volt volume inspect <name>
```
3. **Target workload running?**
Volumes can typically only be attached to running workloads.
---
## Compose Issues
### `volt compose up` Fails
1. **Validate the compose file**:
```bash
volt compose config
```
2. **Missing images**:
```bash
volt compose pull
```
3. **Dependency issues**: Check that `depends_on` targets exist in the file and their conditions can be met.
4. **Network conflicts**: If subnets overlap with existing networks:
```bash
volt net list
```
### Environment Variables Not Resolving
```bash
# Check .env file exists in same directory as compose file
cat .env
# Variables must be set in the host environment or .env file
export DB_PASSWORD=mysecret
volt compose up
```
Undefined variables with no default cause an error. Use default syntax:
```yaml
environment:
  DB_PASSWORD: "${DB_PASSWORD:-defaultpass}"
```
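The `:-` form is borrowed from POSIX shell parameter expansion, so its behavior can be tried in a plain shell. This sketch assumes Volt Compose follows the shell semantics, including treating an empty value like an unset one; `resolve_db_password` is a hypothetical helper.

```bash
# Mirror of "${DB_PASSWORD:-defaultpass}" from the compose file, in plain shell.
resolve_db_password() {
    echo "${DB_PASSWORD:-defaultpass}"
}

unset DB_PASSWORD;       resolve_db_password   # prints: defaultpass
DB_PASSWORD="";          resolve_db_password   # prints: defaultpass (empty counts as unset with :-)
DB_PASSWORD="mysecret";  resolve_db_password   # prints: mysecret
```

If the default keeps appearing even though the variable is defined, check that it is exported in the environment that runs `volt compose up`, or that it is present in `.env`.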
---
## Exit Codes
Use exit codes in scripts for error handling:
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 2 | Bad arguments | Fix command syntax |
| 3 | Not found | Resource doesn't exist |
| 4 | Already exists | Resource name taken |
| 5 | Permission denied | Use sudo or join `volt` group |
| 6 | Daemon down | `sudo volt daemon start` |
| 7 | Timeout | Retry with `--timeout` |
| 9 | Conflict | Resource in wrong state |
```bash
volt container start web
rc=$?   # save the status before any other command overwrites $?
case $rc in
0) echo "Started" ;;
3) echo "Container not found" ;;
5) echo "Permission denied: try sudo" ;;
6) echo "Daemon not running: sudo volt daemon start" ;;
9) echo "Already running" ;;
*) echo "Error: $rc" ;;
esac
```
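Exit code 7 can also drive an automatic retry. The wrapper below is a sketch built only on the `--timeout` flag and the exit codes listed in the table; `volt_retry` is a hypothetical name, and doubling the timeout on each attempt is an illustrative policy.

```bash
# Retry a volt command while it exits with 7 (timeout), doubling --timeout each time.
# Usage: volt_retry <max-attempts> <initial-timeout> <volt args...>
volt_retry() {
    local attempts=$1 timeout=$2 rc=0 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        rc=0
        volt --timeout "$timeout" "$@" || rc=$?
        [ "$rc" -eq 7 ] || return "$rc"   # success or a non-timeout error: stop retrying
        echo "attempt $i timed out (--timeout $timeout)" >&2
        timeout=$(( timeout * 2 ))
        i=$(( i + 1 ))
    done
    return "$rc"
}
```

For example, `volt_retry 3 30 container start web` tries with timeouts of 30, 60, and 120. Any other exit code is returned immediately so the caller's `case` statement can handle it.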
---
## Collecting Debug Info
When reporting issues, gather:
```bash
# Version
volt --version
# System info
volt system info -o json
# Health check
volt system health
# Daemon logs
journalctl -u volt.service --no-pager -n 100
# Run the failing command with debug
volt --debug <failing-command>
# Audit log
tail -50 /var/log/volt/audit.log
```
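The commands above can be gathered into a single report file to attach to an issue. This is a sketch: `collect_volt_debug` and the default file name are hypothetical. A command that fails is recorded with its exit code instead of stopping the collection.

```bash
# Run each diagnostic command from this section and append its output to one file.
# Usage: collect_volt_debug [output-file]
collect_volt_debug() {
    local out=${1:-volt-debug-$(date +%Y%m%d-%H%M%S).txt}
    local cmd
    for cmd in \
        "volt --version" \
        "volt system info -o json" \
        "volt system health" \
        "journalctl -u volt.service --no-pager -n 100" \
        "tail -50 /var/log/volt/audit.log"
    do
        {
            echo "### $cmd"
            # Word splitting is intentional: each entry is a plain command line.
            $cmd 2>&1 || echo "(exit code $?)"
            echo
        } >> "$out"
    done
    echo "$out"    # print the report path for the caller
}
```

Review the file before sharing it, since logs and system info may contain hostnames, addresses, or other sensitive details. The `volt --debug <failing-command>` output still has to be captured separately, because it depends on the command being reported.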
## Factory Reset
If all else fails, reset Volt to defaults. **This is destructive**: it stops all workloads and removes all configuration.
```bash
sudo volt system reset --confirm
```
After reset, reinitialize:
```bash
sudo volt daemon start
volt system health
```