Volt CLI: source-available under AGPSL v5.0
Complete infrastructure platform CLI:
- Container runtime (systemd-nspawn)
- VoltVisor VMs (Neutron Stardust / QEMU)
- Stellarium CAS (content-addressed storage)
- ORAS Registry
- GitOps integration
- Landlock LSM security
- Compose orchestration
- Mesh networking

Copyright (c) Armored Gates LLC. All rights reserved.
Licensed under AGPSL v5.0
docs/troubleshooting.md
# Troubleshooting

Common issues and solutions for the Volt Platform.

## Quick Diagnostics

Run these first to understand the state of your system:

```bash
# Platform health check
volt system health

# Platform info
volt system info

# What's running?
volt ps --all

# Daemon status
volt daemon status

# Network status
volt net status
```

---

## Container Issues

### Container Won't Start

**Symptom**: `volt container start <name>` fails or returns an error.

**Check the logs first**:
```bash
volt container logs <name>
volt logs <name>
```

**Common causes**:

1. **Image not found**
   ```
   Error: image "ubuntu:24.04" not found
   ```
   Pull the image first:
   ```bash
   sudo volt image pull ubuntu:24.04
   volt image list
   ```

2. **Name conflict**
   ```
   Error: container "web" already exists
   ```
   Delete the existing container or use a different name:
   ```bash
   volt container delete web
   ```

3. **systemd-nspawn not installed**
   ```
   Error: systemd-nspawn not found
   ```
   Install the systemd-container package:
   ```bash
   # Debian/Ubuntu
   sudo apt install systemd-container

   # Fedora/Rocky
   sudo dnf install systemd-container
   ```

4. **Rootfs directory missing or corrupt**
   ```bash
   ls -la /var/lib/volt/containers/<name>/rootfs/
   ```
   If empty or missing, recreate the container:
   ```bash
   volt container delete <name>
   volt container create --name <name> --image <image> --start
   ```

5. **Resource limits too restrictive**
   Try creating without limits, then add them:
   ```bash
   volt container create --name test --image ubuntu:24.04 --start
   volt container update test --memory 512M
   ```

### Container Starts But Process Exits Immediately

**Check the main process**:
```bash
volt container logs <name>
volt container inspect <name>
```

Common cause: the container has no init process or the specified command doesn't exist in the image.

```bash
# Try interactive shell to debug
volt container shell <name>
```

### Can't Exec Into Container

**Symptom**: `volt container exec` fails.

1. **Container not running**:
   ```bash
   volt ps --all | grep <name>
   volt container start <name>
   ```

2. **Shell not available in image**:
   The default shell (`/bin/sh`) might not exist in minimal images. Try another shell explicitly:
   ```bash
   volt container exec <name> -- /bin/bash
   volt container exec <name> -- /bin/busybox sh
   ```

### Container Resource Limits Not Working

Verify cgroup v2 is enabled:
```bash
mount | grep cgroup2
# Should show: cgroup2 on /sys/fs/cgroup type cgroup2
```

Check the cgroup settings:
```bash
volt container inspect <name> -o json | grep -i memory
cat /sys/fs/cgroup/system.slice/volt-container@<name>.service/memory.max
```
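
The two checks above can be wrapped in small helpers for scripting. This is a minimal sketch, not part of Volt: the function names `has_cgroup2` and `describe_memory_max` are illustrative, and the file arguments exist so the helpers can be pointed at sample data. It relies on the `/proc/mounts` layout (filesystem type in the third field) and on the cgroup v2 convention that `memory.max` holds either a byte count or the literal `max`.

```bash
# Succeed if a cgroup2 filesystem appears in the mount table.
# $1: mount table to read (defaults to /proc/mounts)
has_cgroup2() {
    awk '$3 == "cgroup2" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Describe the contents of a cgroup v2 memory.max file.
# $1: path to memory.max
describe_memory_max() {
    local value
    value=$(cat "$1") || return 1
    if [ "$value" = "max" ]; then
        echo "no memory limit set"
    else
        echo "limit: $((value / 1048576)) MiB"
    fi
}

# Usage (substitute the container name in the path shown above):
#   has_cgroup2 && describe_memory_max /sys/fs/cgroup/.../memory.max
```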

---

## VM Issues

### VM Won't Start

**Check prerequisites**:
```bash
# KVM available?
ls -la /dev/kvm

# QEMU installed?
which qemu-system-x86_64

# Kernel modules loaded?
lsmod | grep kvm
```
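
The prerequisite checks above can also be run as one function that reports every missing piece at once. This is a sketch under stated assumptions: `check_vm_prereqs` is a hypothetical helper name, and the two optional arguments only exist to make the defaults overridable.

```bash
# Report which VM prerequisites are missing. Returns non-zero if any check fails.
# $1: KVM device node (defaults to /dev/kvm)
# $2: QEMU binary name (defaults to qemu-system-x86_64)
check_vm_prereqs() {
    local dev="${1:-/dev/kvm}" qemu="${2:-qemu-system-x86_64}" rc=0
    if [ ! -c "$dev" ]; then
        echo "missing: $dev is not a character device (KVM module not loaded?)"
        rc=1
    elif [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "denied: no read/write access to $dev"
        rc=1
    fi
    if ! command -v "$qemu" >/dev/null 2>&1; then
        echo "missing: $qemu not found in PATH"
        rc=1
    fi
    return "$rc"
}
```

Calling `check_vm_prereqs` with no arguments checks the defaults. A `missing:` line for the device points at the module-loading steps, and a `denied:` line points at the group-membership steps.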

**If `/dev/kvm` doesn't exist**:
```bash
# Load KVM modules
sudo modprobe kvm
sudo modprobe kvm_intel  # or kvm_amd

# Check BIOS: virtualization must be enabled (VT-x / AMD-V)
dmesg | grep -i kvm
```

**If permission denied on `/dev/kvm`**:
```bash
# Add user to kvm group
sudo usermod -aG kvm $USER
# Log out and back in

# Or check group ownership
ls -la /dev/kvm
# Should be: crw-rw---- 1 root kvm
```

### VM Starts But No SSH Access

1. **VM might still be booting**. Wait 30-60 seconds for first boot.

2. **Check VM has an IP**:
   ```bash
   volt vm list -o wide
   ```

3. **SSH might not be installed/running in the VM**:
   ```bash
   volt vm exec <name> -- systemctl status sshd
   ```

4. **Network connectivity**:
   ```bash
   # From host, ping the VM's IP
   ping <vm-ip>
   ```
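
Instead of waiting a fixed 30-60 seconds for first boot, a script can poll until the VM answers. The helper below is a generic retry loop, not a Volt feature; `wait_until` is a hypothetical name, and the attempt count and delay are whatever the caller chooses.

```bash
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: seconds to sleep between attempts, rest: command
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `wait_until 12 5 ping -c 1 -W 1 <vm-ip>` retries a single ping every 5 seconds and gives up after roughly a minute.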

### VM Performance Issues

Apply a tuning profile:
```bash
volt tune profile apply <vm-name> --profile database
```

Or tune individually:
```bash
# Pin CPUs
volt tune cpu pin <vm-name> --cpus 4,5,6,7

# Enable hugepages
volt tune memory hugepages --enable --size 2M --count 4096

# Set I/O scheduler
volt tune io scheduler /dev/sda --scheduler none
```

---

## Service Issues

### Service Won't Start

```bash
# Check status
volt service status <name>

# View logs
volt service logs <name>

# View the unit file for issues
volt service show <name>
```

Common causes:

1. **ExecStart path doesn't exist**:
   ```bash
   which <binary-path>
   ```

2. **User/group doesn't exist**:
   ```bash
   id <service-user>
   # Create if missing
   sudo useradd -r -s /bin/false <service-user>
   ```

3. **Working directory doesn't exist**:
   ```bash
   ls -la <workdir-path>
   sudo mkdir -p <workdir-path>
   ```

4. **Port already in use**:
   ```bash
   ss -tlnp | grep <port>
   ```
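
A plain text match on the port number can also hit longer ports (searching for `80` matches `8080`). For scripts, an exact comparison on the port column is safer. The sketch below assumes the usual `ss -tln` column layout, where the local `Address:Port` is the fourth field; `port_in_listing` is a hypothetical helper name.

```bash
# Read `ss -tln` style output on stdin and succeed if any listener
# is bound to exactly the given port.
# $1: port number
port_in_listing() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # text after the last colon is the port
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Usage: `ss -tln | port_in_listing 8080 && echo "port 8080 is taken"`.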

### Service Keeps Restarting

Check the restart loop:
```bash
volt service status <name>
volt service logs <name> --tail 50
```

If the service fails immediately on start, systemd may hit the start rate limit. Check:
```bash
# View full systemd status
systemctl status <name>.service
```

Temporarily adjust restart behavior:
```bash
volt service edit <name> --inline "RestartSec=10"
```
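
To confirm whether the start rate limit mentioned above was actually hit, the unit's `Result` and `NRestarts` properties can be read with `systemctl show`. The helper below is a sketch that only parses that `KEY=VALUE` output; `summarize_restart_state` is a hypothetical name, and it assumes the unit is named `<name>.service` as in the command above.

```bash
# Read `systemctl show` style KEY=VALUE lines on stdin and summarize
# whether the unit stopped because of the start rate limit.
summarize_restart_state() {
    local result="" restarts="" key value
    while IFS='=' read -r key value; do
        case "$key" in
            Result)    result="$value" ;;
            NRestarts) restarts="$value" ;;
        esac
    done
    if [ "$result" = "start-limit-hit" ]; then
        echo "start rate limit hit after ${restarts:-?} restarts"
    else
        echo "result=${result:-unknown} restarts=${restarts:-?}"
    fi
}
```

Usage: `systemctl show <name>.service -p Result -p NRestarts | summarize_restart_state`. If the limit was hit, `sudo systemctl reset-failed <name>.service` clears the counter so the service can be started again once the underlying failure is fixed.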

### Can't Delete a Service

```bash
# If it says "refusing to delete system unit"
# Volt protects system services. Only user-created services can be deleted.

# If stuck, manually:
volt service stop <name>
volt service disable <name>
volt service delete <name>
```

---

## Networking Issues

### No Network Connectivity from Container

1. **Check bridge exists**:
   ```bash
   volt net bridge list
   ```
   If `volt0` is missing:
   ```bash
   sudo volt net bridge create volt0 --subnet 10.0.0.0/24
   ```

2. **Check IP forwarding**:
   ```bash
   volt tune sysctl get net.ipv4.ip_forward
   # Should be 1. If not:
   sudo volt tune sysctl set net.ipv4.ip_forward 1 --persist
   ```

3. **Check NAT/masquerade rules**:
   ```bash
   sudo nft list ruleset | grep masquerade
   ```

4. **Check container has an IP**:
   ```bash
   volt container inspect <name>
   ```

### Workloads Can't Resolve Names

1. **Check internal DNS**:
   ```bash
   volt net dns list
   ```

2. **Flush DNS cache**:
   ```bash
   volt net dns flush
   ```

3. **Check upstream DNS in config**:
   ```bash
   volt config get network.dns.upstream
   ```

### Port Forward Not Working

1. **Verify the forward exists**:
   ```bash
   volt net port list
   ```

2. **Check the target is running and listening**:
   ```bash
   volt ps | grep <target>
   volt container exec <target> -- ss -tlnp
   ```

3. **Check firewall rules**:
   ```bash
   volt net firewall list
   ```

4. **Check for host-level firewall conflicts**:
   ```bash
   sudo nft list ruleset
   sudo iptables -L -n  # if iptables is also in use
   ```

### Firewall Rule Not Taking Effect

1. **List current rules**:
   ```bash
   volt net firewall list
   ```

2. **Rule ordering matters**. More specific rules should come first. If a broad `deny` rule precedes your `accept` rule, traffic will be blocked.

3. **Flush and recreate if confused**:
   ```bash
   volt net firewall flush
   # Re-add rules in the correct order
   ```

---

## Daemon Issues

### Daemon Not Running

```bash
volt daemon status
# If not running:
sudo volt daemon start
```

Check systemd:
```bash
systemctl status volt.service
journalctl -u volt.service --no-pager -n 50
```

### Daemon Won't Start

1. **Socket in use**:
   ```bash
   ls -la /var/run/volt/volt.sock
   # Remove stale socket
   sudo rm /var/run/volt/volt.sock
   sudo volt daemon start
   ```

2. **Config file invalid**:
   ```bash
   volt config validate
   ```

3. **Missing directories**:
   ```bash
   sudo mkdir -p /var/lib/volt /var/run/volt /var/log/volt /var/cache/volt /etc/volt
   ```

4. **PID file stale**:
   ```bash
   cat /var/run/volt/volt.pid
   # Check if that PID exists
   ps -p $(cat /var/run/volt/volt.pid)
   # If no process, remove it
   sudo rm /var/run/volt/volt.pid
   sudo volt daemon start
   ```
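
The stale PID file check in step 4 can be scripted so the file is only removed when no process owns the PID. This is a sketch rather than Volt's own logic: `pid_file_is_stale` is a hypothetical helper, and the `/proc/<pid>` fallback assumes a Linux host.

```bash
# Succeed if the PID file exists but names a process that is no longer alive.
# $1: PID file path (defaults to the path used above)
pid_file_is_stale() {
    local pid_file="${1:-/var/run/volt/volt.pid}" pid
    [ -f "$pid_file" ] || return 1      # no PID file, so nothing is stale
    pid=$(cat "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;        # empty or non-numeric content
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # The /proc check covers live processes owned by another user.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        return 1
    fi
    return 0
}
```

Usage: `pid_file_is_stale && sudo rm /var/run/volt/volt.pid`. PIDs are reused by the kernel, so a live PID does not prove the process is the Volt daemon; compare it against `ps -p` output when in doubt.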

### Commands Timeout

```bash
# Increase timeout
volt --timeout 120 <command>

# Or check if daemon is overloaded
volt daemon status
volt top
```

---

## Permission Issues

### "Permission denied" Errors

Most state-changing operations require root or `volt` group membership:

```bash
# Add user to volt group
sudo usermod -aG volt $USER
# Log out and back in for group change to take effect

# Or use sudo
sudo volt container create --name web --image ubuntu:24.04 --start
```
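
A common trap after `usermod` is that the account has the new group but the current login session does not. The helper below makes that visible; `in_group_list` is a hypothetical name, and it relies on `id -nG` printing the groups of the running process while `id -nG "$USER"` prints the groups recorded for the account.

```bash
# Succeed if the given group appears in a space-separated group list.
# $1: group name, $2: group list (defaults to the current session's groups)
in_group_list() {
    local group="$1" list="${2:-$(id -nG)}" g
    for g in $list; do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

If `in_group_list volt "$(id -nG "$USER")"` succeeds but `in_group_list volt` fails, the membership exists and only a fresh login is missing.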

### Read-Only Operations Work, Write Operations Fail

This is expected for users who are neither root nor members of the `volt` group. These commands always work:

```bash
volt ps              # Read-only
volt top             # Read-only
volt logs <name>     # Read-only
volt service list    # Read-only
volt config show     # Read-only
```

These require privileges:

```bash
volt container create   # Needs root/volt group
volt service create     # Needs root
volt net firewall add   # Needs root
volt tune sysctl set    # Needs root
```

---

## Storage Issues

### Disk Space Full

```bash
# Check disk usage
volt system info

# Clean up unused images
volt image list
volt image delete <unused-image>

# Clean CAS garbage
volt cas gc --dry-run
volt cas gc

# Clear cache (safe to delete)
sudo rm -rf /var/cache/volt/*

# Check container sizes
du -sh /var/lib/volt/containers/*/
```
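
For monitoring scripts it can help to reduce the disk check to a single number. The sketch below reads the usage percentage from `df -P` for the filesystem that holds Volt's data directory; `disk_use_percent` is a hypothetical helper, and the 90% threshold in the usage line is an arbitrary example.

```bash
# Print the usage percentage (without the % sign) of the filesystem holding a path.
# $1: path (defaults to /var/lib/volt)
disk_use_percent() {
    df -P "${1:-/var/lib/volt}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Usage: `[ "$(disk_use_percent)" -ge 90 ] && volt cas gc --dry-run` previews what garbage collection would free once the filesystem is at least 90% full.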

### CAS Integrity Errors

```bash
# Verify CAS store
volt cas verify

# If corrupted objects are found, re-pull affected images
volt image delete <affected-image>
volt image pull <affected-image>
```

### Volume Won't Attach

1. **Volume exists?**
   ```bash
   volt volume list
   ```

2. **Already attached?**
   ```bash
   volt volume inspect <name>
   ```

3. **Target workload running?**
   Volumes can typically only be attached to running workloads.

---

## Compose Issues

### `volt compose up` Fails

1. **Validate the compose file**:
   ```bash
   volt compose config
   ```

2. **Missing images**:
   ```bash
   volt compose pull
   ```

3. **Dependency issues**: Check that `depends_on` targets exist in the file and their conditions can be met.

4. **Network conflicts**: If subnets overlap with existing networks:
   ```bash
   volt net list
   ```

### Environment Variables Not Resolving

```bash
# Check .env file exists in same directory as compose file
cat .env

# Variables must be set in the host environment or .env file
export DB_PASSWORD=mysecret
volt compose up
```

Undefined variables with no default cause an error. Use default syntax:
```yaml
environment:
  DB_PASSWORD: "${DB_PASSWORD:-defaultpass}"
```
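
To find which variables would trigger that error before running `volt compose up`, the compose file can be scanned for references without a default. This is a rough sketch, not a Volt command: `find_undefined_vars` is a hypothetical helper, it only recognizes the `${NAME}` form (not a bare `$NAME`), and it treats any `NAME=` line in the `.env` file as a definition.

```bash
# List variables referenced as ${NAME} (no default) in a compose file that are
# set neither in the exported environment nor in the .env file.
# $1: compose file, $2: .env file (defaults to ./.env)
find_undefined_vars() {
    local compose_file="$1" env_file="${2:-.env}" name
    # ${NAME:-default} references do not match this pattern and are skipped.
    grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$compose_file" | tr -d '${}' | sort -u |
    while read -r name; do
        # Exported in the host environment?
        if env | grep -q "^${name}="; then continue; fi
        # Defined in the .env file as NAME=value?
        if [ -f "$env_file" ] && grep -q "^${name}=" "$env_file"; then continue; fi
        echo "$name"
    done
}

# Usage: find_undefined_vars <compose-file>
# Prints one variable name per line; no output means every reference resolves.
```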

---

## Exit Codes

Use exit codes in scripts for error handling:

| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 2 | Bad arguments | Fix command syntax |
| 3 | Not found | Resource doesn't exist |
| 4 | Already exists | Resource name taken |
| 5 | Permission denied | Use sudo or join `volt` group |
| 6 | Daemon down | `sudo volt daemon start` |
| 7 | Timeout | Retry with `--timeout` |
| 9 | Conflict | Resource in wrong state |

```bash
volt container start web
rc=$?  # capture the exit code before any other command overwrites it
case $rc in
  0) echo "Started" ;;
  3) echo "Container not found" ;;
  5) echo "Permission denied: try sudo" ;;
  6) echo "Daemon not running: sudo volt daemon start" ;;
  9) echo "Already running" ;;
  *) echo "Error: $rc" ;;
esac
```
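
The "Retry with `--timeout`" action from the table can be automated. The wrapper below is a sketch, not part of Volt: `volt_retry_on_timeout` is a hypothetical name, it retries exactly once, and it places `--timeout` before the subcommand as in the Commands Timeout section.

```bash
# Run a volt command; if it exits with code 7 (timeout), retry once with a
# longer --timeout. Any other exit code is passed through unchanged.
# $1: timeout in seconds for the retry, rest: volt subcommand and arguments
volt_retry_on_timeout() {
    local retry_timeout="$1" rc=0
    shift
    volt "$@" || rc=$?
    if [ "$rc" -eq 7 ]; then
        echo "timed out, retrying with --timeout $retry_timeout" >&2
        rc=0
        volt --timeout "$retry_timeout" "$@" || rc=$?
    fi
    return "$rc"
}

# Usage: volt_retry_on_timeout 120 container start web
```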

---

## Collecting Debug Info

When reporting issues, gather:

```bash
# Version
volt --version

# System info
volt system info -o json

# Health check
volt system health

# Daemon logs
journalctl -u volt.service --no-pager -n 100

# Run the failing command with debug
volt --debug <failing-command>

# Audit log
tail -50 /var/log/volt/audit.log
```
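
The commands above can be captured in one pass so nothing is forgotten when filing a report. This is a minimal sketch: `collect_debug_info`, the output file names, and the default directory `./volt-debug` are all illustrative choices, and the failing command still has to be re-run separately with `--debug`.

```bash
# Collect the diagnostics listed above into one directory, one file per command.
# A failing command does not stop the collection; its error output is kept.
# $1: output directory (hypothetical default: ./volt-debug)
collect_debug_info() {
    local out="${1:-./volt-debug}"
    mkdir -p "$out" || return 1
    volt --version                               > "$out/version.txt"      2>&1 || true
    volt system info -o json                     > "$out/system-info.json" 2>&1 || true
    volt system health                           > "$out/health.txt"       2>&1 || true
    journalctl -u volt.service --no-pager -n 100 > "$out/daemon.log"       2>&1 || true
    tail -n 50 /var/log/volt/audit.log           > "$out/audit.log"        2>&1 || true
    echo "debug info written to $out"
}
```

Usage: `collect_debug_info /tmp/volt-debug && tar -czf volt-debug.tar.gz -C /tmp volt-debug`. Review the files before sharing them, since logs can contain hostnames, paths, or other sensitive details.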

## Factory Reset

If all else fails, reset Volt to defaults. **This is destructive**: it stops all workloads and removes all configuration.

```bash
sudo volt system reset --confirm
```

After reset, reinitialize:
```bash
sudo volt daemon start
volt system health
```