Capabilities-Based Testing¶
What Are Capabilities?¶
Instead of testing roles in isolation, Solti-Monitoring tests capabilities - complete functional systems that involve multiple roles working together.
Current Capabilities:
- logs - Log collection and forwarding system (Loki + Alloy)
- metrics - Metrics collection and storage system (InfluxDB + Telegraf)
This approach ensures that components work together, not just individually.
Why Test Capabilities?¶
Traditional approach (role-by-role):
✓ InfluxDB starts successfully
✓ Telegraf starts successfully
✗ Telegraf cannot connect to InfluxDB (authentication missing!)
Capabilities approach:
✓ InfluxDB starts and creates admin token
✓ Telegraf starts and retrieves token
✓ Telegraf writes metrics to InfluxDB
✓ Metrics query returns data
The second approach catches integration issues that role-by-role tests miss, such as the missing authentication token above.
How It Works¶
1. Define Capabilities¶
Capabilities are defined in molecule/vars/capabilities.yml:
monitoring_capabilities:
  metrics:
    roles:                 # What to deploy
      - influxdb
      - influxdb3
      - telegraf
    verify_role_tasks:     # Per-role checks
      influxdb:
        - verify.yml
      influxdb3:
        - verify.yml
        - verify-collecting.yml
    verify_tasks:          # Integration checks
      - verify-metrics.yml
      - verify-metrics-v3.yml
2. Two-Level Verification¶
Level 1: Role Verification (does the service work?)
- Check service is running
- Verify API responds
- Test configuration is valid
Level 2: Integration Verification (do services talk to each other?)
- Verify Telegraf connects to InfluxDB
- Write test data through Telegraf
- Query data back from InfluxDB
- Confirm metrics are flowing
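The two levels can be sketched as a small shell runner. This is a minimal sketch, not the project's verification code: `run_checks` and the three placeholder check functions are hypothetical names, and running level 2 only after level 1 passes is an assumption of the sketch.

```shell
#!/usr/bin/env bash
# Run every check function named in the arguments, report each result,
# and return non-zero if any check failed.
run_checks() {
  local level="$1" failed=0 check
  shift
  for check in "$@"; do
    if "$check"; then
      echo "ok   [$level] $check"
    else
      echo "FAIL [$level] $check"
      failed=1
    fi
  done
  return "$failed"
}

# Placeholder checks (hypothetical); real ones would query systemd,
# the service API, or the stored data.
service_running()  { true; }
api_responds()     { true; }
write_then_query() { true; }

# Level 1 (role) first, then level 2 (integration) only if level 1 passed.
run_checks role service_running api_responds &&
  run_checks integration write_then_query
```

Keeping the levels separate means a failing role check is reported as a role problem rather than surfacing later as a confusing integration failure.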
3. Run Tests¶
cd solti-monitoring
# Test all capabilities
./run-podman-tests.sh
# Test specific capability
./run-podman-tests.sh --tests metrics
# Test on Proxmox VMs
PROXMOX_DISTRO=debian12 ./run-proxmox-tests.sh
Real-World Example: Metrics Capability¶
What Gets Deployed¶
┌─────────────┐
│  InfluxDB   │  ← Metrics storage (v2)
│    :8086    │
└─────────────┘
       ↑
       │ writes metrics
       │
┌─────────────┐
│  Telegraf   │  ← Metrics collector
│  (client)   │
└─────────────┘
       │
       │ also writes to
       ↓
┌─────────────┐
│ InfluxDB v3 │  ← Next-gen storage
│    :8181    │
└─────────────┘
What Gets Tested¶
Role-Level Checks:
- InfluxDB v2 service running? ✓
- InfluxDB v3 service running? ✓
- Telegraf service running? ✓
- Port 8086 listening? ✓
- Port 8181 listening? ✓
Integration Checks:
- Telegraf connected to InfluxDB v2? ✓
- Telegraf connected to InfluxDB v3? ✓
- Can write test data to v2? ✓
- Can write test data to v3? ✓
- Can query data back? ✓
Test Platforms¶
Podman (Fast Local Testing)¶
Best for:
- Quick feedback during development
- Testing across multiple distributions
- CI/CD pipelines
Characteristics:
- Containers start in seconds
- Full systemd support
- Network isolation
- Multiple distros in parallel
Run:
./run-podman-tests.sh
Proxmox (Production-Like Testing)¶
Best for:
- Final validation before release
- VM-specific scenarios
- Production simulation
Characteristics:
- Real VMs with complete OS
- Cloud-init integration
- Slower but higher fidelity
- Tests VM provisioning workflow
Run:
PROXMOX_DISTRO=debian12 ./run-proxmox-tests.sh
Supported Distributions¶
Podman Testing¶
- Debian 12 (bookworm)
- Debian 13 (trixie)
- Rocky Linux 9
- Rocky Linux 10
- Ubuntu 24.04
Proxmox Testing¶
- Debian 12
- Rocky Linux 9
InfluxDB Dual-Version Testing¶
The metrics capability tests both InfluxDB versions simultaneously:
    ┌──────────────┐
    │   Telegraf   │
    └──────┬───────┘
           │
    ┌──────┴──────┐
    │             │
┌───▼─────┐   ┌───▼─────┐
│InfluxDB │   │InfluxDB │
│   v2    │   │   v3    │
│  :8086  │   │  :8181  │
└─────────┘   └─────────┘
Why both?
- Tests migration/coexistence scenario
- Validates Telegraf can write to both
- Real-world use case during transition
- Ensures no port conflicts
Configuration:
telegraf_outputs: ['localhost', 'localhost_v3']
telgraf2influxdb_configs:
  localhost:        # InfluxDB v2
    port: 8086
  localhost_v3:     # InfluxDB v3
    port: 8181
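Because both outputs point at the same host, the ports must differ. As a minimal sketch, a duplicate-port check over `name=port` pairs could look like the following; `find_port_conflicts` is a hypothetical helper and the pairs are typed in by hand rather than read from the YAML.

```shell
#!/usr/bin/env bash
# Print every port assigned to more than one output and return non-zero
# if any conflict exists. Arguments are name=port pairs.
find_port_conflicts() {
  local dups
  dups="$(printf '%s\n' "$@" | cut -d= -f2 | sort | uniq -d)"
  if [ -n "$dups" ]; then
    printf '%s\n' "$dups" | sed 's/^/conflicting port: /'
    return 1
  fi
}

# The two outputs from the configuration above use distinct ports.
find_port_conflicts localhost=8086 localhost_v3=8181 && echo "no port conflicts"
```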
Test Execution Flow¶
Phase 1: Destroy¶
Clean up any existing test infrastructure
Phase 2: Create¶
- Podman: Start systemd containers
- Proxmox: Clone VMs from cloud-init templates
Phase 3: Prepare¶
- Install required packages
- Configure SSH access
- Wait for systems to be ready
Phase 4: Converge¶
- Deploy roles based on capability
- Apply configuration
- Start services
Phase 5: Verify¶
- Run role-level verification tasks
- Run integration verification tasks
- Generate test reports
Phase 6: Destroy¶
Clean up test infrastructure
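The six phases map onto Molecule subcommands of the same names. The loop below is a minimal sketch of that ordering, not the contents of run-podman-tests.sh; `run_phases` and the `RUNNER` variable are hypothetical. `RUNNER` defaults to `echo`, so the commands are only printed unless it is set to `env`.

```shell
#!/usr/bin/env bash
# Run the six phases in order against one Molecule scenario.
# RUNNER defaults to "echo" (dry run); set RUNNER=env to execute for real.
run_phases() {
  local scenario="${1:-podman}"
  local phase
  for phase in destroy create prepare converge verify destroy; do
    # Stop at the first failing phase so later phases do not run
    # against a broken environment.
    ${RUNNER:-echo} molecule "$phase" -s "$scenario" || return 1
  done
}

# Dry run: print the commands for the podman scenario.
run_phases podman
```

Note that this sketch skips the final destroy when an earlier phase fails, which leaves the environment available for inspection. Molecule's own `molecule test` command runs a comparable full sequence.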
Authentication in Tests¶
Testing Mode¶
When telegraf_testing: true (default in molecule):
- Tokens are auto-discovered from filesystem
- InfluxDB v2: Read from influx auth list
- InfluxDB v3: Read from /root/.influxdb3-credentials
- No manual configuration needed
Production Mode¶
When telegraf_testing: false (production deployments):
- Tokens must be pre-configured in inventory
- Uses telgraf2influxdb_configs from group_vars
- No auto-discovery
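The difference between the two modes can be sketched as a token lookup for InfluxDB v3. This is an illustration only: `resolve_v3_token` is a hypothetical helper, and it assumes the credentials file holds the bare token on its first line, which may not match the real file format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the InfluxDB v3 token source by mode.
#   $1  "true" for testing mode, anything else for production mode
#   $2  credentials file (for example /root/.influxdb3-credentials)
#   $3  pre-configured token, used in production mode only
# Assumption: the first line of the credentials file is the token itself.
resolve_v3_token() {
  local testing="$1" cred_file="$2" configured="${3:-}"
  if [ "$testing" = "true" ]; then
    # Testing mode: discover the token from the file on disk.
    [ -r "$cred_file" ] || { echo "cannot read $cred_file" >&2; return 1; }
    head -n 1 "$cred_file"
  elif [ -n "$configured" ]; then
    # Production mode: use the value supplied by the inventory.
    printf '%s\n' "$configured"
  else
    echo "production mode requires a pre-configured token" >&2
    return 1
  fi
}
```

Failing loudly in production mode, instead of falling back to discovery, mirrors the "No auto-discovery" rule above.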
Secure Logging¶
By default, all credentials are hidden in test logs.
Debug mode (show credentials):
MOLECULE_SECURE_LOGGING=false ./run-podman-tests.sh
Verification Reports¶
All test results are saved to verify_output/:
verify_output/
├── podman-test-20250128-143022.out # Full test log
├── latest_test.out -> podman-test... # Symlink to latest
├── debian12/
│ ├── influxdb3-verify-uut-ct0.yml # Role verification
│ ├── verify-metrics-v3-status.yml # Integration results
│ └── debian12-consolidated.md # Summary report
└── rocky9/
└── ...
Report Format¶
Role-level reports (YAML): written per distribution, for example debian12/influxdb3-verify-uut-ct0.yml in the tree above.
Integration reports (YAML):
verify_result: passed
telegraf_connection: established
write_test: passed
query_test: passed
cpu_metrics_5min: 1234
Consolidated reports (Markdown): Human-readable summary with pass/fail status for all tests.
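Since integration reports carry a `verify_result` key, a quick pass/fail scan over the report tree can be sketched in shell. The helper names are hypothetical, and the sketch assumes that integration reports are the files ending in `-status.yml`, as in the tree above.

```shell
#!/usr/bin/env bash
# Return 0 when a report contains a top-level "verify_result: passed" line.
report_passed() {
  grep -Eq '^verify_result:[[:space:]]*passed[[:space:]]*$' "$1"
}

# Print one PASS/FAIL line per integration report under a report root
# (default: verify_output). Returns 1 if any report failed, 2 if the
# directory does not exist.
summarize_reports() {
  local root="${1:-verify_output}" failed=0 report
  [ -d "$root" ] || { echo "no report directory: $root" >&2; return 2; }
  while IFS= read -r report; do
    if report_passed "$report"; then
      echo "PASS $report"
    else
      echo "FAIL $report"
      failed=1
    fi
  done < <(find "$root" -type f -name '*-status.yml' | sort)
  return "$failed"
}
```

After a test run, `summarize_reports verify_output` gives a one-line status per distribution without opening each file.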
Capability Selection¶
Test specific capabilities:
# Logs only (Loki + Alloy)
./run-podman-tests.sh --tests logs
# Metrics only (InfluxDB + Telegraf)
./run-podman-tests.sh --tests metrics
# Both (default)
./run-podman-tests.sh --tests logs,metrics
Behind the scenes:
# molecule.yml
testing_capabilities: "{{ lookup('env', 'MOLECULE_CAPABILITIES', default='logs,metrics') | split(',') }}"
The converge playbook uses this to deploy only selected capabilities.
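The same default-then-split logic can be sketched in shell. This roughly mirrors the expression above and is not taken from the test scripts; `parse_capabilities` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read MOLECULE_CAPABILITIES, fall back to
# "logs,metrics" when it is unset or empty, and print one capability per line.
parse_capabilities() {
  local raw="${MOLECULE_CAPABILITIES:-logs,metrics}"
  local -a caps
  IFS=',' read -r -a caps <<< "$raw"
  printf '%s\n' "${caps[@]}"
}

# Select only the metrics capability for this one call.
MOLECULE_CAPABILITIES=metrics parse_capabilities
```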
Common Scenarios¶
Quick Local Test¶
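./run-podman-tests.sh --tests metrics
Tests the metrics capability on all supported distributions in Podman containers.
Single Distribution¶
Tests all capabilities on Debian 12 only.
Production Validation¶
PROXMOX_DISTRO=debian12 ./run-proxmox-tests.sh
Runs the full integration test on a Debian 12 VM.
Debug Failed Test¶
MOLECULE_SECURE_LOGGING=false ./run-podman-tests.sh
Shows credentials in output for troubleshooting.
Keep Test Environment¶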
cd solti-monitoring
molecule converge -s podman
# (skips destroy, containers stay running)
podman exec -it uut-ct0 bash
Troubleshooting¶
Container Won't Start¶
# Check cgroup version
ls /sys/fs/cgroup/
# Verify podman supports systemd
podman --version # Need 3.0+
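The version requirement can be checked in a script rather than by eye. A minimal sketch, assuming the usual `podman version X.Y.Z` output format; `podman_version_ok` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Return 0 if a "podman version X.Y.Z" string reports major version >= 3.
podman_version_ok() {
  local version="${1##* }"      # text after the last space, e.g. "4.9.3"
  local major="${version%%.*}"  # text before the first dot, e.g. "4"
  [[ "$major" =~ ^[0-9]+$ ]] && [ "$major" -ge 3 ]
}

# Usage on a host with Podman installed:
#   podman_version_ok "$(podman --version)" || echo "Podman 3.0+ is required" >&2
```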
SSH Connection Refused¶
# Check SSH in container
podman exec uut-ct0 systemctl status sshd
# Check port mapping
podman port uut-ct0
Authentication Failures¶
# Enable debug logging
MOLECULE_SECURE_LOGGING=false ./run-podman-tests.sh
# Check token files exist
podman exec uut-ct0 ls -la /root/.influxdb*
Tests Pass But Reports Missing¶
# Check report directory
ls -la verify_output/
# Verify report_root in molecule.yml
grep report_root molecule/podman/molecule.yml
Best Practices¶
1. Test Real Scenarios¶
✓ Good: Test complete data flow (Telegraf → InfluxDB → Query)
✗ Bad: Test Telegraf and InfluxDB separately
2. Meaningful Assertions¶
✓ Good: assert: connection_check shows telegraf pid
✗ Bad: assert: command returned 0
3. Clean Test Data¶
Tests should be idempotent - safe to run multiple times without side effects.
4. Distribution-Aware¶
Use Ansible facts such as ansible_distribution and ansible_os_family for distribution-specific checks.
5. Clear Failure Messages¶
fail_msg: "Telegraf not connected to InfluxDB v3 on port 8181"
success_msg: "Telegraf successfully connected"
What's Next?¶
The capabilities-based approach is evolving:
Current State:
- Two capabilities (logs, metrics)
- Dual InfluxDB version testing
- Podman and Proxmox platforms
- Manual test script execution
Future:
- More capabilities (alerts, dashboards)
- Parallel test execution
- Unified test reports
- Performance benchmarks
- Automated PR testing
Related Documentation¶
For developers and AI agents:
- MOLECULE_TESTING_ARCHITECTURE.md - Complete technical reference
- capabilities.yml - Capability definitions
For CI/CD integration:
- CI/CD Integration - GitHub Actions setup
- Platform Matrix - Distribution support
For verification details:
- Verification Tasks - How verification works
- Test Scenarios - Scenario configurations