Telegraf Role

Overview

Telegraf is a plugin-driven server agent for collecting and reporting metrics. Its input plugins cover system metrics, application metrics, and custom data sources. In the solti-monitoring collection, Telegraf runs as a lightweight agent on each monitored host and ships the collected metrics to InfluxDB for storage and analysis.

Requirements

  • Ansible 2.9 or later.
  • Target system with internet access (for package installation).
  • A running InfluxDB instance (local or remote) to receive metrics.

Installation / Quick Start

The telegraf role installs and configures Telegraf on target hosts.

Basic Installation

- hosts: monitoring_clients
  become: true
  roles:
    - role: jackaltx.solti_monitoring.telegraf
      vars:
        telegraf_output_influxdb: true
        telegraf_output_url: "http://monitor.example.com:8086"
        telegraf_output_token: "{{ vault_telegraf_token }}" # Store in Ansible Vault
        telegraf_output_org: "myorg"
        telegraf_output_bucket: "telegraf"

Role Variables / Configuration

Output Configuration

| Name | Description | Default |
|---|---|---|
| telegraf_output_influxdb | Enable InfluxDB v2 output plugin. | true |
| telegraf_output_url | URL of the InfluxDB instance. | "" |
| telegraf_output_token | Authentication token for InfluxDB. | "" |
| telegraf_output_org | InfluxDB organization name. | "" |
| telegraf_output_bucket | InfluxDB bucket name. | telegraf |

Input Plugins

System metrics (cpu, disk, diskio, mem, net, system, processes) are enabled by default.

| Name | Description | Default |
|---|---|---|
| telegraf_enable_docker | Enable Docker container metrics. | false |
| telegraf_enable_nginx | Enable Nginx web server metrics. | false |
| telegraf_enable_postgresql | Enable PostgreSQL database metrics. | false |
| telegraf_enable_redis | Enable Redis metrics. | false |
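
As a minimal sketch, the optional inputs can be switched on per host group through ordinary Ansible variables. The variable names come from the table above; the file path and group name are illustrative assumptions, not part of the role.

```yaml
# group_vars/web_servers.yml  (hypothetical path and group name)
# Optional inputs are off by default; enable only what the hosts actually run.
telegraf_enable_docker: true       # collect Docker container metrics
telegraf_enable_nginx: true        # collect Nginx web server metrics
telegraf_enable_postgresql: false  # left at the default
telegraf_enable_redis: false       # left at the default
```

The default system inputs stay enabled either way, so this only adds metrics on top of them.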

Global Tags

| Name | Description | Default |
|---|---|---|
| telegraf_global_tags | Dictionary of custom tags to add to all collected metrics. | {} |
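
A short sketch of what a tag dictionary might look like. The tag keys and values below are made up for illustration; only the variable name telegraf_global_tags comes from this role.

```yaml
# Every metric collected on the host carries these tags.
# Keys and values are hypothetical examples.
telegraf_global_tags:
  environment: "production"
  datacenter: "dc1"
  service: "web"
```

Quoting the values keeps them as strings, which is what Telegraf expects for tag values.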

Usage

Testing Configuration

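Run Telegraf once in test mode. This loads the configuration, gathers metrics from the inputs a single time, prints them to stdout, and exits without writing to any output or starting the service: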

telegraf --config /etc/telegraf/telegraf.conf --test
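
The same check can be run across all clients from the control node. The playbook below is a sketch, not part of the role: it reuses the monitoring_clients group from the Quick Start and assumes the default configuration path shown above.

```yaml
# Hypothetical smoke-test playbook: run Telegraf once in test mode on each host.
- name: Smoke-test the Telegraf configuration
  hosts: monitoring_clients
  become: true
  tasks:
    - name: Run Telegraf once in test mode
      command: telegraf --config /etc/telegraf/telegraf.conf --test
      register: telegraf_test
      changed_when: false  # read-only check, never reports a change

    - name: Show the first few gathered metrics
      debug:
        msg: "{{ telegraf_test.stdout_lines[:5] }}"
```

The first task fails if Telegraf exits with a non-zero status, for example when the configuration cannot be parsed.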

Example Configurations

The role supports various configurations, such as basic system monitoring, remote collection via WireGuard, and multi-output setups.

Service Management

Telegraf runs as a systemd service.

# Check status
systemctl status telegraf

# Start/stop/restart
systemctl start telegraf
systemctl stop telegraf
systemctl restart telegraf

# Enable at boot
systemctl enable telegraf
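
For ad hoc use from the control node, the service state can also be expressed as an Ansible task. This is a sketch under the assumption that the service unit is named telegraf, as in the commands above; the role itself may already manage this.

```yaml
# Hypothetical playbook: make sure Telegraf is running and starts at boot.
- name: Ensure Telegraf is running and enabled
  hosts: monitoring_clients
  become: true
  tasks:
    - name: Start and enable the telegraf service
      service:
        name: telegraf
        state: started
        enabled: true
```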

Troubleshooting

Check Logs

journalctl -u telegraf -f

Test Connection to InfluxDB

curl -I http://monitor.example.com:8086/health
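
To run the reachability check from every client at once, a playbook along these lines may help. It is a sketch only: the URL is the example endpoint used throughout this page, and the play is not shipped with the role.

```yaml
# Hypothetical playbook: confirm each client can reach the InfluxDB health endpoint.
- name: Check InfluxDB reachability from Telegraf hosts
  hosts: monitoring_clients
  tasks:
    - name: Query the InfluxDB health endpoint
      uri:
        url: "http://monitor.example.com:8086/health"  # example URL from this page
        method: GET
        status_code: 200
      register: influx_health

    - name: Report the HTTP status seen by this host
      debug:
        var: influx_health.status
```

Because the request originates on the client, a failure here points at the network path or firewall between that host and InfluxDB rather than at Telegraf.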

Verify Metrics Collection

telegraf --config /etc/telegraf/telegraf.conf --test --input-filter cpu:mem:disk

Common Issues

  1. Connection refused: InfluxDB not reachable (check network, InfluxDB status, firewall).
  2. Authentication failed: Invalid token (verify token permissions).
  3. No data in InfluxDB: Metrics not being sent (check logs and output config; use --test to confirm that inputs are collecting, keeping in mind that --test does not write to outputs).

Role-Specific Sections

Performance Tuning

  • Collection Interval: Adjust telegraf_interval (how often inputs are gathered) and telegraf_flush_interval (how often buffered metrics are flushed to outputs).
  • Metric Filtering: Use telegraf_metric_filters to reduce metric volume.
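
A sketch of lowering the collection frequency on hosts where fine-grained data is not needed. It assumes the two variables accept Telegraf-style duration strings such as "30s"; that format and the values are assumptions, so check the role defaults before relying on them. telegraf_metric_filters is left out because its structure is not described here.

```yaml
# Assumed value format: Telegraf duration strings (unverified against the role).
telegraf_interval: "30s"        # gather inputs every 30 seconds
telegraf_flush_interval: "60s"  # write buffered metrics every 60 seconds
```

Longer intervals reduce write volume to InfluxDB at the cost of coarser time resolution.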

Security Considerations

  • Store tokens in Ansible Vault.
  • Use HTTPS endpoints when possible.
  • Employ WireGuard for remote collectors.
  • Grant minimum required permissions (least privilege).

Reference Deployment

Refer to the Reference Deployments chapter for real-world examples, such as monitor11.example.com (server with local Telegraf) and ispconfig3.example.com (client shipping metrics via WireGuard).

Reference

License

MIT

Author

Created by jackaltx and Claude.