Retention Policies
Overview¶
This document was prepared for me when I was researching log storage. It contains some useful advice.
Retention policies define how long data is kept before being deleted. Proper retention configuration balances storage costs, compliance requirements, and data availability.
Why Retention Matters¶
- Cost Control: Limit storage growth
- Compliance: Meet regulatory requirements
- Performance: Smaller datasets query faster
- Capacity Planning: Predictable storage needs
InfluxDB Retention¶
Bucket-Level Retention¶
Each InfluxDB bucket has its own retention policy:
# Short-term bucket for detailed metrics
influxdb_buckets:
  - name: "telegraf_hourly"
    retention: "7d"
    description: "High-resolution metrics"
  - name: "telegraf_daily"
    retention: "90d"
    description: "Daily aggregates"
  - name: "telegraf_monthly"
    retention: "365d"
    description: "Monthly summaries"
Setting Retention via Role¶
- role: jackaltx.solti_monitoring.influxdb
  vars:
    influxdb_bucket: "telegraf"
    influxdb_retention: "30d"
Updating Retention¶
Change retention for an existing bucket:
Via API:
curl -X PATCH "http://localhost:8086/api/v2/buckets/BUCKET_ID" \
  -H "Authorization: Token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"retentionRules": [{"type": "expire", "everySeconds": 7776000}]}'
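The `everySeconds` value in the request above is 90 days expressed in seconds. A minimal sketch of how that body could be generated from a duration string such as "90d"; the helper names `duration_to_seconds` and `retention_payload` are made up for this example and are not part of InfluxDB or the role:

```python
import json
import re

# Seconds per supported unit (simple single-unit durations only).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(duration: str) -> int:
    """Convert a simple duration such as '90d' or '12h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def retention_payload(duration: str) -> str:
    """Build a JSON body with the same shape as the curl example."""
    rule = {"type": "expire", "everySeconds": duration_to_seconds(duration)}
    return json.dumps({"retentionRules": [rule]})


print(retention_payload("90d"))
# {"retentionRules": [{"type": "expire", "everySeconds": 7776000}]}
```

The printed string can be passed to `curl -d` unchanged.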
Infinite Retention¶
Warning: Infinite retention leads to unbounded storage growth.
Loki Retention¶
Global Retention¶
Configure retention for all log streams:
Stream-Level Retention¶
Configure different retention per log type (via labels):
loki_retention_config:
  - selector: '{service_type="audit"}'
    retention: "365d"  # Keep audit logs for 1 year
  - selector: '{service_type="debug"}'
    retention: "7d"    # Keep debug logs for 1 week
  - selector: '{service_type="application"}'
    retention: "30d"   # Keep application logs for 1 month
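To reason about which rule applies to a given stream, it can help to model the selectors in a few lines of code. The sketch below handles only equality matchers like the ones above and assumes that the first matching rule wins and that unmatched streams fall back to a 30d default. Both are assumptions of this example, not a description of how Loki resolves overlapping selectors.

```python
import re

# Rules copied from the configuration above, in the same order.
RETENTION_RULES = [
    ('{service_type="audit"}', "365d"),
    ('{service_type="debug"}', "7d"),
    ('{service_type="application"}', "30d"),
]


def parse_selector(selector: str) -> dict:
    """Parse an equality-only selector like '{a="x", b="y"}' into a dict."""
    return dict(re.findall(r'(\w+)="([^"]*)"', selector))


def retention_for(labels: dict, default: str = "30d") -> str:
    """Return the retention of the first rule whose labels all match."""
    for selector, retention in RETENTION_RULES:
        wanted = parse_selector(selector)
        if all(labels.get(key) == value for key, value in wanted.items()):
            return retention
    return default  # hypothetical fallback for unmatched streams


print(retention_for({"service_type": "audit", "host": "web1"}))  # 365d
print(retention_for({"service_type": "metrics"}))                # 30d
```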
Compaction and Deletion¶
Loki automatically:
- Marks chunks older than retention for deletion
- Waits for retention_delete_delay (default: 2h)
- Deletes chunks during compaction
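The timing described above can be worked through with plain date arithmetic. This sketch computes the earliest moment a chunk becomes eligible for removal; the function name `earliest_deletion` is hypothetical, and actual removal only happens when a later compaction cycle runs.

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion(
    chunk_end: datetime,
    retention: timedelta,
    delete_delay: timedelta = timedelta(hours=2),
) -> datetime:
    """Expiry time of the chunk plus the delete delay."""
    return chunk_end + retention + delete_delay


chunk_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(earliest_deletion(chunk_end, timedelta(days=7)))
# 2024-01-08 02:00:00+00:00
```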
Retention Strategy by Data Type¶
Metrics (InfluxDB)¶
System metrics (CPU, memory, disk):
- Retention: 30-90 days
- Rationale: Sufficient for troubleshooting and capacity planning
Application metrics (app-specific):
- Retention: 30-90 days
- Rationale: Correlate with logs and events
Business metrics (KPIs, analytics):
- Retention: 365+ days
- Rationale: Year-over-year comparison, trend analysis
Logs (Loki)¶
Application logs:
- Retention: 30 days
- Rationale: Recent troubleshooting
Security logs (fail2ban, auth):
- Retention: 90-365 days
- Rationale: Compliance, security audits
Audit logs (admin actions):
- Retention: 365+ days
- Rationale: Compliance requirements
Debug logs:
- Retention: 7 days
- Rationale: Temporary troubleshooting
Access logs (web, API):
- Retention: 30 days
- Rationale: Traffic analysis, debugging
Multi-Tier Retention¶
Downsampling Strategy¶
Keep detailed metrics short-term, aggregated metrics long-term:
- Tier 1 (Raw data): 7 days, 10-second intervals
- Tier 2 (5-minute aggregates): 30 days
- Tier 3 (1-hour aggregates): 365 days
- Tier 4 (Daily summaries): Forever
influxdb_buckets:
  - name: "telegraf_raw"
    retention: "7d"
  - name: "telegraf_5m"
    retention: "30d"
  - name: "telegraf_1h"
    retention: "365d"
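To see why tiering saves space, compare how many points a single series holds in each bucket. A rough sketch, assuming one point per interval for the whole retention window (bucket names and intervals taken from the tiers above):

```python
from datetime import timedelta

# (bucket, retention, sample interval) for the first three tiers.
TIERS = [
    ("telegraf_raw", timedelta(days=7), timedelta(seconds=10)),
    ("telegraf_5m", timedelta(days=30), timedelta(minutes=5)),
    ("telegraf_1h", timedelta(days=365), timedelta(hours=1)),
]


def points_per_series(retention: timedelta, interval: timedelta) -> int:
    """Number of points one series holds when the bucket is full."""
    return retention // interval


for bucket, retention, interval in TIERS:
    print(bucket, points_per_series(retention, interval))
# telegraf_raw 60480
# telegraf_5m 8640
# telegraf_1h 8760
```

Under these assumptions, a full year of hourly aggregates takes about one seventh as many points as a single week of raw data.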
Create downsampling tasks:
option task = {name: "downsample_5m", every: 5m}

from(bucket: "telegraf_raw")
    |> range(start: -10m)
    |> aggregateWindow(every: 5m, fn: mean)
    |> to(bucket: "telegraf_5m")
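For readers unfamiliar with Flux, the windowed mean that the task computes can be illustrated in plain Python. This is only a sketch of the idea, not how InfluxDB implements it: timestamps are assumed to be integer epoch seconds, and each output point is stamped with the end of its window.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds=300):
    """Average (timestamp, value) pairs into fixed-size windows."""
    windows = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        # Key each window by its end time.
        windows[window_start + window_seconds].append(value)
    return [(stop, mean(values)) for stop, values in sorted(windows.items())]


raw = [(0, 10.0), (10, 20.0), (290, 30.0), (300, 40.0), (310, 60.0)]
print(downsample(raw))
# [(300, 20.0), (600, 50.0)]
```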
Compliance Considerations¶
Regulatory Requirements¶
- GDPR: Personal data retention limits
- HIPAA: Healthcare data retention (6 years)
- SOX: Financial data retention (7 years)
- PCI-DSS: Payment card data retention limits
Implementing Compliance¶
- Identify regulated data: Which logs/metrics contain sensitive info
- Set appropriate retention: Match regulatory requirements
- Document policy: Maintain retention policy documentation
- Audit regularly: Verify retention settings
- Secure deletion: Ensure deleted data is unrecoverable
Example Compliance Configuration¶
loki_retention_config:
  # HIPAA-compliant retention
  - selector: '{data_class="phi"}'
    retention: "2190d"  # 6 years
  # GDPR-compliant retention
  - selector: '{data_class="pii"}'
    retention: "90d"    # Delete after purpose fulfilled
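The "audit regularly" step can be partly automated by comparing configured retention against the documented policy. A minimal sketch, assuming the rules are available as a list of dicts shaped like the YAML above; the `POLICY` table and the function names are hypothetical, and the bounds simply mirror this example.

```python
import re

# Bounds in days per data class, copied from the example above:
# (minimum, maximum); None means no bound on that side.
POLICY = {"phi": (2190, None), "pii": (None, 90)}


def retention_days(retention: str) -> int:
    """Parse a day-based retention string such as '2190d'."""
    match = re.fullmatch(r"(\d+)d", retention)
    if match is None:
        raise ValueError(f"expected a value like '90d', got {retention!r}")
    return int(match.group(1))


def find_violations(rules, policy=POLICY):
    """Return a message for every rule that breaks its policy bounds."""
    problems = []
    for rule in rules:
        match = re.search(r'data_class="([^"]+)"', rule["selector"])
        if match is None or match.group(1) not in policy:
            continue  # not a regulated class in this sketch
        data_class = match.group(1)
        minimum, maximum = policy[data_class]
        configured = retention_days(rule["retention"])
        if minimum is not None and configured < minimum:
            problems.append(f"{data_class}: {configured}d is below the {minimum}d minimum")
        if maximum is not None and configured > maximum:
            problems.append(f"{data_class}: {configured}d exceeds the {maximum}d maximum")
    return problems


rules = [
    {"selector": '{data_class="phi"}', "retention": "2190d"},
    {"selector": '{data_class="pii"}', "retention": "90d"},
]
print(find_violations(rules))  # []
```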
Monitoring Retention¶
Check Current Retention¶
InfluxDB:
Loki:
Storage Growth Tracking¶
Monitor storage growth to verify retention is working:
# InfluxDB storage
du -sh /var/lib/influxdb2
# Loki storage
du -sh /var/lib/loki
# S3 bucket size (if using S3)
aws s3 ls --summarize --recursive s3://influx11/
Alerts¶
Set up alerts for:
- Storage growth exceeding expected rate
- Retention deletion failures
- Storage approaching capacity
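The first and last alerts both depend on a growth rate. A rough sketch of how one might be derived from periodic `du` readings, assuming samples are recorded as (day number, size in GB) pairs and growth is roughly linear; the function names and sample values are made up for illustration.

```python
def daily_growth_gb(samples):
    """Average daily growth between the first and last sample."""
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    if last_day == first_day:
        raise ValueError("need samples from at least two different days")
    return (last_size - first_size) / (last_day - first_day)


def days_until_full(current_gb, capacity_gb, growth_gb_per_day):
    """Days until capacity is reached, or None if usage is not growing."""
    if growth_gb_per_day <= 0:
        return None
    return (capacity_gb - current_gb) / growth_gb_per_day


samples = [(0, 80.0), (7, 87.0)]  # illustrative readings, one week apart
growth = daily_growth_gb(samples)
print(growth)                                 # 1.0
print(days_until_full(87.0, 100.0, growth))   # 13.0
```

Once retention is working, growth should flatten after the retention window has passed; sustained growth beyond that point suggests rising ingest or failed deletions.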
Adjusting Retention¶
When to Increase Retention¶
- Compliance requirements change
- Need longer historical data
- Storage costs decrease
- Business needs change
When to Decrease Retention¶
- Storage costs too high
- Running out of disk space
- Data rarely accessed
- Compliance allows shorter retention
Impact Assessment¶
Before changing retention:
- Query patterns: Check how old data is typically queried
- Storage impact: Calculate storage savings
- User impact: Notify users of retention changes
- Compliance: Verify changes meet requirements
Backup vs Retention¶
Retention: Automatic deletion of old data
Backup: Separate copy for disaster recovery
Best practice: Retention ≠ Backup
- Set retention based on operational needs
- Create backups for disaster recovery
- Archive old data separately if needed for compliance
Cost Optimization¶
Storage Cost by Retention¶
Example calculation:
Current: 30d retention = 100 GB = $50/month
Option 1: 7d retention = 23 GB = $12/month (76% savings)
Option 2: 90d retention = 300 GB = $150/month (200% increase)
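The figures above follow from scaling the 30-day baseline linearly with retention, which assumes a constant ingest rate and a flat price per GB. A small sketch of that arithmetic (the function name `estimate` is hypothetical):

```python
def estimate(retention_days, baseline_days=30, baseline_gb=100.0, baseline_cost=50.0):
    """Scale size and monthly cost linearly from the 30-day baseline.

    Returns (size_gb, monthly_cost, percent_change_vs_baseline).
    """
    scale = retention_days / baseline_days
    return baseline_gb * scale, baseline_cost * scale, (scale - 1) * 100


for days in (7, 90):
    size, cost, change = estimate(days)
    print(f"{days}d: {size:.0f} GB, ${cost:.0f}/month, {change:+.0f}%")
# 7d: 23 GB, $12/month, -77%
# 90d: 300 GB, $150/month, +200%
```

The unrounded saving for 7 days is about 76.7%; the 76% in the example comes from comparing the rounded $12 against $50.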
Retention Recommendations¶
Cost-optimized:
- System metrics: 7-14 days
- Application logs: 7-14 days
- Security logs: 30 days (minimum)
Balanced:
- System metrics: 30 days
- Application logs: 30 days
- Security logs: 90 days
Compliance-focused:
- System metrics: 90 days
- Application logs: 90 days
- Security logs: 365 days
Reference Deployment¶
See the Reference Deployments chapter for retention configuration in production:
- monitor11.example.com - 30-day retention with S3 backend