Observability
Headwind provides comprehensive observability through Prometheus metrics, structured logging, and an optional integrated InfluxDB stack with real-time visualization.
Metrics Backends
Headwind supports multiple metrics backends with automatic discovery:
- Prometheus - Native Prometheus endpoint (always available)
- VictoriaMetrics - High-performance time-series database
- InfluxDB - Integrated observability stack (optional)
The observability dashboard automatically detects available backends and displays data accordingly.
Built-in Observability Dashboard
Access the dashboard at /observability on the Web UI:
# Via port-forward
kubectl port-forward -n headwind-system svc/headwind 8082:8082
# Visit: http://localhost:8082/observability
# Via ingress
# Visit: https://your-headwind-domain.com/observability
The dashboard provides:
- Real-time metrics cards - Current values for key metrics
- Time-series charts - Historical trends for all metrics
- Auto-refresh - Updates every 30 seconds
- Multi-backend support - Works with Prometheus, VictoriaMetrics, or InfluxDB
InfluxDB Integration
Quick Setup
Deploy Headwind with the integrated InfluxDB stack:
helm install headwind headwind/headwind \
-n headwind-system --create-namespace \
--set observability.create=true \
--set observability.influxdb.enabled=true \
--set observability.influxdb.storageSize=100Gi \
--set telegraf.enabled=true
This deploys:
- InfluxDB 2.7 - Time-series database for metrics storage
- Telegraf Sidecar - Scrapes Prometheus metrics and writes to InfluxDB
- Automatic Configuration - Token management, organization, and bucket setup
InfluxDB Configuration
observability:
create: true
influxdb:
enabled: true
version: "2.7"
# Data retention
retentionHours: 720 # 30 days
# Storage
storageSize: 100Gi
storageClass: "" # Use default storage class
# Organization and bucket
organization: "headwind"
bucket: "metrics"
# Admin credentials
adminUser: "admin"
adminPassword: "" # Auto-generated if empty
# Optional: Use existing secret
existingSecret: ""
Telegraf Configuration
The Telegraf sidecar automatically scrapes Headwind's Prometheus metrics and writes them to InfluxDB:
telegraf:
enabled: true
image:
repository: telegraf
tag: "1.28-alpine"
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
Token Management
InfluxDB tokens are automatically managed:
- First Install: A random 64-character token is generated
- Upgrades: Token persists across helm upgrades (using
lookupfunction) - ConfigMap: Token is automatically injected into ConfigMap for Headwind access
- Secret Retention: Secrets are kept on helm uninstall (
helm.sh/resource-policy: keep)
Verify token configuration:
# Check secret
kubectl get secret headwind-influxdb -n headwind-system \
-o jsonpath='{.data.admin-token}' | base64 -d
# Check ConfigMap
kubectl get configmap headwind-config -n headwind-system \
-o jsonpath='{.data.observability\.influxdb\.token}'
Accessing InfluxDB UI
Enable ingress for InfluxDB UI (optional):
observability:
influxdb:
ingress:
enabled: true
className: nginx
hosts:
- host: influxdb.example.com
paths:
- path: /
pathType: Prefix
Or use port-forward:
kubectl port-forward -n headwind-system headwind-influxdb-0 8086:8086
# Visit: http://localhost:8086
Login with:
- Username:
admin - Password: Retrieved from secret
kubectl get secret headwind-influxdb -n headwind-system \
-o jsonpath='{.data.admin-password}' | base64 -d
Prometheus Integration
Native Endpoint
Headwind exposes Prometheus metrics on port 9090:
kubectl port-forward -n headwind-system svc/headwind 9090:9090
curl http://localhost:9090/metrics
Prometheus Operator
Enable ServiceMonitor for automatic scraping:
serviceMonitor:
enabled: true
interval: 30s
scrapeTimeout: 10s
labels:
prometheus: kube-prometheus
PodMonitor
Alternatively, use PodMonitor:
podMonitor:
enabled: true
interval: 30s
labels:
prometheus: kube-prometheus
VictoriaMetrics Integration
Configure Headwind to query VictoriaMetrics:
configMap:
data:
observability.victoriametrics.enabled: "true"
observability.victoriametrics.url: "http://victoria-metrics.monitoring.svc.cluster.local:8428"
observability.metricsBackend: "victoriametrics"
Key Metrics
Resource Tracking
headwind_deployments_watched- Number of Deployments being monitoredheadwind_statefulsets_watched- Number of StatefulSets being monitoredheadwind_daemonsets_watched- Number of DaemonSets being monitoredheadwind_helm_releases_watched- Number of HelmReleases being monitored
Update Lifecycle
headwind_updates_pending- Pending update requestsheadwind_updates_applied_total- Successfully applied updatesheadwind_updates_failed_total- Failed updatesheadwind_updates_rejected_total- Rejected updatesheadwind_updates_skipped_interval_total- Updates skipped due to minimum interval
Event Processing
headwind_webhook_events_total- Total webhook events receivedheadwind_webhook_events_processed- Webhook events successfully processedheadwind_polling_cycles_total- Registry polling cycles completedheadwind_polling_new_tags_found_total- New image tags discovered via polling
Helm Charts
headwind_helm_chart_versions_checked_total- Chart versions checkedheadwind_helm_updates_found_total- New chart versions foundheadwind_helm_repository_queries_total- Repository queries performedheadwind_helm_repository_errors_total- Repository query errors
Notifications
headwind_notifications_sent_total- Total notifications sentheadwind_notifications_failed_total- Failed notification deliveriesheadwind_notifications_slack_sent_total- Slack notifications sentheadwind_notifications_teams_sent_total- Teams notifications sent
Performance
headwind_reconcile_duration_seconds- Controller reconciliation duration (histogram)headwind_reconcile_errors_total- Controller reconciliation errorsheadwind_helm_repository_query_duration_seconds- Helm repository query duration
See the complete metrics reference for all 35+ available metrics.
Structured Logging
Headwind uses structured JSON logging with configurable levels:
env:
RUST_LOG: "headwind=info,kube=info"
Log levels:
error- Errors onlywarn- Warnings and errorsinfo- Informational messages (default)debug- Debug informationtrace- Verbose trace logging
View logs:
# Main application logs
kubectl logs -n headwind-system deployment/headwind -c headwind
# Telegraf logs
kubectl logs -n headwind-system deployment/headwind -c telegraf
# InfluxDB logs
kubectl logs -n headwind-system headwind-influxdb-0
Audit Logging
All approval and rejection actions are logged with:
- Username - Who performed the action
- Action - approve or reject
- Resource - Type, namespace, and name
- Timestamp - When the action occurred
- Result - Success or failure
Example audit log:
{
"timestamp": "2025-11-08T23:00:00Z",
"username": "alice",
"action": "approve",
"resource_type": "Deployment",
"namespace": "default",
"name": "my-app",
"result": "success"
}
Filter audit logs:
kubectl logs -n headwind-system deployment/headwind -c headwind | grep '"target":"headwind::audit"'
Troubleshooting
InfluxDB 401 Unauthorized Errors
If you see unauthorized errors in logs:
-
Verify token matches between secret and ConfigMap:
# Token in secret
kubectl get secret headwind-influxdb -n headwind-system \
-o jsonpath='{.data.admin-token}' | base64 -d
# Token in ConfigMap
kubectl get configmap headwind-config -n headwind-system \
-o jsonpath='{.data.observability\.influxdb\.token}' -
If tokens don't match, restart Headwind:
kubectl rollout restart deployment/headwind -n headwind-system
Telegraf Not Writing Metrics
Check Telegraf logs:
kubectl logs -n headwind-system deployment/headwind -c telegraf
Common issues:
- Token mismatch: Verify InfluxDB token configuration
- Network issues: Check connectivity to InfluxDB service
- Permission issues: Verify Telegraf has write access to bucket
Metrics Not Showing in Dashboard
-
Verify backend is enabled:
kubectl get configmap headwind-config -n headwind-system -o yaml -
Check Headwind logs for query errors:
kubectl logs -n headwind-system deployment/headwind -c headwind | grep -i influx -
Test metric query directly:
kubectl exec -n headwind-system headwind-influxdb-0 -- \
influx query 'from(bucket: "metrics") |> range(start: -1h) |> limit(n: 5)'
Best Practices
- Storage Sizing: Plan for ~1GB per week of metrics with default retention
- Retention Policy: Set
retentionHoursbased on compliance requirements - Backup: Use persistent volumes with backup strategy for InfluxDB
- Resource Limits: Allocate sufficient CPU/memory for Telegraf sidecar
- Monitoring: Set up alerts on
headwind_updates_failed_totalandheadwind_reconcile_errors_total
Next Steps
- View Metrics Reference - Complete list of all metrics
- Helm Installation Guide - Install Headwind with observability enabled