Metrics Reference

Headwind exposes Prometheus metrics at /metrics on port 9090 for monitoring and alerting.

Accessing Metrics

# Port forward metrics endpoint
kubectl port-forward -n headwind-system svc/headwind-metrics 9090:9090

# View metrics
curl http://localhost:9090/metrics

# Or open in a browser (macOS; use xdg-open on Linux)
open http://localhost:9090/metrics
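
The endpoint serves the Prometheus text exposition format, so the output can also be inspected from a script. The following is a minimal sketch and not part of Headwind: `parse_metrics` is a hypothetical helper, and the sample payload is invented for illustration. It handles only simple `name{labels} value` lines and ignores timestamps.

```python
import re
from collections import defaultdict

# Matches one sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefix="headwind_"):
    """Return {metric_name: [(labels, value), ...]} for names starting with prefix."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        samples[match.group(1)].append((labels, float(match.group(3))))
    return dict(samples)


# Illustrative payload; these values are invented, not real Headwind output.
SAMPLE = """\
# TYPE headwind_webhook_events_total counter
headwind_webhook_events_total{registry="dockerhub"} 42
headwind_updates_pending 3
process_cpu_seconds_total 1.5
"""

if __name__ == "__main__":
    for name, series in parse_metrics(SAMPLE).items():
        print(name, series)
```

To run it against the port-forwarded endpoint, the text could be fetched with `urllib.request.urlopen("http://localhost:9090/metrics").read().decode()` and passed to `parse_metrics`.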

Webhook Metrics

Track webhook event processing:

`headwind_webhook_events_total`

Type: Counter

Description: Total webhook events received from container registries

Labels:

registry - Registry type (dockerhub, harbor, gitlab, etc.)

Example:

# Rate of webhook events per minute
rate(headwind_webhook_events_total[5m]) * 60

# Total events by registry
sum by (registry) (headwind_webhook_events_total)
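
To make the arithmetic behind `rate(...) * 60` concrete, here is a simplified sketch of how a per-second rate can be derived from raw counter samples. It is an approximation: Prometheus also extrapolates to the window boundaries, which the hypothetical `per_second_rate` helper below does not. The sample values are invented.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples, oldest first.

    A drop in value is treated as a counter reset (for example after a
    restart), so the post-reset value counts as new increase.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def per_second_rate(samples):
    """Average per-second increase over the sampled window, or None if too few samples."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed


# Invented scrape history: one sample per minute, with a reset at t=180.
history = [(0, 100), (60, 130), (120, 160), (180, 5), (240, 35), (300, 65)]

if __name__ == "__main__":
    # Multiply by 60 to express the rate per minute, as in the query above.
    print(per_second_rate(history) * 60)
```

The reset handling is the reason to prefer `rate()` over dividing raw counter values: a restart sets the counter back to zero, and only a reset-aware calculation keeps the result meaningful.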

`headwind_webhook_events_processed`

Type: Counter

Description: Successfully processed webhook events

Example:

# Processing success rate (sum() aggregates away the registry label so both sides match)
sum(rate(headwind_webhook_events_processed[5m])) / sum(rate(headwind_webhook_events_total[5m]))

Polling Metrics

Monitor registry polling operations:

`headwind_polling_cycles_total`

Type: Counter

Description: Total polling cycles completed

Example:

# Polling frequency
rate(headwind_polling_cycles_total[5m])

`headwind_polling_errors_total`

Type: Counter

Description: Polling errors encountered

Example:

# Error rate
rate(headwind_polling_errors_total[5m])

`headwind_polling_images_checked_total`

Type: Counter

Description: Container images checked during polling

Example:

# Images checked per polling cycle
rate(headwind_polling_images_checked_total[5m]) / rate(headwind_polling_cycles_total[5m])

`headwind_polling_new_tags_found_total`

Type: Counter

Description: New image tags discovered via polling

Example:

# Tag discovery rate
rate(headwind_polling_new_tags_found_total[1h])

`headwind_polling_helm_charts_checked_total`

Type: Counter

Description: Helm charts checked during polling

Example:

# Helm charts checked per polling cycle
rate(headwind_polling_helm_charts_checked_total[5m]) / rate(headwind_polling_cycles_total[5m])

`headwind_polling_helm_new_versions_found_total`

Type: Counter

Description: New Helm chart versions discovered via polling

Example:

# Helm version discovery rate
rate(headwind_polling_helm_new_versions_found_total[1h])

`headwind_polling_resources_filtered_total`

Type: Counter

Description: Resources filtered out from polling due to event-source annotation

Details: Incremented when resources have event-source: webhook or event-source: none set. These resources are skipped during polling cycles to reduce unnecessary registry API calls.

Example:

# Resources filtered from polling
headwind_polling_resources_filtered_total

# Filter rate per polling cycle
rate(headwind_polling_resources_filtered_total[5m]) / rate(headwind_polling_cycles_total[5m])

# Share of polling candidates skipped because of an event-source annotation (webhook or none), from 0 to 1
headwind_polling_resources_filtered_total /
  (headwind_polling_resources_filtered_total + headwind_polling_images_checked_total)

Use Cases:

Monitor adoption of webhook vs polling event sources
Track resource distribution across event source types
Optimize polling efficiency
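
As a worked example of the skipped-share calculation, the sketch below applies the same arithmetic to two counter readings. The function name `skipped_fraction` and the numbers are hypothetical and only serve to illustrate the formula.

```python
def skipped_fraction(filtered_total, images_checked_total):
    """Share of polling candidates that were skipped, or None if nothing was seen yet."""
    considered = filtered_total + images_checked_total
    if considered == 0:
        return None  # avoid dividing by zero before the first polling cycle
    return filtered_total / considered


if __name__ == "__main__":
    # Invented readings: 30 resources skipped, 90 images checked.
    print(skipped_fraction(30, 90))
```

With these readings, a quarter of the candidates were skipped by polling.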

Update Metrics

Track update requests and their lifecycle:

`headwind_updates_pending`

Type: Gauge

Description: Number of UpdateRequests currently awaiting approval

Example:

# Current pending updates
headwind_updates_pending

# Alert on too many pending updates
headwind_updates_pending > 20

`headwind_updates_approved_total`

Type: Counter

Description: Total approved updates

Example:

# Approval rate
rate(headwind_updates_approved_total[1h])

`headwind_updates_rejected_total`

Type: Counter

Description: Total rejected updates

Example:

# Rejection rate
rate(headwind_updates_rejected_total[1h])

# Share of decided updates that were approved (0 to 1)
headwind_updates_approved_total / (headwind_updates_approved_total + headwind_updates_rejected_total)

`headwind_updates_applied_total`

Type: Counter

Description: Successfully applied updates

Labels:

kind - Workload kind (Deployment, StatefulSet, DaemonSet, HelmRelease)

Example:

# Rate of applied updates (per second, averaged over 1h)
rate(headwind_updates_applied_total[1h])

# Updates by workload type
sum by (kind) (headwind_updates_applied_total)

`headwind_updates_failed_total`

Type: Counter

Description: Failed update attempts

Example:

# Failure rate
rate(headwind_updates_failed_total[5m])

# Update success rate (sum() drops the kind label so both sides match)
sum(rate(headwind_updates_applied_total[5m])) / (sum(rate(headwind_updates_applied_total[5m])) + sum(rate(headwind_updates_failed_total[5m])))

`headwind_updates_skipped_interval_total`

Type: Counter

Description: Updates skipped due to minimum update interval not elapsed

Example:

# Rate of skipped updates
rate(headwind_updates_skipped_interval_total[1h])

Controller Metrics

Monitor Kubernetes controllers:

`headwind_reconcile_duration_seconds`

Type: Histogram

Description: Time spent in reconciliation loops

Buckets: 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0

Example:

# 95th percentile reconciliation time
histogram_quantile(0.95, rate(headwind_reconcile_duration_seconds_bucket[5m]))

# Average reconciliation duration
rate(headwind_reconcile_duration_seconds_sum[5m]) / rate(headwind_reconcile_duration_seconds_count[5m])
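
Because the histogram only records bucket counts, `histogram_quantile` estimates a percentile by interpolating linearly inside the bucket that contains the requested rank. The sketch below mirrors that estimate for the bucket boundaries listed above. It is a simplified illustration with invented counts, not Prometheus's actual implementation.

```python
import math

# Upper bounds from the bucket list above, plus the implicit +Inf bucket.
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0, math.inf]


def histogram_quantile(q, cumulative_counts, bounds=BUCKETS):
    """Estimate the q-quantile from cumulative bucket counts (one per bound)."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, count in enumerate(cumulative_counts) if count >= rank)
    if idx == len(bounds) - 1:
        # Rank falls in the +Inf bucket: report the largest finite bound.
        return bounds[-2]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = cumulative_counts[idx - 1] if idx > 0 else 0
    in_bucket = cumulative_counts[idx] - below
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


# Invented cumulative counts for two scenarios.
fast = [10, 40, 70, 90, 96, 99, 100, 100]  # most reconciles finish under 1s
slow = [0, 0, 0, 0, 0, 0, 20, 100]         # 80 of 100 took longer than 5s

if __name__ == "__main__":
    print(histogram_quantile(0.95, fast))
    print(histogram_quantile(0.95, slow))
```

For `slow`, the result is 5.0 even though most observations exceeded 5 seconds: the estimate can never go above the largest finite bucket. Thresholds on this percentile therefore need to sit at or below 5.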

`headwind_reconcile_errors_total`

Type: Counter

Description: Reconciliation errors

Example:

# Error rate
rate(headwind_reconcile_errors_total[5m])

Workload Watching Metrics

Track resources being monitored:

`headwind_deployments_watched`

Type: Gauge

Description: Number of Deployments being monitored by Headwind

Example:

headwind_deployments_watched

`headwind_statefulsets_watched`

Type: Gauge

Description: Number of StatefulSets being monitored

Example:

headwind_statefulsets_watched

`headwind_daemonsets_watched`

Type: Gauge

Description: Number of DaemonSets being monitored

Example:

headwind_daemonsets_watched

`headwind_helm_releases_watched`

Type: Gauge

Description: Number of HelmReleases being monitored

Example:

headwind_helm_releases_watched

# Total workloads watched
headwind_deployments_watched + headwind_statefulsets_watched + headwind_daemonsets_watched + headwind_helm_releases_watched

Helm Metrics

Track Helm chart version discovery and updates:

`headwind_helm_chart_versions_checked_total`

Type: Counter

Description: Helm chart version checks performed

Example:

rate(headwind_helm_chart_versions_checked_total[5m])

`headwind_helm_updates_found_total`

Type: Counter

Description: Helm chart updates discovered

Example:

rate(headwind_helm_updates_found_total[1h])

`headwind_helm_updates_approved_total`

Type: Counter

Description: Helm chart updates approved by policy

Example:

# Approval rate
headwind_helm_updates_approved_total / headwind_helm_updates_found_total

`headwind_helm_updates_rejected_total`

Type: Counter

Description: Helm chart updates rejected by policy

Example:

# Rejection rate
headwind_helm_updates_rejected_total / headwind_helm_updates_found_total

`headwind_helm_updates_applied_total`

Type: Counter

Description: Helm chart updates successfully applied

Example:

rate(headwind_helm_updates_applied_total[1h])

`headwind_helm_repository_queries_total`

Type: Counter

Description: Helm repository queries performed

Example:

rate(headwind_helm_repository_queries_total[5m])

`headwind_helm_repository_errors_total`

Type: Counter

Description: Helm repository query errors

Example:

# Error rate
rate(headwind_helm_repository_errors_total[5m]) / rate(headwind_helm_repository_queries_total[5m])

`headwind_helm_repository_query_duration_seconds`

Type: Histogram

Description: Helm repository query duration

Example:

# 95th percentile query time
histogram_quantile(0.95, rate(headwind_helm_repository_query_duration_seconds_bucket[5m]))

Rollback Metrics

Monitor rollback operations:

`headwind_rollbacks_total`

Type: Counter

Description: Total rollback operations (manual + automatic)

Example:

rate(headwind_rollbacks_total[1h])

`headwind_rollbacks_manual_total`

Type: Counter

Description: Manual rollback operations

Example:

rate(headwind_rollbacks_manual_total[1h])

`headwind_rollbacks_automatic_total`

Type: Counter

Description: Automatic rollback operations triggered by health failures

Example:

rate(headwind_rollbacks_automatic_total[1h])

# Automatic rollback ratio
headwind_rollbacks_automatic_total / headwind_rollbacks_total

`headwind_rollbacks_failed_total`

Type: Counter

Description: Failed rollback operations

Example:

# Rollback success rate
(headwind_rollbacks_total - headwind_rollbacks_failed_total) / headwind_rollbacks_total

`headwind_deployment_health_checks_total`

Type: Counter

Description: Deployment health checks performed after updates

Example:

rate(headwind_deployment_health_checks_total[5m])

`headwind_deployment_health_failures_total`

Type: Counter

Description: Deployment health check failures detected

Example:

# Health failure rate
rate(headwind_deployment_health_failures_total[5m]) / rate(headwind_deployment_health_checks_total[5m])

Notification Metrics

Track notification delivery:

`headwind_notifications_sent_total`

Type: Counter

Description: Total notifications sent successfully

Example:

rate(headwind_notifications_sent_total[5m])

`headwind_notifications_failed_total`

Type: Counter

Description: Total notification failures

Example:

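# Failure rate (failed / all attempts, since sent_total counts only successes)
rate(headwind_notifications_failed_total[5m]) / (rate(headwind_notifications_sent_total[5m]) + rate(headwind_notifications_failed_total[5m]))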

`headwind_notifications_slack_sent_total`

Type: Counter

Description: Notifications sent to Slack

Example:

rate(headwind_notifications_slack_sent_total[5m])

`headwind_notifications_teams_sent_total`

Type: Counter

Description: Notifications sent to Microsoft Teams

Example:

rate(headwind_notifications_teams_sent_total[5m])

`headwind_notifications_webhook_sent_total`

Type: Counter

Description: Notifications sent via generic webhooks

Example:

rate(headwind_notifications_webhook_sent_total[5m])

Prometheus Alerts

Example alert rules for Headwind:

groups:
- name: headwind
  rules:
  # Update alerts
  - alert: HeadwindStaleUpdateRequests
    expr: headwind_updates_pending > 10
    for: 1h
    annotations:
      summary: "Many pending UpdateRequests"
      description: "{{ $value }} UpdateRequests pending for over 1 hour"

  - alert: HeadwindHighUpdateFailureRate
    expr: rate(headwind_updates_failed_total[5m]) > 0.1
    for: 5m
    annotations:
      summary: "High update failure rate"
      description: "Update failures detected"

  # Rollback alerts
  - alert: HeadwindAutomaticRollback
    expr: increase(headwind_rollbacks_automatic_total[5m]) > 0
    annotations:
      summary: "Automatic rollback triggered"
      description: "Headwind triggered an automatic rollback"

  - alert: HeadwindFrequentRollbacks
    expr: increase(headwind_rollbacks_total[1h]) > 3
    for: 5m
    annotations:
      summary: "Frequent rollbacks detected"
      description: "{{ $value }} rollbacks in the last hour"

  # Helm alerts
  - alert: HeadwindHelmRepositoryErrors
    expr: rate(headwind_helm_repository_errors_total[5m]) > 0
    for: 5m
    annotations:
      summary: "Helm repository query errors"
      description: "Errors querying Helm repositories"

  # Notification alerts
  - alert: HeadwindNotificationFailures
    expr: rate(headwind_notifications_failed_total[5m]) > 0
    for: 5m
    annotations:
      summary: "Notification failures detected"
      description: "Headwind notifications are failing"

  # Reconciliation alerts
  - alert: HeadwindSlowReconciliation
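    # histogram_quantile cannot exceed the largest finite bucket (5.0), so compare with >=
    expr: histogram_quantile(0.95, rate(headwind_reconcile_duration_seconds_bucket[5m])) >= 5
    for: 10m
    annotations:
      summary: "Slow reconciliation loops"
      description: "95th percentile reconciliation time has reached the 5s top bucket"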

  - alert: HeadwindReconciliationErrors
    expr: rate(headwind_reconcile_errors_total[5m]) > 0.1
    for: 5m
    annotations:
      summary: "Reconciliation errors"
      description: "Controller reconciliation errors detected"
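
The `for:` clause in these rules means an alert fires only after its expression has stayed true across consecutive evaluations for the whole duration. Before that, the alert is pending. The sketch below models that state machine in simplified form. `alert_state` is a hypothetical helper, and the evaluation timestamps and values are invented.

```python
def alert_state(evaluations, threshold, hold_seconds):
    """Return 'inactive', 'pending' or 'firing' after replaying the evaluations.

    evaluations: list of (timestamp_seconds, value), oldest first, one entry
    per rule evaluation. The condition modelled here is value > threshold.
    """
    active_since = None
    state = "inactive"
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition just became true
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # any false evaluation resets the timer
            state = "inactive"
    return state


# Pending-update counts sampled every 15 minutes (invented values).
steady = [(0, 12), (900, 14), (1800, 11), (2700, 15), (3600, 13)]
dipped = [(0, 12), (900, 9), (1800, 11), (2700, 15), (3600, 13)]

if __name__ == "__main__":
    print(alert_state(steady, threshold=10, hold_seconds=3600))
    print(alert_state(dipped, threshold=10, hold_seconds=3600))
```

In the second series a single dip to 9 resets the timer, so the alert is still pending at the end of the hour. A rule without `for:` fires on the first evaluation where its expression is true.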

Grafana Dashboard

Example PromQL queries for a Grafana dashboard:

Overview Panel

# Pending updates
headwind_updates_pending

# Watched resources
sum(headwind_deployments_watched + headwind_statefulsets_watched + headwind_daemonsets_watched + headwind_helm_releases_watched)

# Update success rate (last hour; sum() drops the kind label so both sides match)
sum(rate(headwind_updates_applied_total[1h])) / (sum(rate(headwind_updates_applied_total[1h])) + sum(rate(headwind_updates_failed_total[1h])))

Update Activity Panel

# Updates approved (rate)
rate(headwind_updates_approved_total[5m])

# Updates applied by type
sum by (kind) (rate(headwind_updates_applied_total[5m]))

# Updates rejected (rate)
rate(headwind_updates_rejected_total[5m])

Rollback Panel

# Total rollbacks
rate(headwind_rollbacks_total[1h])

# Automatic vs Manual
rate(headwind_rollbacks_automatic_total[1h])
rate(headwind_rollbacks_manual_total[1h])

# Health check failure rate
rate(headwind_deployment_health_failures_total[5m]) / rate(headwind_deployment_health_checks_total[5m])

Performance Panel

# Reconciliation latency (p95)
histogram_quantile(0.95, rate(headwind_reconcile_duration_seconds_bucket[5m]))

# Helm repository query latency (p95)
histogram_quantile(0.95, rate(headwind_helm_repository_query_duration_seconds_bucket[5m]))

Scraping Configuration

Configure Prometheus to scrape Headwind metrics:

scrape_configs:
- job_name: 'headwind'
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - headwind-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
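
A `replace` relabel rule joins the values of its `source_labels` with `;`, matches the fully anchored regex against the joined string, and writes the expanded replacement to `target_label`. If the regex does not match, the target is left unchanged. The sketch below imitates that behaviour with Python's `re` module to show how an address rewrite plays out. `relabel_replace` is a hypothetical helper, and the pod IP is invented.

```python
import re


def relabel_replace(source_values, regex, replacement, current_target, separator=";"):
    """Imitate a Prometheus 'replace' relabel action for one target.

    Returns the new value for the target label, or current_target unchanged
    when the regex does not match the joined source values.
    """
    joined = separator.join(source_values)
    match = re.fullmatch(regex, joined)  # Prometheus anchors the regex at both ends
    if match is None:
        return current_target
    # Translate Prometheus-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return match.expand(template)


if __name__ == "__main__":
    # Discovered address plus a port annotation of "9090" (invented pod IP).
    print(relabel_replace(["10.0.0.12:8080", "9090"],
                          r"([^:]+)(?::\d+)?;(\d+)", "$1:$2", "10.0.0.12:8080"))
    # A pattern that expects a colon does not match a bare port value.
    print(relabel_replace(["9090"], r"(.+):(.+)", "$1:9090", "10.0.0.12:8080"))
```

The second call shows why the address has to be among the source labels: a port annotation on its own contains no host part to capture, so a host:port pattern never matches it.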

Next Steps

Notifications - Configure notifications
Rollback Configuration - Set up rollback
API Reference - REST API documentation
