NetworkPolicy blocks metrics scraping from external pods #495

@mcsrobert

Summary

The operator-generated NetworkPolicy for a Dragonfly instance restricts ingress on the admin/metrics port (9999) to only the operator's own controller-manager and peer Dragonfly pods. This makes it impossible for any external scraper (Prometheus, VictoriaMetrics, etc.) to reach the /metrics endpoint, even though exposing metrics is an explicitly supported feature of the admin port.

Environment

  • DragonflyDB Operator (1.4.0 and 1.5.0)
  • k3s with Flannel CNI
  • VictoriaMetrics (vmagent) as scraper, configured via PodMonitor

Steps to Reproduce

  1. Deploy a Dragonfly instance via the operator
  2. Configure a PodMonitor or ServiceMonitor targeting port 9999 (admin)
  3. Observe scrape errors: dial tcp <pod-ip>:9999: connect: connection refused
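For reference, a minimal PodMonitor that reproduces step 2 might look like the sketch below. The object name and the container port name `admin` are assumptions here; adjust them to match the pod spec the operator actually generates.

```yaml
# Sketch of a PodMonitor that scrapes the Dragonfly admin port.
# Assumptions: the metadata name is illustrative, and the container
# port for 9999 is assumed to be named "admin".
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dragonfly-metrics            # hypothetical name
  namespace: <your-namespace>
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dragonfly   # label taken from the generated NetworkPolicy
  podMetricsEndpoints:
    - port: admin                    # assumed name of container port 9999
      path: /metrics
```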

Root Cause

The operator creates a NetworkPolicy with the following ingress rules:

ingress:
  - ports:
      - port: 6379
        protocol: TCP
  - from:
      - namespaceSelector: {}
        podSelector:
          matchLabels:
            control-plane: controller-manager
      - podSelector:
          matchLabels:
            app: app-dragonfly
            app.kubernetes.io/name: dragonfly
            app.kubernetes.io/part-of: dragonfly
    ports:
      - port: 9999
        protocol: TCP

Port 6379 (Redis) is open to all pods, but port 9999 (admin/metrics) is scoped exclusively to the operator controller and peer Dragonfly pods. Any scraper pod — regardless of namespace — is refused at the network level before it can reach the listener.

This is confirmed by:

  • /proc/net/tcp showing the admin port correctly bound on 0.0.0.0:9999
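  • kubectl port-forward successfully reaching /metrics (the connection is tunneled through the API server and kubelet rather than the pod network, so the NetworkPolicy is not applied)
  • Cross-pod curl to the pod IP returning an immediate connection refused (an RST rather than a timeout, i.e. the connection is actively rejected, not silently dropped)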
  • Removing/patching the NetworkPolicy immediately resolves scraping
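The cross-pod check can be repeated with a throwaway pod along the lines of the sketch below. The pod name is hypothetical, `curlimages/curl` is just one convenient image that ships curl, and `<pod-ip>` must be replaced with the Dragonfly pod's IP.

```yaml
# One-shot probe pod: curls the admin port from inside the pod network.
# Assumptions: pod name and image choice are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: metrics-probe                # hypothetical name
  namespace: <your-namespace>
spec:
  restartPolicy: Never               # run once, then stay in Completed/Error
  containers:
    - name: curl
      image: curlimages/curl
      command: ["curl"]
      # -v prints the connection attempt; --max-time separates a refused
      # connection (fails immediately) from a dropped one (times out).
      args: ["-sv", "--max-time", "5", "http://<pod-ip>:9999/metrics"]
```

The result is visible in the pod's logs (`kubectl logs metrics-probe`).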

Expected Behavior

The NetworkPolicy should permit ingress on port 9999 from any pod that can reach port 6379, or at minimum provide a documented mechanism (e.g. a field on the Dragonfly CR) to allow scraper pods access to the admin port.

Workaround

Manually add an additional ingress rule to the NetworkPolicy after deployment. Note that the operator may reconcile this away:

- ports:
    - port: 9999
      protocol: TCP

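Alternatively, create a separate NetworkPolicy object, which the operator will not overwrite. NetworkPolicies are additive, so its rule applies in addition to the operator-generated ones: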

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dragonfly-allow-metrics
  namespace: <your-namespace>
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: dragonfly
  ingress:
    - ports:
        - port: 9999
          protocol: TCP
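The policy above opens port 9999 to every source. A narrower variant that admits only the scraper could look like the following sketch. The namespace `monitoring` and the `vmagent` pod label are assumptions about where and how the scraper runs; `kubernetes.io/metadata.name` is the label Kubernetes sets automatically on each namespace.

```yaml
# Variant of the workaround that allows only scraper pods on port 9999.
# Assumptions: scraper runs in namespace "monitoring" and carries the
# label app.kubernetes.io/name=vmagent; the policy name is illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dragonfly-allow-metrics-scraper   # hypothetical name
  namespace: <your-namespace>
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: dragonfly
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in the same list item are ANDed:
        # the pod must match both.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # assumed scraper namespace
          podSelector:
            matchLabels:
              app.kubernetes.io/name: vmagent           # assumed scraper label
      ports:
        - port: 9999
          protocol: TCP
```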

Suggested Fix

Add an allowedMetricsSources (or similar) field to the Dragonfly CRD so users can specify a podSelector/namespaceSelector for their scraper, which the operator then includes in the generated NetworkPolicy.

Example in MetalLB: https://github.com/metallb/metallb/blob/7f8c364490486db29c2f588d466acf07ee06861f/charts/metallb/values.yaml#L54
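To make the proposal concrete, here is a rough sketch of how such a field could look on the CR. The `allowedMetricsSources` field does not exist in the current CRD; its name and shape are purely hypothetical, as are the selector values.

```yaml
# HYPOTHETICAL: allowedMetricsSources is a proposed field, not part of
# the current Dragonfly CRD. Selector values are illustrative.
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: app-dragonfly
  namespace: <your-namespace>
spec:
  replicas: 2
  allowedMetricsSources:             # proposed field
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring   # assumed scraper namespace
      podSelector:
        matchLabels:
          app.kubernetes.io/name: vmagent           # assumed scraper label
```

Each entry would use the same shape as a NetworkPolicy peer, so the operator could append it directly to the `from` list of the existing port 9999 rule.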
