[ingest-pipeline-safety] AWS VPC Flow pipeline silently accepts unsupported custom-format records #18521

@github-actions

Description

Findings

1. aws/vpcflow has a silent parse miss when token count is outside hard-coded layouts

Location

  • packages/aws/data_stream/vpcflow/elasticsearch/ingest_pipeline/default.yml:37-71

Evidence

  • The pipeline computes the token count, then runs dissect only for the exact counts 6, 14, 17, 21, 29, 36, 39, 40:

```yaml
- script:
    source: >-
      ctx._temp_ = new HashMap();
      ctx._temp_.message_token_count = ctx.event?.original.splitOnToken(" ").length;
...
- dissect: ... if: ctx?._temp_?.message_token_count == 14
- dissect: ... if: ctx?._temp_?.message_token_count == 6
- dissect: ... if: ctx?._temp_?.message_token_count == 17
- dissect: ... if: ctx?._temp_?.message_token_count == 21
- dissect: ... if: ctx?._temp_?.message_token_count == 29
- dissect: ... if: ctx?._temp_?.message_token_count == 36
- dissect: ... if: ctx?._temp_?.message_token_count == 39
- dissect: ... if: ctx?._temp_?.message_token_count == 40
```
  • There is no fallback/error processor when none of these branches match.
  • A pipeline-level on_failure handler exists (default.yml:395-404), but it is only invoked on processor exceptions. A non-matching token count does not throw, so the handler never fires.

Concrete triggering document

  • This valid custom-format VPC Flow record has 15 tokens (not in the allowlist):

```json
{
  "message": "2 123456789012 eni-0abcdeffedcba1234 2001:db8::1 2001:db8::2 443 51514 6 10 840 1710000000 1710000060 ACCEPT OK us-east-1"
}
```

What is wrong

  • For this record, no dissect processor runs, so aws.vpcflow.* fields are never created.
  • Downstream processors are mostly null-safe/ignore_missing, so the document continues indexing without pipeline_error or error.message.
  • The result is an untracked parse failure (document indexed, but effectively unparsed).

Why it matters

  • This violates the ingest safety guarantee that unparsed logs must be surfaced as failures; here the parse miss is invisible.
  • Users with custom VPC Flow formats can lose key network/security fields silently, degrading detections and analytics without visibility.

Suggested fix (one line)

  • After all dissect branches, add a guard: if no required parse key exists (e.g., aws.vpcflow.account_id or aws.vpcflow.interface_id), set event.kind: pipeline_error and append a clear error.message.
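A minimal sketch of such a guard, placed after the last dissect branch. The sentinel fields (aws.vpcflow.account_id, aws.vpcflow.interface_id) and the error wording are assumptions and should be checked against the fields the pipeline's dissect patterns actually emit:

```yaml
# Illustrative guard (assumed field names): if none of the known layouts
# matched, no aws.vpcflow.* fields exist, so flag the document instead of
# letting it index silently.
- set:
    field: event.kind
    value: pipeline_error
    if: >-
      ctx.aws?.vpcflow?.account_id == null &&
      ctx.aws?.vpcflow?.interface_id == null
- append:
    field: error.message
    value: "Unsupported VPC Flow log format: {{{_temp_.message_token_count}}} tokens did not match any known layout"
    if: >-
      ctx.aws?.vpcflow?.account_id == null &&
      ctx.aws?.vpcflow?.interface_id == null
```

Both processors reuse the same condition so the flag and the message always appear together; the token count computed earlier in the pipeline is interpolated into the message to aid debugging.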

Pipelines reviewed and found safe (for this audit)

  • packages/cisco_asa/data_stream/log/elasticsearch/ingest_pipeline/default.yml
  • packages/panw/data_stream/panos/elasticsearch/ingest_pipeline/default.yml and referenced child pipelines
  • packages/fortinet_fortigate/data_stream/log/elasticsearch/ingest_pipeline/default.yml and child pipelines
  • packages/aws/data_stream/cloudtrail/elasticsearch/ingest_pipeline/default.yml
  • packages/crowdstrike/data_stream/fdr/elasticsearch/ingest_pipeline/default.yml and child pipelines

These pipelines include explicit on_failure handling that sets event.kind: pipeline_error and appends error.message for processor exceptions.
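For contrast, the on_failure pattern these packages rely on typically has the following shape (illustrative, not copied verbatim from any of the listed files; the `_ingest.on_failure_*` metadata fields are standard ingest pipeline variables):

```yaml
# Typical pipeline-level on_failure handler: fires only when a processor
# throws, which is why it cannot catch the vpcflow non-matching-count case.
on_failure:
  - set:
      field: event.kind
      value: pipeline_error
  - append:
      field: error.message
      value: >-
        Processor '{{{ _ingest.on_failure_processor_type }}}' with tag
        '{{{ _ingest.on_failure_processor_tag }}}' failed with message
        '{{{ _ingest.on_failure_message }}}'
```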


From workflow: Sweeper: Ingest Pipeline Null-Safety and Grok Robustness

  • expires on Apr 27, 2026, 9:49 AM UTC
