
feat(parsers): add Qualys VMDR CSV parser #14453

Open
skywalke34 wants to merge 20 commits into DefectDojo:dev from skywalke34:qualys-vmdr-parser

Conversation

@skywalke34
Contributor

Summary

  • New parser for Qualys VMDR CSV exports
  • Supports both QID-centric and CVE-centric export formats with automatic detection
  • Handles the Qualys non-standard CSV format (outer-quoted rows, "" field delimiters, multi-line records with embedded newlines)
  • Uses a trailing-quote heuristic for reliable multi-line record boundary detection

Fields Mapped

  • QID format: title, severity, description, mitigation, impact, unique_id_from_tool, vuln_id_from_tool, component_name, service, endpoints (IPv4/IPv6), tags, date, active status
  • CVE format: all QID fields plus cvssv3_score, vulnerability_ids (CVE tracking)

Deduplication

  • Algorithm: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE
  • Hashcode fields: title, component_name, vuln_id_from_tool
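
For reviewers who want to see the deduplication settings spelled out, here is a minimal sketch of the two entries. It assumes the dictionary names follow the existing conventions in settings.dist.py; the scan-type key "Qualys VMDR" is a hypothetical placeholder, not taken from this PR.

```python
# Sketch only: mirrors the shape of the per-parser dedupe settings.
# The scan-type key below is an assumption; use the value the parser's
# get_scan_types() actually returns.
DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE = "unique_id_from_tool_or_hash_code"

SCAN_TYPE = "Qualys VMDR"  # hypothetical scan-type name

# Fields hashed when unique_id_from_tool is missing or does not match.
HASHCODE_FIELDS_PER_SCANNER = {
    SCAN_TYPE: ["title", "component_name", "vuln_id_from_tool"],
}

# Try unique_id_from_tool first, then fall back to the hash code.
DEDUPLICATION_ALGORITHM_PER_PARSER = {
    SCAN_TYPE: DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE,
}
```

With this algorithm, two findings count as duplicates if either their unique_id_from_tool values or their hash codes match.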

Test Results

33 unit tests covering:

  • Zero, one, and many findings for both QID and CVE formats
  • All severity level mappings (1-5 → Info through Critical)
  • Field mapping correctness (endpoints, tags, CVSS, dates, active/fixed status)
  • No-metadata CSV variant (header at line 1, no report title lines)
  • HTML tag stripping from impact field
  • Endpoint.clean() validation
  • Parser contract methods (get_scan_types, get_label, get_description)

Documentation

Parser documentation at docs/content/supported_tools/parsers/file/qualys_vmdr.md

Checklist

  • Rebased against the very latest dev
  • Submitted against dev branch
  • Meaningful PR name
  • Code is ruff compliant
  • Code is Python 3.13 compliant
  • Documentation included
  • No model changes, no migrations needed
  • Unit tests included (33 tests)
  • Deduplication configured in settings.dist.py
  • Label: Import Scans

🤖 Generated with Claude Code

Design document for new Qualys VMDR parser supporting QID and CVE
CSV export formats. Includes field mappings, architecture decisions,
and test strategy.

Authored by T. Walker - DefectDojo
Detailed TDD implementation plan with 13 tasks covering:
- Package structure and test files
- helpers.py, qid_parser.py, cve_parser.py, parser.py
- Comprehensive test coverage
- Documentation following enhanced format structure

Authored by T. Walker - DefectDojo
Authored by T. Walker - DefectDojo
TDD: Tests written before implementation.

Authored by T. Walker - DefectDojo
Shared utilities for severity mapping, date parsing, description
building, endpoint parsing, and tag handling.

Authored by T. Walker - DefectDojo
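
A rough sketch of the severity helper described above, assuming Qualys levels 1 to 5 map in order onto Info, Low, Medium, High, and Critical. The names SEVERITY_MAP and map_severity are hypothetical, and falling back to "Info" for unparseable values is an assumption rather than the PR's confirmed behavior.

```python
# Hypothetical helper: Qualys severity level (1-5) -> DefectDojo severity.
SEVERITY_MAP = {
    1: "Info",
    2: "Low",
    3: "Medium",
    4: "High",
    5: "Critical",
}


def map_severity(raw) -> str:
    """Return the DefectDojo severity for a raw Qualys severity value."""
    try:
        level = int(str(raw).strip())
    except ValueError:
        # Assumption: unknown or empty values fall back to the lowest level.
        return "Info"
    return SEVERITY_MAP.get(level, "Info")
```
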
Parses QID-centric CSV exports from Qualys VMDR.

Authored by T. Walker - DefectDojo
Parses CVE-centric CSV exports with CVSS scores from NVD.

Authored by T. Walker - DefectDojo
Auto-detects QID vs CVE format and delegates to appropriate parser.

Authored by T. Walker - DefectDojo
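
One possible way to express the auto-detection, shown only as a sketch: it assumes the header row has already been normalized to standard CSV and that a CVE-centric export is recognizable by a column named CVE. Both the column name and the detect_format function are assumptions, not the PR's actual logic.

```python
import csv
import io


def detect_format(header_line: str) -> str:
    """Return "cve" if the header has a CVE column, otherwise "qid"."""
    # Parse the header with the csv module so quoted names are handled.
    row = next(csv.reader(io.StringIO(header_line)), [])
    columns = {name.strip().upper() for name in row}
    return "cve" if "CVE" in columns else "qid"
```

The caller would then hand the remaining rows to the QID or CVE parser based on the returned label.
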
Comprehensive tests for severity mapping, endpoints, tags, CVE fields.
Also fixed CSV test files to use standard format and updated parser
format detection for proper CVE format recognition.

Authored by T. Walker - DefectDojo
Includes field mapping tables, severity conversion, and processing notes.

Authored by T. Walker - DefectDojo
The Qualys VMDR export uses a non-standard CSV format where fields are
delimited by ,"" instead of the standard "," format. This caused the
parser to fail when processing real Qualys exports.

Changes:
- Add format detection to distinguish standard vs non-standard CSV
- Add custom parsing functions for non-standard Qualys format
- Handle multi-line records with embedded newlines
- Both parsers (QID and CVE) now use the unified parsing logic

The parser now correctly handles both test files (standard CSV) and
real Qualys exports (non-standard format).

Authored by T. Walker - DefectDojo
The previous parsing logic used simple string splitting on ,"" which
failed when field values contained escaped quotes (""""). This caused
field misalignment and empty/default values in parsed findings.

The fix:
1. Remove outer quotes from the row
2. Unescape row-level quote doubling ("" -> ")
3. Parse the result as standard CSV using Python's csv module

This correctly handles fields containing embedded quotes like:
  "Description with ""quoted text"" inside"

Authored by T. Walker - DefectDojo
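
The three steps of the fix can be sketched as follows. This is an illustrative re-implementation under the format described in this commit, not the code from the PR; parse_nonstandard_row is a hypothetical name.

```python
import csv
import io


def parse_nonstandard_row(row: str) -> list[str]:
    """Parse one outer-quoted Qualys row into a list of field values."""
    inner = row.strip()
    # Step 1: remove the outer quotes that wrap the whole row.
    if len(inner) >= 2 and inner.startswith('"') and inner.endswith('"'):
        inner = inner[1:-1]
    # Step 2: undo the row-level quote doubling ("" -> ").
    inner = inner.replace('""', '"')
    # Step 3: what remains is standard CSV, so let the csv module parse it.
    # newline="" keeps embedded newlines inside quoted fields intact.
    return next(csv.reader(io.StringIO(inner, newline="")), [])
```

For example, the row `"""11"",""Description with """"quoted text"""" inside"""` yields two fields, and the second keeps its embedded quotes as `Description with "quoted text" inside`.
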
The previous end-of-record detection incorrectly treated any line ending
with a single quote as a complete record. This caused multi-line records
(where the Results field contains embedded newlines) to be split incorrectly.

In the Qualys non-standard format, multi-field records always end with """
(the last field's closing "" plus the record's closing "). A line that ends
with a single quote inside a record is just field content, not a record
terminator.

Authored by T. Walker - DefectDojo
Map the CVE field to unsaved_vulnerability_ids so it appears in the
Vulnerability IDs column in DefectDojo, in addition to vuln_id_from_tool.

Authored by T. Walker - DefectDojo
Add documentation that CVE is mapped to both vuln_id_from_tool and
unsaved_vulnerability_ids for proper CVE tracking in DefectDojo.

Authored by T. Walker - DefectDojo
…ction

Replace field-count-based record boundary detection in the Qualys VMDR
nonstandard CSV parser with a trailing-quote heuristic. The old approach
re-parsed accumulated rows each iteration and failed on malformed quote
patterns (e.g. #table cols=""3"") that produce incorrect field counts.

The new _is_record_end_line() helper counts trailing quotes: exactly 3
means record end, 4+ means record end only if preceded by a comma
(empty field). This is O(1) per line and correctly handles all known
Qualys export patterns. Also fixes pre-existing ruff lint issues in the
state machine parser.

Authored by T. Walker - DefectDojo
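
A minimal sketch of the trailing-quote rule as this commit message describes it, plus a small generator that uses it to regroup physical lines into records. The function names (without the leading underscore) and the iter_records helper are hypothetical, and the PR's real helper may differ in detail.

```python
from collections.abc import Iterable, Iterator


def is_record_end_line(line: str) -> bool:
    """Apply the trailing-quote heuristic to one physical line."""
    stripped = line.rstrip("\r\n")
    body = stripped.rstrip('"')
    trailing = len(stripped) - len(body)  # number of quotes at line end
    if trailing == 3:
        # Closing "" of the last field plus the record's closing ".
        return True
    if trailing >= 4:
        # Only a record end when the quotes form an empty last field.
        return body.endswith(",")
    return False


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Join physical lines until a record-end line is seen."""
    buffer: list[str] = []
    for line in lines:
        buffer.append(line.rstrip("\r\n"))
        if is_record_end_line(line):
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Assumption: an unterminated tail is still emitted, not dropped.
        yield "\n".join(buffer)
```

Each line is inspected once and never re-parsed, so the per-line cost does not depend on how many lines the current record has accumulated.
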
These design/plan files were used during development and should not
be included in the upstream PR.

Authored by T. Walker - DefectDojo
…R docs

Document the non-standard CSV format, multi-line record support,
metadata line detection, HTML stripping, and null marker filtering.

Authored by T. Walker - DefectDojo
github-actions bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), docs, unittests, and parser labels on Mar 6, 2026
valentijnscholten added this to the 2.57.0 milestone on Mar 9, 2026
