Bug: file_based_stream_reader.filter_files_by_globs_and_start_date uses strict datetime parsing, rejecting valid ISO8601 dates #920

@devin-ai-integration

Description

AbstractFileBasedStreamReader.filter_files_by_globs_and_start_date() in airbyte_cdk/sources/file_based/file_based_stream_reader.py (line 108) uses datetime.strptime(self.config.start_date, self.DATE_TIME_FORMAT) where DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ".

This strictly requires microsecond digits in the start_date value. A valid ISO8601 date like 2025-01-01T00:00:00Z (without microseconds) is rejected with:

ValueError: time data '2025-01-01T00:00:00Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'
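The failure can be reproduced with the standard library alone. This is a minimal sketch that uses only the format string quoted above; the variable names are illustrative and are not taken from the CDK.

```python
from datetime import datetime

# Format string quoted in this issue for DATE_TIME_FORMAT.
DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# A value that includes fractional seconds parses without error.
parsed = datetime.strptime("2025-01-01T00:00:00.000000Z", DATE_TIME_FORMAT)
print(parsed)

# The same instant without fractional seconds is rejected, because the
# literal "." and the %f field are mandatory in the format string.
error_message = None
try:
    datetime.strptime("2025-01-01T00:00:00Z", DATE_TIME_FORMAT)
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")
```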

Context

CDK v7.7.1 (via airbytehq/airbyte-python-cdk PR 887) fixed the spec validation side of this issue by:

  1. Updating the JSON Schema start_date pattern to accept flexible date formats
  2. Updating the Pydantic validator to use ab_datetime_try_parse

However, the runtime code path in filter_files_by_globs_and_start_date was not updated and still uses the strict datetime.strptime with %Y-%m-%dT%H:%M:%S.%fZ. This means config validation passes but the connector fails at runtime when listing/filtering files.

Impact

This affects all file-based connectors that inherit from AbstractFileBasedStreamReader, including:

  • source-sharepoint-enterprise
  • source-microsoft-sharepoint
  • source-microsoft-onedrive
  • source-s3
  • source-gcs
  • source-azure-blob-storage
  • source-sftp-bulk

The issue is triggered when the Terraform provider (or any other integration) normalizes datetime values by stripping microseconds (e.g., 2025-01-01T00:00:00.000000Z → 2025-01-01T00:00:00Z).
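Both spellings denote the same instant, so a normalizer that drops zero microseconds does not change the meaning of the value. The sketch below checks this with the standard library; `to_instant` is a hypothetical helper written for this illustration and is not how the Terraform provider or the CDK handles the value.

```python
from datetime import datetime


def to_instant(value: str) -> datetime:
    """Parse an ISO8601 UTC timestamp (hypothetical helper for illustration)."""
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


with_micros = to_instant("2025-01-01T00:00:00.000000Z")
without_micros = to_instant("2025-01-01T00:00:00Z")

# The two strings compare equal once parsed; only the strict format
# string in the runtime code path tells them apart.
print(with_micros == without_micros)
```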

Steps to Reproduce

  1. Configure any file-based connector with start_date = "2025-01-01T00:00:00Z" (no microseconds)
  2. Run a sync or discover
  3. Observe ValueError from filter_files_by_globs_and_start_date

Suggested Fix

Update filter_files_by_globs_and_start_date in file_based_stream_reader.py to use the flexible ab_datetime_try_parse helper from datetime_helpers instead of strict datetime.strptime. This is consistent with the approach already taken for spec validation in CDK v7.7.1.
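The shape of the fix can be sketched without the CDK. The actual change would call ab_datetime_try_parse, whose signature is not shown in this issue, so the example below swaps in a hypothetical stdlib stand-in named `parse_start_date`. The file list and the `>=` comparison are also assumptions made for illustration, not the reader's real filtering logic.

```python
from datetime import datetime

# Hypothetical stand-in for a flexible parser. This is NOT the CDK's
# ab_datetime_try_parse; it only accepts the two spellings from this issue.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",  # with fractional seconds (current strict format)
    "%Y-%m-%dT%H:%M:%SZ",     # without fractional seconds
)


def parse_start_date(value: str) -> datetime:
    """Try each candidate format in turn and return a naive datetime."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported start_date format: {value!r}")


# Illustrative data: (file name, last-modified time) pairs.
files = [
    ("a.csv", datetime(2024, 12, 31)),
    ("b.csv", datetime(2025, 1, 2)),
]

start = parse_start_date("2025-01-01T00:00:00Z")
kept = [name for name, modified in files if modified >= start]
print(kept)  # ['b.csv']
```

A list of fixed formats keeps the sketch short, but it only covers the cases listed. A general-purpose parser such as the CDK helper named above avoids having to enumerate formats.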

Related

  • Oncall issue: airbytehq/oncall#9390
  • CDK spec fix: airbytehq/airbyte-python-cdk PR 887 (CDK v7.7.1)

Requested by Aaron ("AJ") Steers (@aaronsteers).
