Description
AbstractFileBasedStreamReader.filter_files_by_globs_and_start_date() in airbyte_cdk/sources/file_based/file_based_stream_reader.py (line 108) uses datetime.strptime(self.config.start_date, self.DATE_TIME_FORMAT) where DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ".
This format requires fractional-second digits in the start_date value. A valid ISO 8601 timestamp such as 2025-01-01T00:00:00Z (without microseconds) is rejected with:
ValueError: time data '2025-01-01T00:00:00Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'
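The failure can be reproduced with the standard library alone, without the CDK installed. The snippet below is a minimal sketch: `parse_strict` is a hypothetical stand-in for the `strptime` call described above, not the CDK method itself.

```python
from datetime import datetime

# Same format string as DATE_TIME_FORMAT quoted in this issue.
DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_strict(value: str) -> datetime:
    """Hypothetical stand-in for the strict strptime call in the stream reader."""
    return datetime.strptime(value, DATE_TIME_FORMAT)


# A value with fractional seconds parses fine.
with_micros = parse_strict("2025-01-01T00:00:00.000000Z")

# The same instant without fractional seconds raises ValueError,
# because "%f" and the literal "." before it are mandatory in the format.
try:
    parse_strict("2025-01-01T00:00:00Z")
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(with_micros)
print(strict_error)
```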
Context
CDK v7.7.1 (via airbytehq/airbyte-python-cdk PR 887) fixed the spec validation side of this issue by:
- Updating the JSON Schema start_date pattern to accept flexible date formats
- Updating the Pydantic validator to use ab_datetime_try_parse
However, the runtime code path in filter_files_by_globs_and_start_date was not updated and still uses the strict datetime.strptime with %Y-%m-%dT%H:%M:%S.%fZ. This means config validation passes but the connector fails at runtime when listing/filtering files.
Impact
This affects all file-based connectors that inherit from AbstractFileBasedStreamReader, including:
- source-sharepoint-enterprise
- source-microsoft-sharepoint
- source-microsoft-onedrive
- source-s3
- source-gcs
- source-azure-blob-storage
- source-sftp-bulk
The issue is triggered when the Terraform provider (or any other integration) normalizes datetime values by stripping microseconds (e.g., 2025-01-01T00:00:00.000000Z → 2025-01-01T00:00:00Z).
Steps to Reproduce
- Configure any file-based connector with start_date = "2025-01-01T00:00:00Z" (no microseconds)
- Run a sync or discover
- Observe ValueError from filter_files_by_globs_and_start_date
Suggested Fix
Update filter_files_by_globs_and_start_date in file_based_stream_reader.py to use the flexible ab_datetime_try_parse helper from datetime_helpers instead of strict datetime.strptime. This is consistent with the approach already taken for spec validation in CDK v7.7.1.
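To illustrate the intended behavior, here is a rough stdlib-only sketch of tolerant parsing followed by a start-date filter. It is not the CDK implementation: `parse_start_date`, `filter_by_start_date`, the `ACCEPTED_FORMATS` tuple, and the `(uri, last_modified)` tuples are all hypothetical names used for illustration. In the CDK itself the parsing step would presumably be delegated to ab_datetime_try_parse rather than a hand-rolled format list.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Assumption: only these two shapes need to be accepted for the illustration.
ACCEPTED_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",  # with fractional seconds (current strict format)
    "%Y-%m-%dT%H:%M:%SZ",     # without fractional seconds
)


def parse_start_date(value: str) -> datetime:
    """Try each accepted format in turn; raise only if none matches."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported start_date format: {value!r}")


def filter_by_start_date(
    files: Iterable[Tuple[str, datetime]], start_date: Optional[str]
) -> List[str]:
    """Keep URIs whose last-modified time is at or after start_date.

    Hypothetical simplification: files are (uri, last_modified) tuples.
    """
    if start_date is None:
        return [uri for uri, _ in files]
    cutoff = parse_start_date(start_date)
    return [uri for uri, modified in files if modified >= cutoff]


files = [
    ("old.csv", datetime(2024, 12, 31, 23, 59, 59)),
    ("new.csv", datetime(2025, 1, 1, 0, 0, 0)),
]

# Both spellings of the same instant now select the same files.
print(filter_by_start_date(files, "2025-01-01T00:00:00Z"))
print(filter_by_start_date(files, "2025-01-01T00:00:00.000000Z"))
```

A regression test for the real fix could follow the same shape: pass both spellings of the same instant and assert that the filtered file list is identical.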
Related
- Oncall issue: airbytehq/oncall#9390
- CDK spec fix: airbytehq/airbyte-python-cdk PR 887 (CDK v7.7.1)
Requested by Aaron ("AJ") Steers (@aaronsteers).