fix(source-snowflake): normalize numeric type aliases to prevent silent stream drops #74066
The Snowflake JDBC driver can return different type name strings (e.g. `NUMBER` vs `INTEGER`) for the same `NUMBER(38,0)` column depending on whether metadata is queried via `DatabaseMetaData.getColumns()` or `ResultSetMetaData`. This caused `StateManagerFactory.toStream()` to detect a `FieldTypeMismatch` and silently drop entire streams, resulting in syncs completing with 0 records.

Fix: normalize all Snowflake numeric type aliases (`NUMBER`, `DECIMAL`, `NUMERIC`, `INT`, `INTEGER`, `BIGINT`, `SMALLINT`, `TINYINT`, `BYTEINT`) to a consistent `FieldType` based on scale: scale = 0 → `BigIntegerFieldType` (INTEGER); scale > 0 → `BigDecimalFieldType` (NUMBER).

Resolves: #74064
Co-Authored-By: bot_apk <apk@cognition.ai>
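The failure mode described above can be sketched as follows (Java used for illustration; `isMismatch` is a hypothetical stand-in for the CDK's field-type comparison in `StateManagerFactory.toStream()`, whose real logic differs in detail):

```java
// Hypothetical, simplified illustration of the failure mode: comparing raw
// type-name strings from the two JDBC metadata APIs flags the same
// NUMBER(38,0) column as a mismatch.
public class TypeNameMismatchDemo {
    // Stand-in for the CDK's field-type comparison (not the real API).
    static boolean isMismatch(String discoveredTypeName, String readTypeName) {
        return !discoveredTypeName.equalsIgnoreCase(readTypeName);
    }

    public static void main(String[] args) {
        String viaGetColumns = "NUMBER";  // from DatabaseMetaData.getColumns()
        String viaResultSet = "INTEGER";  // from ResultSetMetaData.getColumnTypeName()
        // Same column, two names: a naive string comparison sees a mismatch,
        // and the stream is silently dropped.
        System.out.println(isMismatch(viaGetColumns, viaResultSet)); // prints: true
    }
}
```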
Deploy preview for airbyte-docs ready! ✅ Built with commit 5cd356f.
↪️ Triggering Reason: Draft PR normalizes numeric type aliases to prevent silent stream drops when Snowflake JDBC returns inconsistent type names for `NUMBER(38,0)` columns.
Fix Validation Evidence

Outcome: Could not Run Tests

Evidence Summary: The connector could not be built from this PR branch. All three build attempts (2 pre-release publishes, 1 regression test) failed. Static analysis of the fix is positive — the type normalization logic is correct and directly addresses the root cause.

Next Steps

Connector & PR Details

Connector:

Evidence Plan

Proving Criteria: A sync on a connection with

Disproving Criteria: Regression tests fail, or a live sync still emits 0 records after applying the fix, or new errors appear.

Cases Attempted

Pre-flight Checks

Detailed Evidence Log

Build Failure Root Cause:

Failed build artifacts:

Note: Connection IDs and detailed logs are recorded in the linked private issue.
What
Resolves https://github.com/airbytehq/oncall/issues/11452
Resolves #74064
Snowflake source syncs complete with 0 records emitted due to a type mismatch between catalog discovery and the read phase. The Snowflake JDBC driver returns different type name strings for the same `NUMBER(38,0)` column depending on which metadata API is used (`"NUMBER"` via `DatabaseMetaData.getColumns()` vs `"INTEGER"` via `ResultSetMetaData`). The CDK's `StateManagerFactory.toStream()` detects this as a `FieldTypeMismatch` and silently drops the entire stream.

How
In `SnowflakeSourceOperations`, all Snowflake numeric type aliases (`NUMBER`, `DECIMAL`, `NUMERIC`, `INT`, `INTEGER`, `BIGINT`, `SMALLINT`, `TINYINT`, `BYTEINT`) are now routed through a single `numericType(scale)` function that determines the `JdbcFieldType` based on the column's scale value rather than the type name string:

- `scale > 0` → `BigDecimalFieldType` (`LeafAirbyteSchemaType.NUMBER`)
- `scale == 0` or `null` → `BigIntegerFieldType` (`LeafAirbyteSchemaType.INTEGER`)

This ensures consistent type mapping regardless of which JDBC metadata API returns which type name string.
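A minimal sketch of this dispatch (in Java rather than the connector's Kotlin; the `leafType`/`numericType` names follow the PR, but the signatures and `String` return type here are illustrative stand-ins, not the actual `JdbcFieldType` API):

```java
import java.util.Set;

// Illustrative sketch of scale-based dispatch for Snowflake numeric aliases.
public class NumericAliasNormalizer {
    static final Set<String> NUMERIC_ALIASES = Set.of(
            "NUMBER", "DECIMAL", "NUMERIC", "INT", "INTEGER",
            "BIGINT", "SMALLINT", "TINYINT", "BYTEINT");

    // scale == 0 or unknown -> INTEGER (BigIntegerFieldType in the connector);
    // scale > 0             -> NUMBER  (BigDecimalFieldType in the connector).
    static String numericType(Integer scale) {
        return (scale == null || scale == 0) ? "INTEGER" : "NUMBER";
    }

    static String leafType(String typeName, Integer scale) {
        if (NUMERIC_ALIASES.contains(typeName.toUpperCase())) {
            return numericType(scale); // one code path for all aliases
        }
        return typeName; // non-numeric types pass through unchanged
    }
}
```

With this shape, `leafType("NUMBER", 0)` and `leafType("INTEGER", 0)` resolve identically, so the two metadata APIs can no longer disagree.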
Version bumped to 1.0.9 with changelog entry.
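The per-alias consistency that the new unit tests verify can be illustrated with a self-contained check (a hypothetical Java mirror of the parameterized tests; `mapType` is an inline stand-in for the connector's scale-based mapping):

```java
import java.util.List;

// Every numeric alias must resolve to the same type for a given scale.
public class AliasConsistencyCheck {
    // Inline stand-in for the connector's mapping:
    // the type name is ignored entirely; only scale decides.
    static String mapType(String typeName, Integer scale) {
        return (scale == null || scale == 0) ? "INTEGER" : "NUMBER";
    }

    public static void main(String[] args) {
        List<String> aliases = List.of("NUMBER", "DECIMAL", "NUMERIC", "INT",
                "INTEGER", "BIGINT", "SMALLINT", "TINYINT", "BYTEINT");
        for (String alias : aliases) {
            // Every alias must agree at scale = 0, scale > 0, and null scale.
            if (!mapType(alias, 0).equals("INTEGER")
                    || !mapType(alias, 2).equals("NUMBER")
                    || !mapType(alias, null).equals("INTEGER")) {
                throw new AssertionError("inconsistent mapping for " + alias);
            }
        }
        System.out.println("all " + aliases.size() + " aliases consistent");
    }
}
```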
Review guide
- `SnowflakeSourceOperations.kt` — Core fix. Review the `leafType()` and `numericType()` functions. All numeric type name aliases now share a single code path that dispatches on `scale` instead of the type name string.
- `SnowflakeSourceOperationsTest.kt` — New unit tests. Parameterized tests verify all 9 numeric aliases produce the same type at scale=0, scale>0, and scale=null. Includes a test for the exact reported bug scenario (`NUMBER` vs `INTEGER` for the same column).
- `metadata.yaml` / `snowflake.md` — Version bump 1.0.8 → 1.0.9 and changelog entry.

Human review checklist
- `ShortFieldType`/`ByteFieldType` (using `ShortAccessor`/`ByteAccessor`), now mapped to `BigIntegerFieldType` (using `BigDecimalAccessor`). The Airbyte schema type is INTEGER in all cases, but the JDBC getter changed. Verify this doesn't cause issues reading small integer values.

User Impact
Positive: `NUMBER(38,0)` columns (or other zero-scale numeric types) will no longer experience silent sync failures with 0 records emitted.

Potential schema change: `NUMBER(p,0)` columns discovered as NUMBER type will now discover as INTEGER type on the next schema refresh. This is a schema change, but it corrects the inconsistency that was causing failures. Users may need to refresh their schemas and potentially reset affected streams.

Can this PR be safely reverted and rolled back?
Reverting would restore the previous behavior where type mismatches cause silent stream drops, but no data corruption or state issues would occur.
Devin session
Requested by: bot_apk (apk@cognition.ai)