Conversation
- Introduced `validation_alias` with `AliasChoices` for selected fields (`well_status`, `sampler`, `measurement_date_time`, `mp_height`) to allow alternate field names.
- Ensured alignment with schema validation updates.
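The alias handling above can be sketched with Pydantic v2 (`WellRowSketch` and its two fields are an illustrative stand-in for the real `WellInventoryRow`, not its actual definition). Listing the canonical name first in `AliasChoices` is what makes it take precedence when both keys appear:

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class WellRowSketch(BaseModel):
    # Hypothetical subset of WellInventoryRow: the canonical name is
    # listed first in AliasChoices so it wins over the CSV alias.
    well_status: Optional[str] = Field(
        default=None,
        validation_alias=AliasChoices("well_status", "well_hole_status"),
    )
    mp_height: Optional[float] = Field(
        default=None,
        validation_alias=AliasChoices("mp_height", "mp_height_ft"),
    )


row = WellRowSketch.model_validate({"well_hole_status": "Active", "mp_height_ft": 2.5})
```

Note that once `validation_alias` is set, the field name itself is only accepted because it is included in the choices.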
- Introduced unit tests for `WellInventoryRow` alias mappings.
- Verified correct handling of alias fields like `well_hole_status`, `mp_height_ft`, and others.
- Ensured canonical fields take precedence when both alias and canonical values are provided.
… and new fields
- Added `flexible_lexicon_validator` to support case-insensitive validation of enum-like fields.
- Introduced new fields: `OriginType`, `WellPumpType`, `MonitoringStatus`, among others.
- Updated existing fields to use flexible lexicon validation for improved consistency.
- Adjusted `WellInventoryRow` optional fields handling and validation rules.
- Refined contact field validation logic to require `role` and `type` when other contact details are provided.
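A minimal sketch of what a `flexible_lexicon_validator` might look like (the enum members below are illustrative, not the real lexicon terms): it pre-builds a case-folded lookup and hands unmatched values back unchanged so Pydantic's normal enum validation reports the error.

```python
from enum import Enum


class MonitoringStatus(str, Enum):
    # Hypothetical lexicon terms for illustration only.
    ACTIVE = "Active"
    NOT_MONITORED = "Not currently monitored"


def flexible_lexicon_validator(enum_cls):
    """Build a BeforeValidator-style callable that matches enum values
    case-insensitively and tolerates surrounding whitespace."""
    lookup = {member.value.strip().lower(): member for member in enum_cls}

    def _validate(value):
        if value is None or isinstance(value, enum_cls):
            return value
        key = str(value).strip().lower()
        # Fall through unchanged so standard enum validation raises
        # for genuinely unknown terms.
        return lookup.get(key, value)

    return _validate
```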
…dations
- Refined validation error handling to provide more detailed feedback in test assertions.
- Adjusted test setup to ensure accurate validation scenarios for contact and water level fields.
- Updated contact-related tests to validate new composite field error messages.
- Renamed "Water" to "Water Bearing Zone" and refined its definition.
- Added new term "Water Quality" under the `note_type` category.
… to prevent cross-test collisions
- Supports BDD test suite stability
- Added hashing mechanism to append unique suffix to `well_name_point_id` for scenario isolation.
- Integrated pandas for robust CSV parsing and content modifications when applicable.
- Ensured handling preserves existing format for IDs ending with `-xxxx`.
- Maintained existing handling for empty or non-CSV files.
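The suffixing described above might look like the following sketch (function and parameter names are hypothetical): a short hash of the scenario name is appended, while empty IDs and IDs already ending in `-xxxx` pass through untouched.

```python
import hashlib


def add_scenario_suffix(point_id: str, scenario_name: str) -> str:
    """Append a short scenario-derived hash so concurrent BDD scenarios
    cannot collide on the same well_name_point_id."""
    if not point_id or point_id.endswith("-xxxx"):
        # Preserve empty values and the existing placeholder suffix.
        return point_id
    suffix = hashlib.sha1(scenario_name.encode("utf-8")).hexdigest()[:4]
    return f"{point_id}-{suffix}"
```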
…ollback side effects
- Supports transaction management
- Moved `session.refresh` calls under `commit` condition to streamline database session operations.
- Reorganized `session.rollback` logic to properly align with commit flow.
…ory source fields in support of schema alignment and database mapping
- Updated well inventory CSV files to correct data inconsistencies and improve schema alignment.
- Added support for `Sample`, `Observation`, and `Parameter` objects within well inventory processing.
- Enhanced elevation handling with optional and default value logic.
- Introduced `release_status`, `monitoring_status`, and validation for derived fields.
- Updated notes handling with new cases and refined content categorization.
- Improved `depth_to_water` processing with associated sample and observation creation.
- Refined lexicon updates and schema field adjustments for better data consistency.
…h 1 well
- Updated BDD tests to reflect changes in well inventory bulk upload logic, allowing the import of 1 well despite validation errors.
- Modified step definitions for more granular validation on imported well counts.
- Enhanced error message detail in responses for validation scenarios.
- Adjusted sample CSV files to match new import logic and validation schema updates.
- Refined service behavior to improve handling of validation errors and partial imports.
Pull request overview
This PR updates the well-inventory CSV ingestion pipeline to better align schema validation with lexicon-backed enums, persist previously-missed fields (notes, monitoring/public availability, water level observations), and change the import behavior to “best-effort” so individual row failures don’t abort the entire upload.
Changes:
- Added row-level savepoints and improved error reporting so invalid rows are skipped while valid rows are persisted.
- Updated `WellInventoryRow` schema to relax/adjust requirements and introduce lexicon-backed enum parsing plus CSV field aliases.
- Updated BDD/unit tests and test CSV fixtures to reflect new validation rules and partial-success behavior.
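A minimal stand-in for the row-level savepoint flow described above (hypothetical `well` table and helper; the real importer uses SQLAlchemy's `begin_nested()`, shown here with stdlib `sqlite3` SAVEPOINTs for the same mechanics):

```python
import sqlite3


def import_rows(conn, rows):
    """Best-effort import: each row runs inside its own SAVEPOINT, so a
    failing row rolls back alone while valid rows are kept."""
    imported, errors = [], []
    for index, name in enumerate(rows):
        conn.execute("SAVEPOINT row_import")
        try:
            conn.execute("INSERT INTO well (name) VALUES (?)", (name,))
            conn.execute("RELEASE SAVEPOINT row_import")
            imported.append(name)
        except sqlite3.Error as exc:
            # Undo only this row; previously imported rows survive.
            conn.execute("ROLLBACK TO SAVEPOINT row_import")
            conn.execute("RELEASE SAVEPOINT row_import")
            errors.append({"row": index, "error": str(exc)})
    conn.commit()
    return imported, errors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE well (name TEXT NOT NULL UNIQUE)")
ok, bad = import_rows(conn, ["W-1", None, "W-2"])  # None violates NOT NULL
```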
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.
Summary per file:
| File | Description |
|---|---|
| `services/well_inventory_csv.py` | Best-effort import via nested savepoints; mapping updates for release status, notes, monitoring status, and water level observation persistence |
| `schemas/well_inventory.py` | Schema optionality updates, lexicon enum coercion, contact + water level validation adjustments, and alias handling |
| `services/thing_helper.py` | Adds monitoring status history writes; adjusts commit/rollback behavior for outer-transaction support |
| `services/contact_helper.py` | Adjusts commit/rollback behavior for outer-transaction support |
| `cli/service_adapter.py` | Improves exit code and stderr reporting for partial success and validation failures |
| `tests/test_well_inventory.py` | Adds schema alias tests and helper row builder |
| `tests/features/well-inventory-csv.feature` | Updates expected partial-success outcomes in negative scenarios |
| `tests/features/steps/*.py` | Updates step definitions for partial success and loosens validation-error matching |
| `tests/features/data/*.csv` | Refreshes fixture data to match new lexicon/validation expectations |
| `core/lexicon.json` | Adds a new `note_type` term |
… unique well name suffixes in well inventory scenarios
- Updated `pd.read_csv` calls with `keep_default_na=False` to retain empty values as-is.
- Refined logic for suffix addition by excluding empty and `-xxxx` suffixed IDs.
- Improved test isolation by maintaining scenario-specific unique identifiers.
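The `keep_default_na=False` behavior can be seen in a small sketch (the column names here are illustrative): by default `read_csv` converts empty cells to `NaN`, while `keep_default_na=False` keeps them as empty strings so "present but empty" values round-trip as-is.

```python
import io

import pandas as pd

csv_text = "well_name_point_id,depth_to_water_ft\nWL-0001,12.5\n,\n"

# keep_default_na=False prevents empty cells from becoming NaN; the
# column stays object-typed with plain strings.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```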
…nd `DataQuality`
- Changed `SampleMethodField` to validate against `SampleMethod` instead of `OriginType`
- Changed `DataQualityField` to validate against `DataQuality` instead of `OriginType`
… import
- Make `contact.role` and `contact.contact_type` nullable in the ORM and migrations
- Update contact schemas and well inventory validation to accept missing values
- Allow contact import when name or organization is present without role/type
- Stop round-tripping CSV fixtures through pandas to avoid rewriting structural test cases
- Preserve repeated header rows and duplicate column fixtures so importer validation is exercised correctly
- Keep the blank contact name/organization scenario focused on a single invalid row for stable assertions
…n errors
- Prevent one actual validation error from satisfying multiple expected assertions (avoids false positives)
- Keep validation matching order-independent while requiring distinct matches (preserves flexibility)
- Tighten BDD error checks without relying on exact error text (improves test precision)
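One way to make the matching distinct (hypothetical helper; greedy first-match consumption rather than full bipartite matching) is to remove each actual error from the pool once it satisfies an expectation, so a single real error cannot satisfy two expected substrings:

```python
def match_expected_errors(expected, actual):
    """Return the expected substrings left unmatched after each actual
    error message has been consumed by at most one expectation."""
    remaining = list(actual)
    unmatched = []
    for exp in expected:
        for i, act in enumerate(remaining):
            if exp in act:
                del remaining[i]  # consume this actual error
                break
        else:
            unmatched.append(exp)
    return unmatched
```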
…behavior
- Update partial-success scenarios to expect valid rows to import alongside row-level validation errors
- Reflect current importer behavior for invalid lexicon, invalid date, and repeated-header cases
- Keep BDD coverage focused on user-visible import outcomes instead of outdated all-or-nothing assumptions
…sitive parsing
- Update unit expectations to accept lowercase placeholder tokens that are now supported
- Document normalization of mixed-case and spaced placeholder formats to uppercase prefixes
- Keep test coverage aligned with importer behavior and reduce confusion around valid autogen inputs
…DataQuality`
- Adjust test data to reflect updated descriptions for `sample_method` and `data_quality` fields.
…ization scenarios
- Add test to ensure contact creation returns None when both name and organization are missing
- Add test to verify contact creation with organization only, ensuring proper dict structure
- Update assertions for comprehensive validation of contact fields
@jirhiker Recent updates to the CLI command tests are failing on my branch. Since these changes don't seem to impact the well inventory ingestion, should I resolve the test failures within this PR, or would you prefer I handle them in a separate one before re-opening for review?
…ence
- Promote `well_status` to lexicon-backed validation with `well_hole_status` alias support
- Prevent invalid `well_hole_status` values from surfacing as DB constraint errors
- Align BDD fixtures and assertions with stable user-facing validation behavior
```python
existing_well = _find_existing_imported_well(session, model)
if existing_well is not None:
    return existing_well.name
```
The idempotency shortcut returns `existing_well.name`, which gets appended to `wells` and counted in `total_rows_imported`. On reruns this reports rows as "imported" even though nothing new was created, which is misleading for CLI summary/metrics. Consider returning `None` (and optionally adding a warning entry) when a row is detected as already imported so `total_rows_imported` reflects newly-created records only.
```diff
- return existing_well.name
+ logging.info(
+     "Well '%s' already imported; skipping creation for idempotent CSV row",
+     existing_well.name,
+ )
+ return None
```
```diff
  expected_errors = [
      {
          "field": "composite field error",
-         "error": "Value error, contact_1_role must be provided if name is provided",
+         "error": "Value error, contact_1_role is required when contact fields are provided",
      }
```
This step expects the error text "contact_1_role is required when contact fields are provided", but the `WellInventoryRow` validator raises "... when contact data is provided". Because `_handle_validation_error` matches by substring, this wording mismatch will prevent the step from matching the actual validation error. Update the expected string to match the current validator message (or use a shorter substring that's stable).
```diff
  @then("the command exits with a non-zero exit code")
  def step_impl_command_exit_nonzero(context):
-     assert context.cli_result.exit_code != 0
+     assert context.cli_result.exit_code != 0, context.cli_result.exit_code
```
The assertion message here only echoes the numeric exit code, which doesn't help diagnose failures. Consider including `context.cli_result.stderr` (and/or `.stdout`) so a failing non-zero exit assertion shows the underlying CLI error output.
…jab-bdms-626-mp-height
- Emit validation and import progress during interactive CLI runs
- Report per-project import progress and periodic row counts
- Keep non-interactive callers and tests quiet by default
BDMS-626: mp height fixes
…stion-updates_v2' into kas-well-BDMS-626-inventory-ingestion-updates_v2
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
services/well_inventory_csv.py:796
`model.completion_source` is now an `OriginType` enum (see `WellInventoryRow`), but `CreateWell.well_completion_date_source` is typed as `str | None`. Passing the enum object here can serialize to something like `OriginType.<name>` instead of the underlying lexicon term, causing incorrect persistence or FK failures. Convert to the enum `.value` (or change `CreateWell.well_completion_date_source` to `OriginType`) before calling `CreateWell(...)` for consistency with the other enum-backed fields.
```python
well_completion_date=model.date_drilled,
well_completion_date_source=model.completion_source,
well_pump_type=model.well_pump_type,
```
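The suggested fix is a small normalization before building `CreateWell`. A sketch (the enum members and helper name here are hypothetical; the real terms live in the lexicon):

```python
from enum import Enum


class OriginType(str, Enum):
    # Illustrative members only, not the real lexicon terms.
    DRILLER_LOG = "Driller Log"
    REPORTED = "Reported"


def lexicon_value(member):
    """Unwrap an optional lexicon enum to the underlying stored term,
    avoiding reprs like 'OriginType.REPORTED' reaching the database."""
    return member.value if member is not None else None


# e.g. well_completion_date_source=lexicon_value(model.completion_source)
```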
…t currently monitored"
- Treat source `monitoring_frequency` value `Complete` as no monitoring frequency
- Map `Complete` rows to `monitoring_status` = `Not currently monitored`
- Add schema and import regression coverage for the normalization
- Add unit and BDD coverage for the normalization behavior
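The normalization in this commit can be sketched as follows (hypothetical helper name and return shape; the real importer wires this into its mapping logic):

```python
def normalize_monitoring_frequency(frequency):
    """Map a source monitoring_frequency of 'Complete' to no frequency
    record plus a 'Not currently monitored' status; pass every other
    value through unchanged."""
    if frequency is not None and frequency.strip().lower() == "complete":
        return None, "Not currently monitored"
    return frequency, None
```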
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
services/well_inventory_csv.py:803
`WellInventoryRow.completion_source` is now an `OriginType` enum, but `CreateWell.well_completion_date_source` is typed as `str | None` and is later written to `DataProvenance.origin_type` in `add_thing`. Passing the enum through here risks persisting the enum repr (e.g., `OriginType.<member>`) instead of the lexicon term. Convert to the enum's `.value` (or `None`) before building `CreateWell`, similar to how other lexicon-backed fields are normalized above (e.g., `elevation_method`, `depth_source`).
```python
well_completion_date=model.date_drilled,
well_completion_date_source=model.completion_source,
well_pump_type=model.well_pump_type,
```
… to Unknown
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
tests/features/steps/well-inventory-csv.py:1
`select` is used but not imported in this file (at least in the provided diff context). Add `from sqlalchemy import select` (or equivalent) near the top so this step doesn't raise `NameError`.
```diff
- elevation_m = convert_ft_to_m(elevation_ft)
+ elevation_ft = model.elevation_ft
+ elevation_m = (
+     convert_ft_to_m(float(elevation_ft)) if elevation_ft is not None else 0.0
```
When `elevation_ft` is missing, this writes an elevation of `0.0` meters, which is a real value and can be indistinguishable from valid near-sea-level elevations. Prefer persisting `None` (or omitting the field) when elevation is absent; if the database requires a non-null elevation, consider enforcing it as required at the schema level instead of silently defaulting to `0`.
```diff
- convert_ft_to_m(float(elevation_ft)) if elevation_ft is not None else 0.0
+ convert_ft_to_m(float(elevation_ft)) if elevation_ft is not None else None
```
```python
if commit:
    session.commit()
    session.refresh(thing)
    for note in thing.notes:
        session.refresh(note)
else:
    session.flush()
```
This changes `add_thing(commit=False)` behavior: previously the function refreshed `thing` (and its notes) even when only flushing. If any callers rely on refreshed relationships/attributes after `commit=False`, they may now observe stale/incomplete state. Consider restoring `session.refresh(thing)` (and any required refreshes) after `session.flush()` as well, while keeping the conditional rollback behavior.
```python
WellStatusField: TypeAlias = Annotated[
    Optional[MonitoringStatus],
    BeforeValidator(flexible_lexicon_validator(MonitoringStatus)),
```
`WellStatusField` typed as `MonitoringStatus` is semantically confusing and makes it hard to reason about which lexicon category is intended for well status vs monitoring status. Consider introducing a dedicated enum/type alias (e.g., `WellStatus` or `StatusValue`) and using that for `well_status`, leaving `MonitoringStatus` for monitoring-specific status.
```diff
- WellStatusField: TypeAlias = Annotated[
-     Optional[MonitoringStatus],
-     BeforeValidator(flexible_lexicon_validator(MonitoringStatus)),
+ WellStatus: TypeAlias = MonitoringStatus
+ WellStatusField: TypeAlias = Annotated[
+     Optional[WellStatus],
+     BeforeValidator(flexible_lexicon_validator(WellStatus)),
```
```python
    progress_callback, "No valid rows were available for import."
)
except Exception as exc:
    logging.exception("Unexpected error in _import_well_inventory_csv")
```
On unexpected exceptions this returns early without explicitly rolling back the session. Since the function is operating on an open SQLAlchemy session (and may have pending/failed nested transactions), explicitly calling `session.rollback()` before returning helps ensure the connection is returned to the pool cleanly and avoids "transaction is aborted" issues in subsequent operations within the same context.
```diff
  logging.exception("Unexpected error in _import_well_inventory_csv")
+ session.rollback()
```
```python
existing_well = _find_existing_imported_well(session, model)
if existing_well is not None:
    return existing_well.name
```
Returning a truthy value for already-imported rows makes the importer treat "skipped" rows as "imported" (e.g., `imported_count += 1` and `total_rows_imported` can include duplicates on reruns). Consider returning a sentinel (e.g., `None`) or a structured result that distinguishes created vs skipped, so summaries and progress reporting remain accurate while still surfacing which well IDs were detected as duplicates.
```diff
- return existing_well.name
+ logging.info(
+     "Skipping import for existing well '%s' (id=%s)", existing_well.name, getattr(existing_well, "id", None)
+ )
+ return None
```
Why
This PR addresses the following problem / context:
- `public_availability_acknowledgement`, `monitoring_status`, well/water notes, and water level observations were not being persisted correctly.
- The source data contains `monitoring_frequency` values like `Complete` that do not map cleanly to the current lexicon and needed importer-side normalization.

How
Implementation summary - the following was changed / added / removed:
Database mapping:
- `public_availability_acknowledgement` now maps to `Location.release_status` (True → public, False → private, unset → draft).
- `monitoring_status` is now written to the `StatusHistory` table.
- `well_notes` and `water_notes` are now stored as polymorphic notes on the `Thing`.

`WellInventoryRow` schema:
- Made `site_name`, `elevation_ft`, `elevation_method`, `monitoring_point_height_ft`, and `depth_to_water_ft` optional.
- Replaced `str` types with lexicon-based enums for `elevation_method`, `depth_source`, `well_pump_type`, `monitoring_status`, `sample_method`, and `data_quality`.
- Added `flexible_lexicon_validator` for case-insensitive, whitespace-tolerant matching.
- Contacts are accepted when `contact_name` or `contact_organization` is present.
- Blank values are coerced to `None` instead of persisting invalid empty strings.

Best-effort logic:
- Row-level validation errors are collected in a `validation_errors` list.
- Fixed an `UnboundLocalError` caused by auto-generated well IDs.
- Added `commit=False` support so `services/thing_helper.py` and `services/contact_helper.py` participate correctly in the outer best-effort transaction without prematurely committing.

Lexicon and import compatibility:
- Normalized `monitoring_frequency = Complete` source values so they do not create a monitoring frequency record and instead set `monitoring_status` to `Not currently monitored`.

BDD test suite:
- Updated fixtures in `tests/features/data/` to use valid lexicon terms and properly quoted comma-containing values.
- Updated `Given` steps to isolate tests and prevent primary key conflicts.
- Updated `well-inventory-csv.feature` to assert partial success (e.g., "1 well is imported") in negative scenarios where best-effort behavior is expected.

Notes
Any special considerations, workarounds, or follow-up work to note?
- Rebuild `ocotilloapi_test` after schema or lexicon changes to avoid stale test database state.