Modular metadata schema components for the Astromat Data Archive (ADA), built using the OGC Building Blocks pattern.
- `_sources/techniqueProtocols/`
  - `analyteColumns/`: shared; one `schema:PropertyValueSpecification` per analyte column
  - `parameterTemplates/`: shared; `PropertyValueSpecification` (`readOnly:true` params)
  - `parameterValues/`: shared; `schema:PropertyValue` (`readOnly:false` params)
  - `vocab/`: shared; one `schema:DefinedTermSet` per vocabulary
  - `tappDefinition/`: base TAPP definition (JSON-LD class `ada:TAPPDefinition`)
  - `empaTAPP/`: first concrete TAPP profile (EMPA)
  - `<future>TAPP/`: additional TAPPs `$ref` the four catalog dirs above
The four catalog dirs are shared dictionary resources: multiple TAPP profiles `$ref` the same files when their definitions match. The tooling's `share_or_write_catalog` helper lets a TAPP regeneration overwrite its own entries (matched by `$id` ownership). It errors out when an entry collides with one originated by a different TAPP. A new TAPP therefore either reuses identical catalog entries or surfaces a renaming requirement.
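The ownership rule can be sketched as follows. This is a hypothetical stand-in, not the real `share_or_write_catalog`. It assumes each catalog entry is one JSON file and that ownership is tracked in an `owners` map from `$id` to the TAPP that first wrote it. The function name, the `owners` bookkeeping, and `CatalogCollision` are all illustrative.

```python
import json
from pathlib import Path


class CatalogCollision(Exception):
    """Raised when a TAPP tries to change an entry originated by another TAPP."""


def share_or_write_entry(catalog_dir: Path, name: str, entry: dict,
                         tapp: str, owners: dict) -> str:
    """Hypothetical stand-in for share_or_write_catalog (not the real helper).

    `owners` maps an entry's $id to the TAPP that first wrote it; this
    bookkeeping is an assumption of the sketch. Returns "written",
    "overwritten", or "reused", or raises CatalogCollision.
    """
    path = catalog_dir / f"{name}.json"
    if not path.exists():
        # New entry: write it and record this TAPP as its owner.
        path.write_text(json.dumps(entry, indent=2))
        owners[entry["$id"]] = tapp
        return "written"
    existing = json.loads(path.read_text())
    owner = owners.get(existing.get("$id"))
    if owner == tapp:
        # Regenerating our own entry: overwrite it.
        path.write_text(json.dumps(entry, indent=2))
        return "overwritten"
    if existing == entry:
        # Identical definition from another TAPP: share the existing file.
        return "reused"
    raise CatalogCollision(
        f"{name}: differs from the entry originated by {owner!r}; "
        f"{tapp!r} must use a different name")
```

Raising an error, rather than silently overwriting, is what forces the renaming decision onto the new TAPP.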
- `tappDefinition`: base TAPP. Defines the `WorkflowHowTo`/`WorkflowStep`/`MethodParameter`/`AnalyteColumn`/`AnalyteIdentifierColumn` `$defs` that concrete TAPP profiles extend.
- `empaTAPP`: Electron Microprobe Analysis. Extends `tappDefinition` via `allOf` with EPMA top-level properties plus `ada:methodParameters`/`ada:analyteTemplate.ada:analyteColumns` constraints referencing the shared catalog dirs. Generated from `docs/TAPP_EPMA_filled.xlsx` (the canonical TAPP template). 11 examples ship with the BB (10 publication-derived instances plus a comprehensive synthetic example).
Per-dataset detail blocks pair with a TAPP definition and carry the per-instance values:
- `detailEMPA`: paired with `empaTAPP`. Carries `readOnly:false` parameter values as `schema:additionalProperty[]` `PropertyValue` entries (catalog at `parameterValues/`). References the empaTAPP definition via a `schema:measurementTechnique` `anyOf` (by `@id` ref or inline). 11 paired examples (`exampledetailEMPA-P1.json` … `-P10.json` plus `-all.json`).
The split was made on 2026-04-28. Parameters in the TAPP spreadsheet route either to `empaTAPP/methodParameters[]` (`readOnly:true`) or to `detailEMPA/schema:additionalProperty[]` (`readOnly:false`). Method-level constants (the `ada:xxxDefault` top-level properties) stay on the TAPP.
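The routing rule amounts to a split on the `readOnly` flag. The sketch below is illustrative and is not the repository's parser. It assumes each spreadsheet row has already been parsed into a dict with hypothetical `name`, `readOnly`, and optional `value` keys. The parameter names used in the usage check are also hypothetical.

```python
def route_parameters(rows):
    """Split parsed spreadsheet parameter rows by their readOnly flag.

    Row shape ({"name", "readOnly", "value"}) is an assumption of this
    sketch; the real parsing lives in tools/_tapp_lib.py.
    """
    method_parameters = []    # -> empaTAPP ada:methodParameters[]
    additional_property = []  # -> detailEMPA schema:additionalProperty[]
    for row in rows:
        if row["readOnly"]:
            # Fixed by the protocol: a specification on the TAPP.
            method_parameters.append({
                "@type": "schema:PropertyValueSpecification",
                "schema:name": row["name"],
                "schema:readonlyValue": True,
            })
        else:
            # Varies per dataset: a value on the detail block.
            additional_property.append({
                "@type": "schema:PropertyValue",
                "schema:name": row["name"],
                "schema:value": row.get("value"),
            })
    return method_parameters, additional_property
```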
`profiles/geochemProfiles/` (alongside `profiles/adaProfiles/`) holds technique profiles that compose a TAPP definition and a detail block on top of `adaProduct`:
- `empaProfile`: extends `adaProduct` with a `schema:measurementTechnique` `anyOf` pointing at empaTAPP, and a `schema:distribution.schema:hasPart` branch that lets `detailEMPA` appear.
Property building blocks that define ADA-specific metadata elements: file types, instrument details, technique-specific data structures, spatial registration, and more.
Key building blocks that extend CDIF core BBs:
- `instrument`: extends core CDIF instrument (`schema:Product` with `nxs:BaseClass/NXinstrument` in `additionalType`)
- `laboratory`: extends core CDIF spatialExtent (`schema:Place` with `nxs:BaseClass/NXsource` in `additionalType`)
Metadata profiles that compose property building blocks with CDIF base schemas:
- `adaProduct`: base ADA product profile, composed via `allOf`:
  - `cdifCore`: core metadata properties
  - `cdifDataDescription`: `variableMeasured` with DDI-CDI extensions, `@id` requirement
  - `cdifArchiveDistribution`: archive distribution with `hasPart` component files
  - `cdifProvenance`: `prov:wasGeneratedBy` provenance activities
  - ADA-specific: technique types, instrument/lab/sample overlays, `ada:componentType`
- 35 technique profiles: technique-specific constraints on `ada:componentType` values (e.g., `adaSEM`, `adaXRD`, `adaICPMS`, `adaTEM`)
Each archive `hasPart` item carries an `ada:componentType` (a single string such as `ada:EMPAImageMap`) that classifies the file. The architecture enforces a two-level constraint:
- File type ↔ componentType mapping: each file-type building block (`image`, `imageMap`, `tabularData`, `collection`, `dataCube`, `document`, `supDocImage`, `otherFile`) declares a sealed `enum` of valid componentType values. The enum is derived from the Components worksheet of `amds-ldeo/metadata/ADA-AnalyticalMethodsAndAttributes.xlsx` (the canonical mapping). For example, `ada:EMPAImageMap` is valid only on parts whose `@type` includes `ada:imageMap`.
- Profile-level constraint: a technique profile's `schema:hasPart.items` uses a schema-level `anyOf` with three kinds of branch:
  - (a) a `$ref` to `adaProduct/$defs/universalComponentTypeBranch` (factored once, used everywhere) for universal componentTypes;
  - (b) an inline string enum for technique-specific componentTypes that have no detail block;
  - (c) a `$ref` to a technique-specific detail schema (e.g. `detailEMPA`). The detail schema pins `ada:componentType` to its technique consts and contributes detail-specific sibling properties (e.g. `ada:spectrometersUsed`, `ada:signalUsed`). These properties sit flat on the hasPart item rather than nested inside componentType.
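The two-level check can be mimicked in plain Python to show how the levels combine. This is a sketch, not the JSON Schema the repository ships. Only `ada:imageMap` and `ada:EMPAImageMap` come from the text; `ada:EMPAQuantTable` and the function names are hypothetical. An item is valid when it passes the file-type enum *and* at least one profile `anyOf` branch.

```python
# Level 1: each file-type BB declares a sealed enum of componentType values.
# Only ada:imageMap / ada:EMPAImageMap come from the text; the rest is illustrative.
FILE_TYPE_ENUMS = {
    "ada:imageMap": {"ada:EMPAImageMap"},
    "ada:tabularData": {"ada:EMPAQuantTable"},  # hypothetical value
}


def passes_file_type(part: dict) -> bool:
    """componentType must be in the enum of one of the part's file-type @types."""
    types = part.get("@type", [])
    if isinstance(types, str):
        types = [types]
    allowed = set().union(*(FILE_TYPE_ENUMS.get(t, set()) for t in types))
    return part.get("ada:componentType") in allowed


def detail_empa_branch(part: dict) -> bool:
    """Stand-in for branch (c): the detail schema pins componentType to its consts.

    Detail properties such as ada:spectrometersUsed sit flat on the part next
    to ada:componentType; this sketch only checks the pinned value.
    """
    return part.get("ada:componentType") in {"ada:EMPAImageMap", "ada:EMPAQuantTable"}


def passes_profile(part: dict, universal: set, technique_only: set, detail_branches) -> bool:
    """Level 2: anyOf semantics, so one matching branch is enough."""
    ct = part.get("ada:componentType")
    return (ct in universal                            # (a) universal branch
            or ct in technique_only                    # (b) inline string enum
            or any(b(part) for b in detail_branches))  # (c) detail schema $ref
```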
After editing the Components worksheet:
python tools/apply_componentType_enums.py --refresh \
--xlsx ../../amds-ldeo/metadata/ADA-AnalyticalMethodsAndAttributes.xlsx
python tools/regenerate_schema_json.py
python tools/resolve_schema.py --all
python tools/validate_examples.py
The cached mapping at `tools/componentType_enum_cache.json` is committed so the apply step works on a fresh clone without spreadsheet access.
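One plausible shape for the cache-or-refresh behavior is sketched below. It assumes the cache is a JSON object mapping file types to lists of componentType values. The spreadsheet reader is passed in as a callable because the real worksheet parsing is not shown here. `load_component_enums` and `read_xlsx` are hypothetical names, not the actual internals of `apply_componentType_enums.py`.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_component_enums(cache_path: Path, refresh: bool = False,
                         read_xlsx: Optional[Callable[[], dict]] = None) -> dict:
    """Return {fileType: [componentType, ...]} (hypothetical cache layout).

    With refresh=True, re-read the mapping via `read_xlsx` (a stand-in for
    parsing the Components worksheet) and rewrite the cache; otherwise use
    the committed cache, so no spreadsheet is needed.
    """
    if refresh:
        if read_xlsx is None:
            raise ValueError("refresh requested but no spreadsheet reader given")
        # Sort keys and values so the committed cache diffs cleanly.
        mapping = {k: sorted(v) for k, v in sorted(read_xlsx().items())}
        cache_path.write_text(json.dumps(mapping, indent=2) + "\n")
        return mapping
    return json.loads(cache_path.read_text())
```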
This repository imports shared schema.org and CDIF property building blocks from metadataBuildingBlocks via the OGC Building Blocks import mechanism. All external references use absolute URLs (https://cross-domain-interoperability-framework.github.io/metadataBuildingBlocks/_sources/...).
Browse the building blocks at: https://usgin.github.io/geochemBuildingBlocks/
The end-to-end pipeline for adding a new technique profile from a filled-in TAPP template spreadsheet runs four scripts in order. See `docs/TAPP_TEMPLATE_GUIDE.md` for what to put in the spreadsheet.
python tools/build_TAPP_from_spreadsheet.py [TAPP_NAME] [XLSX_PATH] [--pub Pn]… # 1. TAPP BB + catalogs
python tools/build_detail_BB.py [TAPP_NAME] [XLSX_PATH] [--pub Pn]… # 2. detail BB + parameterValues
python tools/build_profile_BB.py [TAPP_NAME] # 3. profile BB scaffold
python tools/build_dataset_template.py <tapp-instance.json> [<out.xlsx>] # 4. xlsx data-entry template
All four scripts default to empaTAPP / `docs/TAPP_EPMA_filled.xlsx` for backward compatibility. The shared library at `tools/_tapp_lib.py` does the heavy lifting (parser, catalog emit helpers, scaffolders); the four drivers are thin wrappers around it.
`--pub <code>` (repeatable) on scripts 1 and 2 limits which publication-derived examples get regenerated. This is useful when migrating pub columns one at a time.
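A repeatable flag with optional positional defaults maps naturally onto `argparse`. The parser below is a hypothetical mirror of the documented command line, not code taken from scripts 1 and 2. `build_parser` and `select_pubs` are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI of scripts 1 and 2."""
    p = argparse.ArgumentParser()
    p.add_argument("tapp_name", nargs="?", default="empaTAPP")
    p.add_argument("xlsx_path", nargs="?", default="docs/TAPP_EPMA_filled.xlsx")
    # action="append" makes --pub repeatable; None means "regenerate all".
    p.add_argument("--pub", action="append", metavar="Pn")
    return p


def select_pubs(all_pubs, requested):
    """Keep every pub when --pub is absent, else only the requested ones (sheet order)."""
    if not requested:
        return list(all_pubs)
    unknown = set(requested) - set(all_pubs)
    if unknown:
        raise ValueError(f"unknown pub code(s): {sorted(unknown)}")
    return [p for p in all_pubs if p in requested]
```

Rejecting unknown codes early keeps a typo like `--pub P11` from silently regenerating nothing.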
python tools/interpret_pub_analytes.py # preview only (review files)
python tools/interpret_pub_analytes.py --apply # also rewrite source xlsx
The script reads publication columns whose analyte axis isn't explicitly populated and infers that axis from rows 48 / 59 / 64 (Halogen Correction / Primary Calibration Standard / Typical Detection Limit). Default-mode outputs:
- `docs/TAPP_EPMA_filled-interp.xlsx`: side workbook with each `<pub>-interp` column inserted right after its source pub column for side-by-side review.
- `build/interp-review/example<empaTAPP|detailEMPA>-<pub>-interp.json`: paired review JSON instances built from the inferred data.
With `--apply`, the script also rewrites rows 32 / 40 / 59 / 64 of each inferred pub column in `docs/TAPP_EPMA_filled.xlsx` to the pipe-delimited convention. After migration, the regular pipeline (`build_TAPP_from_spreadsheet.py` etc.) reproduces the same rich examples directly from the source, with no interp loop needed.
Detection-limit values keep their full text per element (e.g. "SiO2: 0.02 wt%", "<0.03 wt% for TiO2") so context isn't lost in the migration.
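The sketch below shows how per-element text can be kept while splitting such a cell. It assumes the pipe-delimited convention means `|`-separated segments, with each segment naming exactly one analyte. `split_detection_limits` is a hypothetical helper, not part of `interpret_pub_analytes.py`. An exact-token match keeps `Si` from matching inside `SiO2`. Segments that name no analyte, or several, are returned separately so no text is dropped.

```python
import re


def split_detection_limits(cell: str, analytes: list[str]):
    """Split a |-delimited cell into ({analyte: full segment text}, unmatched).

    Each segment keeps its original wording. An analyte matches only as a
    whole token, so "Si" does not match inside "SiO2".
    """
    by_analyte, unmatched = {}, []
    for segment in (s.strip() for s in cell.split("|")):
        if not segment:
            continue
        hits = [a for a in analytes
                if re.search(rf"(?<![A-Za-z0-9]){re.escape(a)}(?![A-Za-z0-9])", segment)]
        if len(hits) == 1:
            by_analyte[hits[0]] = segment
        else:
            unmatched.append(segment)  # ambiguous or unlabeled: keep for review
    return by_analyte, unmatched
```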
- `tools/generate_profiles.py`: generates technique-specific profile building blocks from configuration data
- `tools/resolve_schema.py`: resolves all `$ref` into single `resolvedSchema.json` files
- `tools/regenerate_schema_json.py`: generates `*Schema.json` from `schema.yaml` sources (YAML→JSON plus ref rewrite)

- `tools/audit_building_blocks.py`: comprehensive audit (file completeness, schema consistency, resolvedSchema freshness, SHACL coverage)
- `tools/audit_shacl_coverage.py`: checks that SHACL rules cover all `schema.yaml` properties; reports missing/extra shapes
- `tools/validate_examples.py`: validates example JSON files against resolved schemas
- `tools/validate_instance.py`: profile-aware validation of ADA metadata instances
- `tools/compare_schemas.py`: detects drift between `schema.yaml` and `*Schema.json`

- `tools/download_ecl_methods.py`: downloads analytical method Excel workbooks from the EarthChem Library. Reads the methods list from Google Sheets and downloads the available workbooks. Supports `--dry-run` and `--output-dir`.

- `tools/augment_register.py`: adds resolvedSchema URLs to `build/register.json` for the viewer
- `tools/generate_custom_report.py`: generates an HTML validation report with a granular SHACL severity breakdown
- `tools/cors_server.py`: local HTTP server with CORS headers for testing the viewer
`resolve_schema.py` and `regenerate_schema_json.py` are synced from the canonical copies in `metadataBuildingBlocks/tools/`. Do not edit them locally. Instead, update the canonical copy and run `python tools/sync_resolve_schema.py --apply` from the metadataBuildingBlocks repo. The audit, validation, and report tools were also sourced from that repository.
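The core idea behind producing a `resolvedSchema.json` is to replace each `$ref` with the schema it points to. The sketch below handles only same-document, non-cyclic JSON Pointer refs such as `#/$defs/...`. The real `resolve_schema.py` also deals with the absolute cross-repository URLs, which this sketch does not attempt. When a `$ref` has sibling keywords, the target is wrapped in `allOf` rather than merged, so neither side's constraints are lost.

```python
def resolve_local_refs(schema: dict) -> dict:
    """Inline every same-document "#/..." $ref (sketch; no cycles, no remote URLs)."""
    root = schema

    def lookup(pointer: str):
        if not pointer.startswith("#/"):
            raise ValueError(f"only same-document refs are handled: {pointer}")
        node = root
        for token in pointer[2:].split("/"):
            # JSON Pointer escapes (RFC 6901): ~1 is "/", ~0 is "~".
            node = node[token.replace("~1", "/").replace("~0", "~")]
        return node

    def walk(node):
        if isinstance(node, dict):
            if "$ref" in node:
                target = walk(lookup(node["$ref"]))
                siblings = {k: walk(v) for k, v in node.items() if k != "$ref"}
                # Keep sibling keywords alongside the target instead of merging over it.
                return {"allOf": [target], **siblings} if siblings else target
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    resolved = walk(schema)  # builds new containers; the input is not mutated
    if isinstance(resolved, dict):
        resolved.pop("$defs", None)  # every definition is now inlined
    return resolved
```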
The `tappDefinition` building block at `techniqueProtocols/tappDefinition/` defines a registry-backed Technique-Aligned Protocol Profile (TAPP) definition schema (v3); it was previously named `methodDefinition`. A TAPP definition is modeled as a combination of `cdi:Activity`, `schema:Action`, `ada:TAPPDefinition`, and `bios:LabProtocol`.
- TAPP identity (top level): name, DOI, version, `schema:measurementTechnique`, `schema:object` (target materials), instrument, `schema:location` (laboratory/facility), software (`bios:computationalTool`), reagents (`bios:reagent`), agent
- Standard workflow (`schema:actionProcess`): a `schema:HowTo` containing ordered `cdi:Activity` + `schema:Action` steps: sample preparation, calibration, data acquisition, data processing, quality control
- Parameters: typed as `schema:PropertyValueSpecification` with `schema:readonlyValue`, `schema:valueRequired`, `schema:defaultValue`, `schema:minValue`/`maxValue`, `schema:inDefinedTermSet` (SKOS vocabulary link), and `ada:fieldScope` (method/session/element)
- Analyte template (`ada:analyteTemplate`): per-element column definitions (also `PropertyValueSpecification`) and default analyte rows
- Quality metrics (`dqv:hasQualityMeasurement`): at method level and on workflow steps
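The skeleton below sketches how these pieces could fit together in one instance. It is a hypothetical minimal TAPP definition with placeholder values, not one of the shipped examples. Several choices here are assumptions: ordering steps with `schema:step`/`schema:position`, and placing parameters under `ada:methodParameters` (borrowed from empaTAPP). The accompanying check only echoes the component list above and does not replace schema validation.

```python
STEP_NAMES = ["sample preparation", "calibration", "data acquisition",
              "data processing", "quality control"]

# Hypothetical minimal TAPP definition; every value is a placeholder.
tapp = {
    "@type": ["cdi:Activity", "schema:Action", "ada:TAPPDefinition", "bios:LabProtocol"],
    "schema:name": "Example EPMA WDS protocol",
    "schema:version": "1.0",
    "schema:actionProcess": {
        "@type": "schema:HowTo",
        # Assumed: ordered steps under schema:step with schema:position.
        "schema:step": [
            {"@type": ["cdi:Activity", "schema:Action"], "schema:name": n, "schema:position": i}
            for i, n in enumerate(STEP_NAMES, start=1)
        ],
    },
    # Assumed placement, borrowed from empaTAPP's ada:methodParameters.
    "ada:methodParameters": [{
        "@type": "schema:PropertyValueSpecification",
        "schema:name": "acceleratingVoltage",
        "schema:readonlyValue": True,
        "schema:valueRequired": True,
        "schema:defaultValue": 15,
        "ada:fieldScope": "method",
    }],
}


def tapp_problems(doc: dict) -> list:
    """Light structural checks echoing the component list; not the real schema."""
    problems = []
    required = {"cdi:Activity", "schema:Action", "ada:TAPPDefinition", "bios:LabProtocol"}
    missing = required - set(doc.get("@type", []))
    if missing:
        problems.append(f"missing types: {sorted(missing)}")
    steps = doc.get("schema:actionProcess", {}).get("schema:step", [])
    if not steps:
        problems.append("workflow has no steps")
    elif [s.get("schema:position") for s in steps] != list(range(1, len(steps) + 1)):
        problems.append("workflow steps are not ordered 1..n")
    for p in doc.get("ada:methodParameters", []):
        if p.get("ada:fieldScope") not in {"method", "session", "element"}:
            problems.append(f"{p.get('schema:name')}: bad ada:fieldScope")
    return problems
```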
Example files use the sibling `example<bbName>-<variant>.json` pattern (validated by `tools/validate_examples.py`):
- `exampletappDefinition-concord-glass-v1-0-6.json`: EPMA WDS tephra glass (Concord University)
- `exampletappDefinition-nmnh-spinel-oxybar-v1.json`: EPMA WDS spinel oxybarometry (Smithsonian NMNH)
- `exampletappDefinition-uoc-laicpms-glass-v1.json`: LA-ICP-MS volcanic glass trace elements (University of Cologne)
The empaTAPP profile has 10 publication-derived examples (`exampleempaTAPP-P1.json` … `exampleempaTAPP-P10.json`). It also has `exampleempaTAPP-all.json`, a hand-authored comprehensive synthetic instance that exercises every property allowed by the resolved schema. Use the latter as a structural reference when authoring new TAPP profiles or onboarding new authors.
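The naming pattern is regular enough to check mechanically. The sketch below assumes `bbName` never contains a hyphen, which holds for every BB name in this README, while the variant may contain hyphens. It also assumes examples live in a directory named after their building block. `parse_example_name` and `misnamed_examples` are illustrative helpers, not functions in `validate_examples.py`.

```python
import re
from pathlib import Path

# example<bbName>-<variant>.json; bbName has no hyphen, the variant may.
EXAMPLE_NAME = re.compile(
    r"^example(?P<bb>[A-Za-z0-9]+)-(?P<variant>[A-Za-z0-9][A-Za-z0-9-]*)\.json$")


def parse_example_name(filename: str):
    """Return (bbName, variant), or None if the name does not follow the pattern."""
    m = EXAMPLE_NAME.match(filename)
    return (m["bb"], m["variant"]) if m else None


def misnamed_examples(bb_dir: Path) -> list:
    """Example files in a BB directory whose bbName is not the directory's name."""
    bad = []
    for f in sorted(bb_dir.glob("example*.json")):
        parsed = parse_example_name(f.name)
        if parsed is None or parsed[0] != bb_dir.name:
            bad.append(f.name)
    return bad
```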
The TAPP definition schema draws on these vocabularies:
- Bioschemas: `bios:LabProtocol`, `bios:LabProcess`, `bios:computationalTool`, `bios:reagent`
- DDI-CDI: `cdi:Activity` for workflow steps
- W3C DQV: `dqv:hasQualityMeasurement` for quality metrics
- schema.org: `PropertyValueSpecification` for parameter definitions, `Action`/`HowTo`/`HowToStep` for workflow