
HonestRoles

HonestRoles is a deterministic, config-driven pipeline runtime for job data, built on Polars and wired together through explicit plugin manifests.
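The core ideas, deterministic stages, config-driven ordering, and an explicit plugin registry, can be sketched generically. This is an illustrative toy model of the pattern, not the HonestRoles internals:

```python
from typing import Callable

# Illustrative only: a config-driven pipeline where every stage is an
# explicitly registered function, so a given config always produces the
# same run order and the same output.
Row = dict
Stage = Callable[[list], list]

REGISTRY: dict[str, Stage] = {}

def plugin(name: str):
    """Register a stage under an explicit name (a toy plugin manifest)."""
    def wrap(fn: Stage) -> Stage:
        REGISTRY[name] = fn
        return fn
    return wrap

@plugin("drop_empty_titles")
def drop_empty_titles(rows):
    return [r for r in rows if r.get("title")]

@plugin("score_remote")
def score_remote(rows):
    return [{**r, "score": 1.0 if r.get("remote") else 0.5} for r in rows]

def run(pipeline: list, rows: list) -> list:
    # Stages run in exactly the order the config names them, never implicitly.
    for name in pipeline:
        rows = REGISTRY[name](rows)
    return rows

config = ["drop_empty_titles", "score_remote"]  # stands in for pipeline.toml
jobs = [{"title": "Data Engineer", "remote": True}, {"title": ""}]
print(run(config, jobs))
```

Because the registry is explicit and the stage order comes from config, swapping or reordering stages is a config change, not a code change.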

Start With the App

Use the HonestRoles app first: honestroles.com.

Choose Your Path

  • App users: start in the browser at honestroles.com
  • Developers and integrators: use the CLI/SDK sections below

Install (Developer)

$ python -m venv .venv
$ . .venv/bin/activate
$ python -m pip install --upgrade pip
$ pip install honestroles

5-Minute First Run (Developer)

From the repository root:

$ python examples/create_sample_dataset.py
$ honestroles run --pipeline-config examples/sample_pipeline.toml --plugins examples/sample_plugins.toml
$ ls -lh examples/jobs_scored.parquet

Expected CLI diagnostics include stage_rows, plugin_counts, and final_rows.
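The diagnostic names above suggest a per-run summary of row counts by stage and by plugin. As a hypothetical illustration (the payload shape and all values here are invented; the real format is whatever the CLI emits), one sanity check you might run over them:

```python
# Hypothetical diagnostics payload; field names come from the CLI docs above,
# the structure and values are invented for illustration.
diagnostics = {
    "stage_rows": {"ingest": 1200, "filter": 950, "score": 950},
    "plugin_counts": {"drop_empty_titles": 250},
    "final_rows": 950,
}

# Sanity check: the last stage's row count should match final_rows.
last_stage_rows = list(diagnostics["stage_rows"].values())[-1]
assert last_stage_rows == diagnostics["final_rows"]
print("rows retained:", diagnostics["final_rows"])
```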

CLI

# Ingest
$ honestroles ingest sync --source greenhouse --source-ref stripe --quality-policy ingest_quality.toml --strict-quality --merge-policy updated_hash --retain-snapshots 30 --prune-inactive-days 90 --format table
$ honestroles ingest validate --source greenhouse --source-ref stripe --quality-policy ingest_quality.toml --strict-quality --format table
$ honestroles ingest sync-all --manifest ingest.toml --format table

# Recommend
$ honestroles recommend build-index --input-parquet dist/ingest/greenhouse/stripe/jobs.parquet --policy recommendation.toml --format table
$ honestroles recommend match --index-dir dist/recommend/index/<index_id> --candidate-json examples/candidate.json --top-k 25 --include-excluded --format table
$ honestroles recommend evaluate --index-dir dist/recommend/index/<index_id> --golden-set examples/recommend_golden_set.json --thresholds recommend_eval.toml --format table
$ honestroles recommend feedback add --profile-id jane_doe --job-id 12345 --event interviewed --format table

# Publish (NeonDB)
$ honestroles publish neondb migrate --database-url-env NEON_DATABASE_URL --schema honestroles_api --format table
$ honestroles publish neondb sync --database-url-env NEON_DATABASE_URL --schema honestroles_api --jobs-parquet dist/ingest/greenhouse/stripe/jobs.parquet --index-dir dist/recommend/index/<index_id> --sync-report dist/ingest/greenhouse/stripe/sync_report.json --require-quality-pass --format table
$ honestroles publish neondb verify --database-url-env NEON_DATABASE_URL --schema honestroles_api --format table

# Pipeline and tooling
$ honestroles init --input-parquet data/jobs.parquet --pipeline-config pipeline.toml --plugins-manifest plugins.toml
$ honestroles doctor --pipeline-config pipeline.toml --plugins plugins.toml --format table
$ honestroles reliability check --pipeline-config pipeline.toml --plugins plugins.toml --strict --format table
$ honestroles run --pipeline-config pipeline.toml --plugins plugins.toml
$ honestroles plugins validate --manifest plugins.toml
$ honestroles config validate --pipeline pipeline.toml
$ honestroles report-quality --pipeline-config pipeline.toml
$ honestroles runs list --limit 10 --command ingest.sync --format table
$ honestroles scaffold-plugin --name my-plugin --output-dir .

Python API

from honestroles import (
    HonestRolesRuntime,
    build_retrieval_index,
    evaluate_relevance,
    migrate_neondb,
    match_jobs,
    publish_neondb_sync,
    record_feedback_event,
    sync_source,
    sync_sources_from_manifest,
    summarize_feedback,
    validate_ingestion_source,
    verify_neondb_contract,
)

# Ingest one source and write a local parquet snapshot.
ingest = sync_source(
    source="greenhouse",
    source_ref="stripe",
    quality_policy_file="ingest_quality.toml",
    strict_quality=False,
    merge_policy="updated_hash",
    retain_snapshots=30,
    prune_inactive_days=90,
)
print(ingest.rows_written, ingest.output_parquet)

# Validate the same source against the quality policy.
validation = validate_ingestion_source(
    source="greenhouse",
    source_ref="stripe",
    quality_policy_file="ingest_quality.toml",
    strict_quality=True,
)
print(validation.report.status, validation.rows_evaluated)

batch = sync_sources_from_manifest(manifest_path="ingest.toml")
print(batch.status, batch.total_sources, batch.fail_count)

# Build a retrieval index and match a candidate profile against it.
index = build_retrieval_index(
    input_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    policy_file="recommendation.toml",
)
matches = match_jobs(
    index_dir=index.index_dir,
    candidate_json="examples/candidate.json",
    top_k=25,
    include_excluded=True,
)
print(matches.status, len(matches.results))

# Score retrieval quality against a golden set.
evaluation = evaluate_relevance(
    index_dir=index.index_dir,
    golden_set="examples/recommend_golden_set.json",
    thresholds_file="recommend_eval.toml",
)
print(evaluation.status, evaluation.metrics)

record_feedback_event(profile_id="jane_doe", job_id="12345", event="interviewed")
print(summarize_feedback(profile_id="jane_doe").weights)

# Apply migrations, publish the latest artifacts, and verify the contract.
print(migrate_neondb(database_url_env="NEON_DATABASE_URL").status)
publish_result = publish_neondb_sync(
    database_url_env="NEON_DATABASE_URL",
    jobs_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    index_dir=index.index_dir,
    sync_report="dist/ingest/greenhouse/stripe/sync_report.json",
)
print(publish_result.batch_id, verify_neondb_contract(database_url_env="NEON_DATABASE_URL").status)

# Run the full pipeline from config files.
runtime = HonestRolesRuntime.from_configs(
    pipeline_config_path="pipeline.toml",
    plugin_manifest_path="plugins.toml",
)
result = runtime.run()

print(result.diagnostics)
print(result.dataset.to_polars().head())
print(result.application_plan[:3])

Documentation

Development

$ pip install -e ".[dev,docs]"
$ pytest -q
$ pytest tests/docs -q
$ bash scripts/check_docs_refs.sh
# Optional live connector smoke (requires refs):
# HONESTROLES_SMOKE_GREENHOUSE_REF, HONESTROLES_SMOKE_LEVER_REF,
# HONESTROLES_SMOKE_ASHBY_REF, HONESTROLES_SMOKE_WORKABLE_REF
$ bash scripts/run_ingest_smoke.sh
# Optional Neon DB smoke (requires NEON_DATABASE_URL):
$ PYTHON_BIN=.venv/bin/python DATABASE_URL_ENV=NEON_DATABASE_URL SCHEMA=honestroles_api bash scripts/run_neondb_smoke.sh

For local profiling, keep large parquet inputs under data/ and write generated artifacts under dist/ (both directories are git-ignored).
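A minimal local layout matching that convention (directory names come from the paragraph above; the rest is just illustrative shell):

```shell
# Inputs under data/, generated artifacts under dist/.
mkdir -p data dist
# Both paths are git-ignored, so large parquet files never enter history.
ls -d data dist
```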

Maintainer Notes

  • PyPI publishing is manual and token-based via bash scripts/publish_pypi.sh.
  • The script reads PYPI_API_KEY (or PYPI_API_TOKEN) from env/.env.
  • The GitHub Release workflow is manual (workflow_dispatch) only.
  • Before publishing, run the deterministic gate:
$ PYTHON_BIN=.venv/bin/python bash scripts/run_coverage.sh
  • Full maintainer runbook: docs/for-maintainers/release-and-pypi.md.

License

MIT

About

Clean, filter, label, and rate job description data
