A Databricks-native cloud inventory platform built on StackQL. Engineers define SQL queries against cloud provider APIs, schedule them as Databricks Jobs, and surface the results as governed data in Unity Catalog - powering dashboards and Genie spaces without any external infrastructure.
Cloud teams typically have no single view of their infrastructure across providers. They rely on fragmented scripts, manual exports, or expensive third-party CSPM tools to answer basic questions like "what EC2 instances are running in us-east-1?" or "which GCP service accounts have owner-level bindings?".
StackQL Cloud Inventory solves this by treating cloud provider APIs as SQL tables. Engineers write standard SQL queries using StackQL syntax, test them interactively in a built-in IDE, and schedule them to run on a recurring basis. Results land in a managed Postgres instance (Lakebase) inside the Databricks workspace, federated into Unity Catalog, and immediately available for dashboards, Genie spaces, and ad-hoc DBSQL queries.
The entire platform runs inside a single Databricks workspace. There is no external query engine, no separate database to manage, and no additional infrastructure to operate.
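For example, the "what EC2 instances are running in us-east-1?" question above becomes a single StackQL query. This is an illustrative sketch; exact column names depend on the provider registry version in use:

```sql
-- Running EC2 instances in one region (column names illustrative)
SELECT instanceId, instanceType
FROM aws.ec2.instances
WHERE region = 'us-east-1';
```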
A typical workflow looks like this:
- An engineer opens the app and navigates to Provider Config to map cloud provider credentials (AWS, GCP, Azure, Databricks, etc.) to Databricks secret scope keys. Credentials never leave the workspace.
- The engineer opens the SQL IDE and writes a StackQL query - for example, listing all running EC2 instances across regions. They select the relevant provider, hit Run, and see results inline as a dataframe.
- If the query is useful, they save it to the query library and navigate to Schedules to configure it as a recurring Databricks Job. They name the output table, set a cron expression, and the app provisions the Job automatically.
- The Job runs on schedule, writes results to a Lakebase table in the `stackql_inventory` schema, and refreshes any materialised views over that table.
- The inventory data is available in Unity Catalog via Lakehouse Federation. Data consumers query it via DBSQL, build dashboards, or use a Genie space to ask natural language questions about their cloud infrastructure.
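The scheduling step above could translate into a Jobs API payload roughly like this. This is a sketch, not the app's actual code: the job name, task key, notebook path, and parameter names are assumptions, and the field names follow the Databricks Jobs API 2.1, which uses Quartz cron syntax (seconds field first).

```python
def build_job_payload(query_name: str, output_table: str, cron: str) -> dict:
    """Sketch of a Jobs API 2.1 create payload for a scheduled inventory query."""
    return {
        "name": f"stackql-inventory-{query_name}",  # hypothetical naming convention
        # Databricks Jobs schedules use Quartz cron expressions
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "run_inventory_query",
                "notebook_task": {
                    # hypothetical runner notebook path
                    "notebook_path": "/Workspace/Shared/stackql/run_query",
                    "base_parameters": {
                        "query_name": query_name,
                        "output_table": f"stackql_inventory.{output_table}",
                    },
                },
            }
        ],
    }

# Example: run the saved EC2 query daily at 06:00 UTC
payload = build_job_payload("ec2_running_instances", "aws_ec2_instances", "0 0 6 * * ?")
```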
At any point, the engineer can use the built-in AI assistant (powered by Claude) to help write StackQL queries from a plain-English description, explain an existing query, or interpret what the results mean in operational terms.
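The "plain English to StackQL" flow might assemble its request along these lines. Everything here is hypothetical - the actual system prompt and message layout in the app may differ - and the Claude API call itself is shown only as a comment:

```python
# Hypothetical system prompt pinning the assistant's output to StackQL syntax
SYSTEM_PROMPT = (
    "You are a StackQL expert. Translate the user's request into a single "
    "StackQL SELECT statement against the named provider. Return only SQL."
)

def build_assistant_messages(provider: str, request: str) -> list[dict]:
    """Build the Anthropic-style messages list for a query-generation request."""
    return [
        {
            "role": "user",
            "content": f"Provider: {provider}\nRequest: {request}",
        }
    ]

messages = build_assistant_messages("aws", "list running EC2 instances in us-east-1")
# client.messages.create(model=..., system=SYSTEM_PROMPT, messages=messages)
```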
```mermaid
flowchart LR
A([Engineer]) -->|writes query| B[SQL IDE]
B -->|ad-hoc run| C[pystackql]
C -->|REST/GraphQL| D[(Cloud Provider APIs\nAWS · GCP · Azure · Databricks)]
B -->|saves query + schedule| E[(Lakebase\nstackql_app schema)]
E -->|provisions| F[Databricks Job]
F -->|runs on schedule| C
C -->|writes results| G[(Lakebase\nstackql_inventory schema)]
G -->|federated read| H[Unity Catalog\nForeign Catalog]
H --> I[DBSQL Dashboards]
H --> J[Genie Space]
A([Engineer]) -->|natural language| K[AI Assistant]
K -->|generates StackQL| B
```
| Component | Technology | Purpose |
|---|---|---|
| Web app | Streamlit (Databricks Apps) | IDE, schedules, inventory browser, provider config |
| SQL IDE | Monaco editor via streamlit-code-editor | Write and test StackQL queries interactively |
| AI assistant | Anthropic Claude API | Query writing help and results interpretation |
| StackQL execution | pystackql | Executes StackQL queries against cloud provider APIs |
| App metadata store | Lakebase (managed Postgres) | Queries, schedules, provider config |
| Inventory store | Lakebase (managed Postgres) | Cloud inventory tables and materialised views |
| Data governance | Unity Catalog (Lakehouse Federation) | Federated catalog over inventory schema |
| Job scheduling | Databricks Jobs | Runs scheduled inventory queries |
| Secret management | Databricks Secrets | Cloud provider credentials and API keys |
| App infra | Databricks Asset Bundles | Lakebase, federation, app deployment, Job template |
| Demo infra | stackql-deploy | Cloud provider resources for demo/test environments |
Any provider available in the StackQL Provider Registry is supported. Common targets:
- AWS (EC2, IAM, S3, RDS, Lambda, and 60+ other services)
- Google Cloud (Compute, IAM, Storage, BigQuery, and others)
- Azure (Compute, Networking, IAM, and others)
- Databricks (workspaces, clusters, jobs, SQL warehouses)
- GitHub, Cloudflare, Okta, and others
Provider auth is configured per-provider via Databricks secret scope references. The app resolves credentials at query execution time and does not persist them in the application layer.
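Conceptually, the resolution step looks like the sketch below: provider config stores only secret scope references, and values are looked up at execution time. The function and key names are hypothetical, and the dict-based `get_secret` stand-in takes the place of the Databricks Secrets API:

```python
# Hypothetical provider config: env var name -> secret scope reference.
# Only references are stored; no credential values live in the app layer.
PROVIDER_CONFIG = {
    "aws": {
        "AWS_ACCESS_KEY_ID": "stackql-inventory/aws-access-key-id",
        "AWS_SECRET_ACCESS_KEY": "stackql-inventory/aws-secret-access-key",
    },
}

def resolve_provider_env(provider: str, get_secret) -> dict:
    """Resolve a provider's env vars from secret scope references at query time."""
    refs = PROVIDER_CONFIG[provider]
    return {env_var: get_secret(ref) for env_var, ref in refs.items()}

# Stand-in secret lookup for illustration (fake values)
fake_secrets = {
    "stackql-inventory/aws-access-key-id": "AKIA-EXAMPLE",
    "stackql-inventory/aws-secret-access-key": "secret",
}
env = resolve_provider_env("aws", fake_secrets.get)
```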
- Databricks workspace with Databricks Apps enabled
- Lakebase (managed Postgres) provisioned in the workspace
- Unity Catalog enabled with Lakehouse Federation
- Databricks CLI v0.200+ for deployment
- A Databricks secret scope (`stackql-inventory`) containing:
  - Lakebase connection credentials
  - Cloud provider auth credentials (one or more providers)
  - Anthropic API key (for AI assistant)
- `streamlit` - app framework
- `streamlit-code-editor` - Monaco SQL editor component
- `pystackql` - StackQL Python wrapper
- `anthropic` - Claude API client
- `databricks-sdk` - workspace client (Jobs, Secrets)
- `sqlalchemy` + `psycopg2-binary` - Lakebase connection
- `pandas` - query result handling
- `croniter` - cron expression parsing and preview
- `python-dotenv` - local development env loading
- Databricks CLI with Asset Bundles support
- `stackql-deploy` (for demo/test cloud provider resource provisioning)
- Docker (local Postgres standing in for Lakebase)
- Python 3.11+
```
stackql-cloud-inventory/
|- app/                    <- Databricks App source
|  |- src/
|  |  |- pages/            <- Streamlit pages
|  |  |- components/       <- Reusable UI components
|  |  |- db/               <- Lakebase models and service
|  |  |- services/         <- StackQL, Jobs, AI service wrappers
|  |  |- main.py
|  |- requirements.txt
|  |- app.yml              <- Databricks App manifest
|- infra/
|  |- bundles/             <- Databricks Asset Bundles
|  |  |- databricks.yml
|  |  |- resources/
|  |- stackql-deploy/      <- Demo/test cloud provider resources
|     |- stackql_manifest.yml
|     |- resources/
|- scripts/
|  |- init_lakebase.sql    <- Lakebase schema DDL
|- .vscode/
|  |- launch.json          <- Local run and debug configs
|- .gitignore
|- README.md
|- ARCHITECTURE.md
|- CLAUDE.md
```
The app can be run locally against a real Databricks workspace (for secrets and Jobs) with a local Postgres instance standing in for Lakebase. See ARCHITECTURE.md for the local dev capability matrix and CLAUDE.md for setup instructions.
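In local mode, the only moving part that changes is the Postgres target. A minimal sketch, assuming hypothetical `LAKEBASE_*` environment variable names, of how the app could build its SQLAlchemy URL and fall back to the Docker Postgres:

```python
def database_url(env: dict) -> str:
    """Build a SQLAlchemy Postgres URL; defaults target a local Docker Postgres."""
    host = env.get("LAKEBASE_HOST", "localhost")
    port = env.get("LAKEBASE_PORT", "5432")
    db = env.get("LAKEBASE_DB", "stackql")
    user = env.get("LAKEBASE_USER", "postgres")
    password = env.get("LAKEBASE_PASSWORD", "postgres")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"

# Local dev: no LAKEBASE_* vars set, so this resolves to Docker Postgres
url = database_url({})
```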
```shell
databricks apps run-local --prepare-environment --debug
```

or simply:

```shell
cd app && streamlit run src/main.py
```
Infrastructure is deployed in two steps:
1. Databricks-side resources via Asset Bundles:

   ```shell
   cd infra/bundles
   databricks bundle deploy --target dev
   ```

   This provisions Lakebase, Lakehouse Federation, the UC foreign catalog, secret scope bindings, the Databricks App, and the template Job definition.
2. App deployment:

   ```shell
   databricks apps deploy stackql-cloud-inventory \
     --source-code-path /Workspace/Users/<you>/...
   ```
3. (Optional) Demo cloud provider resources via stackql-deploy:

   ```shell
   cd infra/stackql-deploy
   stackql-deploy build stackql_manifest.yml --env dev
   ```
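The bundle deployed in step 1 follows the standard `databricks.yml` layout. A skeleton along these lines - resource names, targets, and paths here are illustrative, not the project's actual definitions:

```yaml
# infra/bundles/databricks.yml (illustrative skeleton)
bundle:
  name: stackql-cloud-inventory

targets:
  dev:
    default: true

resources:
  jobs:
    inventory_query_template:
      name: stackql-inventory-template
      tasks:
        - task_key: run_inventory_query
          notebook_task:
            notebook_path: ./notebooks/run_query.py
```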
MIT