34 commits
06fe5fe
initial benchmarking and inference layer
RishikeshRanade Apr 2, 2026
ba7ed71
adding visualization layer and improving readme
RishikeshRanade Apr 2, 2026
35adf9d
adding line plot visualization
RishikeshRanade Apr 2, 2026
480655c
fixing issues with visualization and merging workflows
RishikeshRanade Apr 2, 2026
d6915d0
refactoring nim evaluation
RishikeshRanade Apr 3, 2026
37e48ed
adding headers
RishikeshRanade Apr 3, 2026
08166df
adding caching capability and updating docstrings
RishikeshRanade Apr 3, 2026
f3ba6d5
refactoring code
RishikeshRanade Apr 6, 2026
336f3c1
adding distributed calculation and cleaning up
RishikeshRanade Apr 6, 2026
c84c911
Merge pull request #2 from RishikeshRanade/visualization-layer
RishikeshRanade Apr 6, 2026
15296e3
renaming example and adding matrix evaluation configs
RishikeshRanade Apr 6, 2026
fca40cd
Revise README for model evaluation and benchmarking
ram-cherukuri Apr 9, 2026
e8769ad
Revise README for OOB Benchmarking section
ram-cherukuri Apr 10, 2026
4d4703b
domain-scoped metrics, aggregate volume visual, and naming cleanup
ktangsali Apr 10, 2026
362ee52
Revise README for clarity and customization options
ram-cherukuri Apr 10, 2026
cb8f8ac
Update README for benchmarking workflow sections
ram-cherukuri Apr 10, 2026
d0c37f4
update api
ktangsali Apr 10, 2026
0124bee
remove xmgn and fgnet volume, because they don't exist
ktangsali Apr 10, 2026
eb0be99
add notebooks after validation
ktangsali Apr 10, 2026
be2d3f6
add last notebook
ktangsali Apr 10, 2026
a4d49a7
use pnemo functionals for knn
ktangsali Apr 14, 2026
d9948fe
add deprecation notice
ktangsali Apr 14, 2026
f172a8a
add files for DrivAerML
ktangsali Apr 14, 2026
b8a3e6e
cleaning up readme, adding ci tests and contributing details
RishikeshRanade Apr 14, 2026
2dd3806
add tutorial notebook on adding a dataset adaptor
ktangsali Apr 17, 2026
101f476
add notebook showing adding of a new model
ktangsali Apr 21, 2026
91e9c5a
add tutorials for adding a new metric
ktangsali Apr 22, 2026
f197230
fixing point models issue
RishikeshRanade Apr 24, 2026
4073653
enable automated checkpoint download from HF and NGC
RishikeshRanade Apr 24, 2026
69bc4e8
refactoring model/inference backbones and fixing NGC/HF paths
RishikeshRanade Apr 24, 2026
e190acf
minor update to readme
RishikeshRanade Apr 24, 2026
6f74850
add API docs
ktangsali Apr 24, 2026
4fe809b
add files after testing hugging face checkpoints
ktangsali Apr 29, 2026
9ebe7e9
add files after testing surface and volume checkpoints
Apr 30, 2026
175 changes: 175 additions & 0 deletions .cursor/skills/create-custom-metric/SKILL.md
@@ -0,0 +1,175 @@
---
name: create-custom-metric
description: >-
Create a custom metric for the PhysicsNeMo CFD benchmarking workflow.
Use when the user wants to add a new evaluation metric, implement a custom
error measure, compute force coefficients, or extend the benchmark with
domain-specific quantities.
---

# Create a Custom Metric

Guide the user through adding a new metric to the benchmarking workflow.

## Reference files to read first

- `physicsnemo/cfd/postprocessing_tools/metric_registry.py` — `register_metric`, `get_metric`, `MetricFn`
- `physicsnemo/cfd/evaluation/metrics/builtin/forces.py` — `drag_error`, `lift_error` (dict-returning, mesh-based)
- `physicsnemo/cfd/evaluation/metrics/builtin/l2.py` — L2 metrics (scalar-returning, numpy fallback)
- `physicsnemo/cfd/evaluation/metrics/mesh_bridge.py` — `build_comparison_mesh`
- `physicsnemo/cfd/postprocessing_tools/metrics/aero_forces.py` — `compute_force_coefficients` (normals, areas, integration)
- `workflows/benchmarking_workflow/notebooks/adding_a_new_metric.ipynb` — end-to-end tutorial

## Metric function signature

Metrics are plain callables, no base class:

```python
MetricFn = Callable[..., float | dict[str, float]]
```

**Modern signature** (accepts extended engine kwargs):

```python
def my_metric(
ground_truth: dict, # canonical GT: {"pressure": ..., "shear_stress": ...}
predictions: dict, # canonical predictions from decode_outputs
*,
case: Any = None, # CanonicalCase from the dataset adapter
comparison_mesh: Any = None, # PyVista mesh with GT + pred arrays attached
metric_dtype: str | None = None, # "cell" or "point"
output: Any = None, # OutputConfig with field name mappings
**_: object, # absorb unknown kwargs
) -> float | dict[str, float]:
...
```

**Return types**:
- `float` — single scalar value (e.g., L2 error)
- `dict[str, float]` — multiple values; keys are auto-flattened by the engine: `{"error": 0.1, "pred": 42.0}` returned by a metric registered as `side_force` becomes `side_force_error` and `side_force_pred` in the results
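
For illustration, a minimal sketch of such a dict-returning metric (the computation is a placeholder; only the return shape matters here):

```python
import numpy as np

def side_force(ground_truth, predictions, **_):
    # Placeholder computation; a real metric would integrate forces properly.
    gt = float(np.sum(np.asarray(ground_truth.get("pressure", []), dtype=np.float64)))
    pred = float(np.sum(np.asarray(predictions.get("pressure", []), dtype=np.float64)))
    # Registered as "side_force", these keys are flattened to
    # side_force_error, side_force_true, and side_force_pred in the results.
    return {"error": abs(pred - gt), "true": gt, "pred": pred}
```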

## Step 1: Write the metric function

### Simple array-based metric (no mesh needed)

```python
import numpy as np

def mae_pressure(ground_truth, predictions, **_):
gt = np.asarray(ground_truth.get("pressure", []), dtype=np.float64).ravel()
pred = np.asarray(predictions.get("pressure", []), dtype=np.float64).ravel()
if gt.size == 0 or pred.size == 0 or gt.shape != pred.shape:
return float("nan")
return float(np.mean(np.abs(gt - pred)))
```

### Mesh-based metric (uses normals, areas, geometry)

Use the `_resolve_mesh` pattern to get the comparison mesh, then access its arrays:

```python
from physicsnemo.cfd.evaluation.metrics.mesh_bridge import build_comparison_mesh

def _resolve_mesh(predictions, *, case, comparison_mesh, metric_dtype, output):
if comparison_mesh is not None and metric_dtype is not None:
return comparison_mesh, metric_dtype
if case is not None and output is not None:
return build_comparison_mesh(case, predictions, output)
return None, None

def my_force_metric(ground_truth, predictions, *, case=None, comparison_mesh=None,
metric_dtype=None, output=None, **_):
mesh, dtype = _resolve_mesh(predictions, case=case, comparison_mesh=comparison_mesh,
metric_dtype=metric_dtype, output=output)
if mesh is None or output is None:
return float("nan")

# Access fields by VTK array name from output config
p = mesh.cell_data[output.mesh_field_names["pressure"]]
wss = mesh.cell_data[output.mesh_field_names["shear_stress"]]

# Access mesh geometry
mesh = mesh.compute_normals().compute_cell_sizes()
normals = mesh["Normals"] # (N, 3)
areas = mesh["Area"] # (N,)

# Compute your metric...
return float(result)
```

## Step 2: Register the metric

```python
from physicsnemo.cfd.postprocessing_tools.metric_registry import register_metric

register_metric("my_metric", my_metric_fn, domain="surface") # or "volume" or None
```

- `domain="surface"` — only used when model's inference domain is surface
- `domain="volume"` — only used for volume inference
- `domain=None` — domain-agnostic fallback
- The same name can be registered for both domains with different functions (like `l2_pressure`); see the sketch below
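
A sketch of the dual registration (assuming `my_surface_fn` and `my_volume_fn` are two implementations you have already defined):

```python
from physicsnemo.cfd.postprocessing_tools.metric_registry import register_metric

# One public name, two implementations; the engine resolves which one to use
# from the model's inference_domain ("surface" vs "volume").
register_metric("my_metric", my_surface_fn, domain="surface")
register_metric("my_metric", my_volume_fn, domain="volume")
```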

## Step 3: Use in benchmark config

Add the metric name to the `metrics` list:

```python
config = Config.from_dict({
...
"metrics": ["l2_pressure", "drag", "lift", "my_metric"],
...
})
```

Or in YAML:
```yaml
metrics:
- l2_pressure
- my_metric
```

Per-metric kwargs can be passed as a dict:
```yaml
metrics:
- name: my_metric
some_param: 42
```
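
Assuming the engine forwards those extra keys as keyword arguments to the metric (the legacy fallback under Gotchas shows `fn(gt, predictions, **mkwargs)`), a metric can declare them directly; `some_param` here is just the illustrative key from the YAML above:

```python
import numpy as np

def my_metric(ground_truth, predictions, *, some_param: float = 1.0, **_):
    # some_param arrives from the per-metric kwargs in the config.
    gt = np.asarray(ground_truth.get("pressure", []), dtype=np.float64).ravel()
    pred = np.asarray(predictions.get("pressure", []), dtype=np.float64).ravel()
    if gt.size == 0 or gt.shape != pred.shape:
        return float("nan")
    return float(some_param * np.mean(np.abs(gt - pred)))
```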

## Step 4: Make permanent (optional)

Add to `physicsnemo/cfd/evaluation/metrics/builtin/` and register from `builtin/__init__.py`:

```python
def register_my_metrics():
register_metric("my_metric", my_fn, domain="surface")

# In __init__.py:
def register_all_builtin_metrics():
register_l2_metrics()
register_force_metrics()
register_physics_metrics()
register_my_metrics() # add this
```

## Existing built-in metrics

| Name | Domain(s) | Returns |
|------|-----------|---------|
| `l2_pressure` | surface, volume | `float` |
| `l2_shear_stress` | surface | `dict` |
| `l2_pressure_area_weighted` | surface | `float` |
| `l2_velocity` | volume | `dict` |
| `l2_turbulent_viscosity` | volume | `float` |
| `drag` | surface | `dict` (error, true, pred) |
| `lift` | surface | `dict` (error, true, pred) |
| `continuity_residual_l2` | volume | `float` |
| `momentum_residual_l2` | volume | `float` |

## Gotchas

- **Dict flattening**: if a metric returns `{"error": 0.1, "true": 5.0}`, the engine stores them as `metricname_error` and `metricname_true`. An empty string key `""` maps to just `metricname`.
- **NaN handling**: return `float("nan")` for failures; the engine accumulates NaNs gracefully.
- **Legacy fallback**: the engine tries extended kwargs first; on `TypeError` it falls back to `fn(gt, predictions, **mkwargs)` only. Modern metrics should accept `**_` to absorb unknowns.
- **Results JSON format**: `benchmark_results.json` is a plain `list[dict]`, not `{"results": [...]}` (see the snippet after this list).
- **OutputConfig field names**: surface uses `output.mesh_field_names` / `output.ground_truth_mesh_field_names`; volume uses `output.volume_mesh_field_names` / `output.ground_truth_volume_mesh_field_names`.
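
A minimal sketch for reading the results file (adjust the path to your configured `output_dir`; per-case key names depend on which metrics were enabled):

```python
import json
from pathlib import Path

# benchmark_results.json is a plain list of per-case dicts,
# not wrapped in {"results": [...]}.
results = json.loads(Path("results/benchmark_results.json").read_text())
assert isinstance(results, list)
for row in results:
    print(row)
```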
170 changes: 170 additions & 0 deletions .cursor/skills/create-dataset-adapter/SKILL.md
@@ -0,0 +1,170 @@
---
name: create-dataset-adapter
description: >-
Create a new dataset adapter for the PhysicsNeMo CFD benchmarking workflow.
Use when the user wants to add a new CFD dataset, write a DatasetAdapter,
integrate a new mesh format, or benchmark models on custom data.
---

# Create a Dataset Adapter

Guide the user through adding a new CFD dataset to the benchmarking workflow by writing a `DatasetAdapter` subclass.

## Reference files to read first

Before starting, read these files for context:

- `physicsnemo/cfd/evaluation/datasets/adapter_registry.py` — base class and registry
- `physicsnemo/cfd/evaluation/datasets/schema.py` — `CanonicalCase` and `build_predictions_dict`
- `physicsnemo/cfd/evaluation/datasets/adapters/drivaerml.py` — reference adapter implementation
- `workflows/benchmarking_workflow/notebooks/adding_a_new_dataset.ipynb` — end-to-end tutorial

## Step 1: Explore the new dataset

Ask the user for the dataset path, then inspect one file:

```python
import pyvista as pv
mesh = pv.read("<path_to_one_file>")
print(f"Type: {type(mesh).__name__}, Points: {mesh.n_points}, Cells: {mesh.n_cells}")
print(f"Cell arrays: {list(mesh.cell_data.keys())}")
print(f"Point arrays: {list(mesh.point_data.keys())}")
```

Identify these differences from the canonical schema:

| Question | What to look for |
|----------|-----------------|
| File format | `.vtp`, `.vtu`, `.vtk`, or other? Model wrappers expect `.vtp` (surface) or `.vtu` (volume) XML format. |
| Directory layout | Flat directory? Nested `run_<id>/` dirs? How are case IDs derived from filenames? |
| Pressure field name | The canonical key is `pressure`. What is the VTK array name? |
| WSS field name | The canonical key is `shear_stress` (N, 3). Is it a single vector or separate scalar components? |
| Sign conventions | Compare field ranges with DrivAerML. Are normals, WSS, or pressure flipped? |
| Extra arrays | Are there explicit `Normals` or `Area` arrays? DrivAerML has none — remove them if present. |
| STL files | Are separate STL geometry files available? If not, the surface mesh itself is the geometry. |
| Inference domain | Surface (`.vtp`) or volume (`.vtu`)? |
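
Continuing from the `mesh` read in the snippet above, a quick sketch for the field-name and sign-convention questions: print the range of every cell array and compare against the DrivAerML values you expect.

```python
import numpy as np

# Inspect every cell array: shapes reveal vector vs scalar fields,
# ranges reveal flipped signs relative to the training-data conventions.
for name in mesh.cell_data.keys():
    arr = np.asarray(mesh.cell_data[name])
    print(f"{name}: shape={arr.shape}, min={arr.min():.4g}, max={arr.max():.4g}")
```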

## Step 2: Write the adapter class

Subclass `DatasetAdapter` with these methods:

```python
from pathlib import Path
from physicsnemo.cfd.evaluation.datasets.adapter_registry import DatasetAdapter, register_adapter
from physicsnemo.cfd.evaluation.datasets.schema import CanonicalCase

class MyDatasetAdapter(DatasetAdapter):
def __init__(self, root: str, **kwargs):
self._root = Path(root)

@classmethod
def inference_domain_from_kwargs(cls, kwargs=None):
return "surface" # or "volume"

def list_cases(self, split=None):
# Return list of case ID strings
...

def load_case(self, case_id: str) -> CanonicalCase:
# 1. Read the mesh file
# 2. Build ground_truth dict with canonical keys:
# - "pressure": np.float32 array
# - "shear_stress": np.float32 array of shape (N, 3)
# For volume: "pressure", "velocity" (N,3), "turbulent_viscosity"
# 3. Return CanonicalCase(case_id, mesh_path, mesh_type, ground_truth, inference_domain)
...
```
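
Continuing from the imports in the skeleton above, here is a minimal end-to-end sketch for a flat directory of `.vtp` files. It assumes the ground-truth arrays are cell data named `p` and `wallShearStress` (placeholders for whatever Step 1 revealed) and that `CanonicalCase` accepts the keyword arguments listed in the comments above; check `schema.py` for the exact constructor.

```python
import numpy as np
import pyvista as pv

class FlatVTPDatasetAdapter(DatasetAdapter):
    """Minimal sketch: one <case_id>.vtp file per case in a flat directory."""

    def __init__(self, root: str, **kwargs):
        self._root = Path(root)

    @classmethod
    def inference_domain_from_kwargs(cls, kwargs=None):
        return "surface"

    def list_cases(self, split=None):
        return sorted(p.stem for p in self._root.glob("*.vtp"))

    def load_case(self, case_id: str) -> CanonicalCase:
        mesh_path = self._root / f"{case_id}.vtp"
        mesh = pv.read(mesh_path)
        ground_truth = {
            # "p" and "wallShearStress" are placeholders for the
            # actual VTK array names found in Step 1.
            "pressure": np.asarray(mesh.cell_data["p"], dtype=np.float32),
            "shear_stress": np.asarray(mesh.cell_data["wallShearStress"], dtype=np.float32),
        }
        return CanonicalCase(
            case_id=case_id,
            mesh_path=str(mesh_path),
            mesh_type="vtp",  # assumption: identifies the mesh file format
            ground_truth=ground_truth,
            inference_domain="surface",
        )
```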

### Common transformations in `load_case`

**Format conversion** (legacy `.vtk` → `.vtp`):
```python
mesh = pv.read(vtk_path).extract_surface()
mesh.save(vtp_path)
```

**Combining separate WSS scalars into a vector:**
```python
wss = np.stack([mesh.cell_data["WSSx"], mesh.cell_data["WSSy"], mesh.cell_data["WSSz"]], axis=1)
```

**Removing explicit Normals/Area** (DrivAerML convention):
```python
for key in ["Normals", "Area"]:
if key in mesh.cell_data:
del mesh.cell_data[key]
```

**Creating STL from surface mesh** (when no STL is shipped):
```python
mesh.extract_surface().triangulate().save(stl_path)
```

The STL must be named `drivaer_{int(case_id)}.stl` in the same directory as the VTP for the model wrappers to find it.
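
For example (a sketch; `vtp_path`, `case_id`, and `mesh` are whatever your adapter already has in scope at this point):

```python
# Place the STL next to the VTP using the naming the model wrappers expect.
stl_path = vtp_path.parent / f"drivaer_{int(case_id)}.stl"
mesh.extract_surface().triangulate().save(str(stl_path))
```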

### Caching pattern

Do expensive conversions lazily and cache:

```python
def _prepare_case(self, case_id):
prepared_path = self._root / "_prepared" / f"{case_id}.vtp"
    if not prepared_path.exists():
        prepared_path.parent.mkdir(parents=True, exist_ok=True)
        ...  # convert the raw file and save it to prepared_path
return str(prepared_path)
```

## Step 3: Register and test

```python
register_adapter("my_dataset", MyDatasetAdapter)

adapter = MyDatasetAdapter(root="/path/to/data")
cases = adapter.list_cases()
case = adapter.load_case(cases[0])
assert case.ground_truth is not None
assert "pressure" in case.ground_truth
```

## Step 4: Run inference and benchmark

Build a config and run:

```python
from physicsnemo.cfd.evaluation.config import Config
from physicsnemo.cfd.evaluation.benchmarks.engine import run_benchmark

config = Config.from_dict({
"run": {"device": "cuda:0", "output_dir": "results"},
"model": {"name": "<model_name>", "inference_domain": "<surface|volume>", ...},
"dataset": {"name": "my_dataset", "root": "/path/to/data", "case_ids": cases[:2]},
"output": {
"ground_truth_mesh_field_names": {"pressure": "<vtk_gt_name>", "shear_stress": "<vtk_gt_name>"},
"mesh_field_names": {"pressure": "<vtk_pred_name>", "shear_stress": "<vtk_pred_name>"},
},
"metrics": ["l2_pressure", "l2_shear_stress", "drag", "lift"],
"reports": {"enabled": False},
})
results = run_benchmark(config)
```

## Step 5: Make permanent (optional)

Save the adapter to `physicsnemo/cfd/evaluation/datasets/adapters/<name>.py` and register in `adapters/__init__.py`:

```python
from physicsnemo.cfd.evaluation.datasets.adapters.<name> import MyDatasetAdapter
register_adapter("my_dataset", MyDatasetAdapter)
```

## Why conventions must match the training data

The field name mappings, sign conventions, and format conversions in the adapter exist because the model checkpoint was trained on a specific dataset (e.g., DrivAerML) with specific conventions. The adapter bridges the gap between the new dataset's conventions and the training data's conventions — not some abstract standard. If a model is retrained directly on the new dataset, the adapter would not need these transformations. When writing an adapter, always ask: "What conventions did the model's training data use?" and map to those.

## Gotchas

- **DistributedManager**: Model wrappers call `DistributedManager.initialize()`. In notebooks without `torchrun`, set env vars first: `WORLD_SIZE=1`, `RANK=0`, `LOCAL_RANK=0`, `MASTER_ADDR=localhost`, `MASTER_PORT=12355` (see the snippet after this list).
- **STL naming**: DoMINO looks for `drivaer_{tag}.stl`, GeoTransolver looks for `drivaer_{tag}_single_solid.stl` then `*.stl`. Both now fall back to any `*.stl` in the directory.
- **VTP vs VTK**: Model wrappers use VTK XML readers internally. Legacy `.vtk` files must be converted to `.vtp`/`.vtu`.
- **Checkpoint loading**: Some wrappers need `trusted_torch_load_context()` for PyTorch 2.6+ checkpoint compatibility.
- **Domain-scoped metrics**: `l2_pressure` resolves to different implementations for surface vs volume based on `inference_domain`. Use the same metric name for both.
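
A notebook-friendly sketch of the environment setup from the **DistributedManager** gotcha; run this before the first model wrapper import:

```python
import os

# Single-process defaults so DistributedManager.initialize() works without torchrun.
for key, value in {
    "WORLD_SIZE": "1",
    "RANK": "0",
    "LOCAL_RANK": "0",
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "12355",
}.items():
    os.environ.setdefault(key, value)
```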