46 changes: 35 additions & 11 deletions README.md
@@ -1,5 +1,9 @@
# **E**arth **C**omputing **H**yperparameter **O**ptimization (ECHO): A distributed hyperparameter optimization package built with Optuna

<p align="center">
<img src="docs/echo_logo.png" alt="ECHO logo" width="400"/>
</p>

### Install

To install a stable version of ECHO from PyPI, use the following command:
@@ -97,7 +101,7 @@ pbs:
kernel: "ncar_pylib /glade/work/schreck/py37"
bash: ["module load ncarenv/1.3 gnu/8.3.0 openmpi/3.1.4 python/3.7.5 cuda/10.1"]
batch:
l: ["select=1:ncpus=8:ngpus=1:mem=128GB", "walltime=12:00:00"]
l: ["select=1:ncpus=8:ngpus=1:mem=128GB:gpu_type=a100_80gb", "walltime=12:00:00"]
A: "NAML0001"
q: "casper"
N: "echo_trial"
@@ -164,24 +168,41 @@ The save_path field sets the location where all generated data will be saved.
The log field allows you to save the logging details to a file in save_path; they will always be printed to stdout. If this field is removed, logging details will only be printed to stdout.
* log: boolean to save log.txt in save_path.

The subfields within "pbs" and slurm" should mostly be familiar to you. In this example there would be 10 jobs submitted to pbs queue and 15 jobs to the slurm queue. Most HPCs just use one or the other, so make sure to only speficy what your system supports. The kernel field is optional and can be any call(s) to activate a conda/python/ncar_pylib/etc environment. Additional snippets that you might need in your launch script can be added to the list in the "bash" field. For example, as in the example above, loading modules before training a model is required. Note that the bash options will be run in order, and before the kernel field. Remove or leave the kernel field blank if you do not need it.
The subfields within "pbs" and "slurm" should mostly be familiar to you. In this example there would be 10 jobs submitted to pbs queue and 15 jobs to the slurm queue. Most HPCs just use one or the other, so make sure to only specify what your system supports. The kernel field is optional and can be any call(s) to activate a conda/python/ncar_pylib/etc environment. Additional snippets that you might need in your launch script can be added to the list in the "bash" field. For example, as in the example above, loading modules before training a model is required. Note that the bash options will be run in order, and before the kernel field. Remove or leave the kernel field blank if you do not need it.

**Casper GPU type selection**: On NCAR Casper, the GPU architecture is specified using `gpu_type` inside the PBS select string. The exact values and their node configurations are:

| `gpu_type=` value | GPU | Notes |
|---|---|---|
| `v100_32gb` | V100 32 GB | |
| `a100_40gb` | A100 40 GB | Data & Viz queue |
| `a100_80gb` | A100 80 GB | 4 GPUs per node (ML/GPGPU) |
| `h100_80gb` | H100 80 GB | 4 GPUs per node |
| `gp100_16gb` | GP100 16 GB | Data & Viz queue |
| `l40_45gb` | L40 48 GB | Data & Viz queue |
| `mi300a` | MI300A 128 GB | Must request `ngpus=4`, exclusive-use only |

Example: `-l select=1:ncpus=8:ngpus=1:mem=128GB:gpu_type=a100_80gb`

See the [Casper node types documentation](https://ncar-hpc-docs.readthedocs.io/en/latest/compute-systems/casper/casper-node-types/) for the authoritative list.

The subfields within the "optuna" field have the following functionality:
* storage: sqlite or mysql destination.
* study_name: The name of the study.
* storage_type: Choose "sqlite" or "maria" if a MariaDB is setup.
* If "sqlite", the storage field will automatically be appended to the save_path field (e.g. sql:///{save_path}/mlp.db)
* If "maria", specify the full path including username:password in the storage field (for example, mysql://user:pw@someserver.ucar.edu/optuna).
* storage_type: Choose the backend that matches your environment:
* `"sqlite"` — the storage field is joined to save_path automatically (e.g. `sqlite:///{save_path}/mlp.db`). Simple and works well for single-node runs.
* `"maria"` — specify a full MariaDB/MySQL URL in the storage field (e.g. `mysql://user:pw@someserver.ucar.edu/optuna`). Best for large distributed studies with many concurrent workers.
* `"nfs"` — uses Optuna's `JournalFileBackend` (a plain append-only file) instead of a relational database. The storage field is joined to save_path. **Recommended when SQLite locking is unreliable on your shared filesystem** (common on Lustre/GPFS mounts). Example: `storage: "study_journal.log"`, `storage_type: "nfs"`.
* objective: The path to the user-supplied objective class
* metric: The metric to be used to determine the model performance.
* direction: Indicates which direction the metric must go to represent improvement (pick from maximimize or minimize)
* direction: Indicates which direction the metric must go to represent improvement (pick from `maximize` or `minimize`). For **multi-objective optimization**, supply a list of directions (e.g. `direction: ["minimize", "maximize"]`) and a list of metrics (e.g. `metric: ["val_loss", "val_accuracy"]`). Multi-objective studies use `TPESampler` or `NSGAIISampler` and produce a Pareto front. Note: the `optuna.multi_objective` submodule was removed in optuna 4.0 — all multi-objective support now goes through the unified `create_study(directions=[...])` API, which ECHO uses automatically.
* n_trials: The number of trials in the study.
* gpu: Set to true to obtain the GPUs and their IDs
* sampler
+ type: Choose how optuna will do parameter estimation. The default choice both here and in optuna is the [Tree-structured Parzen Estimator Approach](https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f), [e.g. TPESampler](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf). See the optuna documentation for the different options. For some samplers (e.g. GridSearch) additional fields may be included (e.g. search_space).
* parameters
+ type: Option to select an optuna trial setting. See the [optuna Trial documentation](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html?highlight=suggest#optuna.trial.Trial.suggest_uniform) for what is available. Currently, this package supports the available options from optuna: "categorical", "discrete_uniform", "float", "int", "loguniform", and "uniform".
+ settings: This dictionary field allows you to specify any settings that accompany the optuna trial type. In the example above, the named num_dense parameter is stated to be an integer with values ranging from 0 to 10. To see all the available options, consolt the [optuna Trial documentation](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html?highlight=suggest#optuna.trial.Trial.suggest_uniform)
+ type: Option to select an optuna trial setting. See the [optuna Trial documentation](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html) for what is available. Currently, this package supports: "categorical", "float", "int", "loguniform", "uniform", and "discrete_uniform". Note: "loguniform", "uniform", and "discrete_uniform" were removed in optuna 4.0; ECHO maps them internally to `suggest_float` for backward compatibility, but prefer "float" for new configs (use `log: True` in settings for log-uniform sampling).
+ settings: This dictionary field allows you to specify any settings that accompany the optuna trial type. In the example above, the named num_dense parameter is stated to be an integer with values ranging from 0 to 10. To see all the available options, consult the [optuna Trial documentation](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html).
* enqueue: [Optional] Adding this option will allow the user to add trials with pre-defined values when the study is first initialized; they will be run in order according to their id. Each entry must be structured as a dictionary with the parameter names exactly matching the hyperparameter names defined in the parameters field.
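
For readers who want to see what these fields translate to, here is a minimal Optuna sketch (optuna >= 4.0) covering the `"nfs"` journal storage, a multi-objective study, the `float`/`int` suggest calls, and an enqueued trial. The file path, study name, and parameter names are illustrative assumptions rather than values used by this repository; ECHO builds the equivalent objects for you from the YAML config, so this is only for orientation.

```python
import optuna
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend

# "nfs" storage_type: an append-only journal file instead of a relational database.
# The path below is hypothetical; ECHO joins the storage field onto save_path.
storage = JournalStorage(JournalFileBackend("/glade/work/username/echo/study_journal.log"))

# Multi-objective study: one direction per metric, using the unified create_study API.
study = optuna.create_study(
    study_name="example_study",
    storage=storage,
    directions=["minimize", "maximize"],  # e.g. val_loss, val_accuracy
    sampler=optuna.samplers.TPESampler(),
)

# The "enqueue" field corresponds to queueing trials with fixed parameter values.
study.enqueue_trial({"num_dense": 2, "learning_rate": 1e-3})

def objective(trial: optuna.Trial):
    # "int" parameter type
    num_dense = trial.suggest_int("num_dense", 0, 10)
    # "float" with log=True replaces the removed suggest_loguniform
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    # Placeholder metrics standing in for real training results.
    val_loss, val_accuracy = 1.0 / (num_dense + 1), 1.0 - learning_rate
    return val_loss, val_accuracy

study.optimize(objective, n_trials=10)
```

In practice you only edit the YAML fields described above; the sketch is just to make the mapping onto Optuna concrete.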

Lastly, the "log" field allows you to save the logging details to file; they will always be printed to stdout. If this field is removed, logging details will only be printed to stdout.
@@ -233,12 +254,15 @@ def custom_updates(trial, conf):
hyperparameters = conf["optuna"]["parameters"]

# Now update some via custom rules
num_dense = trial.suggest_discrete_uniform(**hyperparameters["num_dense"])
# Note: suggest_discrete_uniform was removed in optuna 4.0; use suggest_int or
# suggest_float(..., step=q) instead.
settings = hyperparameters["num_dense"]["settings"]
num_dense = trial.suggest_int(settings["name"], settings["low"], settings["high"])

# Update the config based on optuna's suggestion
conf["model"]["dense_hidden_dims"] = [1000 for k in range(num_dense)]
conf["model"]["dense_hidden_dims"] = [1000 for k in range(num_dense)]

return conf
return conf
```

The method should be called first thing in the custom Objective.train method (see the example Objective above). You may have noticed that the configuration (named conf) contains both hyperparameter and model fields. This package will copy the hyperparameter optuna field to the model configuration for convenience, so that we can reduce the total number of class and method dependencies (which helps me keep the code generalized). This occurs in the run.py script.
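
To make the calling order concrete, below is a minimal sketch of an objective whose train method applies custom_updates before anything else. The class layout, train signature, and returned dictionary are illustrative assumptions; consult the example Objective earlier in this README for the interface ECHO actually expects.

```python
import optuna

class MyObjective:
    """Illustrative sketch only; not ECHO's real base class."""

    def __init__(self, config: dict, metric: str = "val_loss"):
        self.config = config
        self.metric = metric

    def train(self, trial: optuna.Trial, conf: dict) -> dict:
        # Apply the user-defined rules first, so every downstream field
        # (including the copied "model" entries) reflects this trial's suggestions.
        conf = custom_updates(trial, conf)  # custom_updates as defined in the snippet above

        # ... build and train a model from conf["model"], then evaluate it ...
        val_loss = 0.0  # placeholder result

        # Report the metric named in the "optuna" section of the configuration.
        return {self.metric: val_loss}
```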
Binary file added docs/echo_logo.png
2 changes: 1 addition & 1 deletion echo/examples/keras/hyperparameter.yml
@@ -7,7 +7,7 @@ pbs:
gpus_per_node: 1
bash: ["source ~/.bashrc", "conda activate echo"]
batch:
l: ["select=1:ncpus=8:ngpus=1:mem=64GB", "walltime=12:00:00"]
l: ["select=1:ncpus=8:ngpus=1:mem=64GB:gpu_type=a100_80gb", "walltime=12:00:00"]
A: "NAML0001"
q: "casper"
N: "keras_example"
2 changes: 1 addition & 1 deletion echo/examples/torch/hyperparameter.yml
@@ -7,7 +7,7 @@ pbs:
gpus_per_node: 1
bash: ["source ~/.bashrc", "conda activate echo"]
batch:
l: ["select=1:ncpus=8:ngpus=1:mem=64GB", "walltime=12:00:00"]
l: ["select=1:ncpus=8:ngpus=1:mem=64GB:gpu_type=a100_80gb", "walltime=12:00:00"]
A: "NAML0001"
q: "casper"
N: "torch_example"
101 changes: 49 additions & 52 deletions echo/optimize.py
@@ -198,70 +198,67 @@ def fix_broken_study(


def generate_batch_commands(
hyper_config, batch_type, aiml_path, jobid, batch_commands = []
hyper_config, batch_type, aiml_path, jobid, batch_commands=None
) -> List[str]:
"""Build the list of shell commands that run echo-run inside a job script.

When ``gpus_per_node`` is set, each GPU gets exactly one process pinned to
it via ``CUDA_VISIBLE_DEVICES=<device>``. All processes are launched in
the background (``&``) and a single ``wait`` is appended so the job does
not exit until every worker finishes.

When no GPUs are present but ``tasks_per_worker > 1``, that many CPU-only
workers are launched in parallel the same way.

Parameters
----------
hyper_config : dict
Full hyperparameter config.
batch_type : str
``"pbs"`` or ``"slurm"`` — selects the sub-dict of ``hyper_config``.
aiml_path : str
The ``echo-run <hyper.yml> <model.yml>`` command string.
jobid : str
Scheduler job-ID variable (e.g. ``"$PBS_JOBID"`` or
``"$SLURM_JOB_ID"``).
batch_commands : list, optional
Accumulator list; header lines (shebang, #PBS/#SBATCH directives, bash
setup) are passed in here and the run commands are appended. Defaults
to a new empty list.

Returns
-------
list[str]
The complete script as a list of lines.
"""
if batch_commands is None:
batch_commands = []

# Check if "gpus_per_node" is specified in hyper_config[batch_type]
if "gpus_per_node" in hyper_config[batch_type]:
# Get the list of GPU devices, or convert a single integer to a list
gpus_per_node = list(range(hyper_config[batch_type]["gpus_per_node"]))

# Check if "tasks_per_worker" is specified in hyper_config[batch_type]
if (
"tasks_per_worker" in hyper_config[batch_type]
and hyper_config[batch_type]["tasks_per_worker"] > 1
):
# Warn about the experimental nature of tasks_per_worker
logging.warning(
"The tasks_per_worker is experimental; be advised that some runs may fail."
)
logging.warning(
"Check the log and stdout/err files if simulations are dying to see the errors."
# One process per GPU, each explicitly pinned via CUDA_VISIBLE_DEVICES.
# gpus_per_node is an integer; convert to a 0-based device list.
gpus = list(range(hyper_config[batch_type]["gpus_per_node"]))
for device in gpus:
batch_commands.append(
f"CUDA_VISIBLE_DEVICES={device} {aiml_path} -n {jobid} &"
)
batch_commands.append("wait")

# Loop over the specified number of trials
for copy in range(hyper_config[batch_type]["tasks_per_worker"]):
# Loop over each GPU device
for device in gpus_per_node:
# Append the command with CUDA_VISIBLE_DEVICES={device} to batch_commands
batch_commands.append(
f"CUDA_VISIBLE_DEVICES={device}, {aiml_path} -n {jobid} &"
)
# Allow some time between calling instances of run
batch_commands.append("sleep 0.5")
# Wait for all background jobs to finish
batch_commands.append("wait")
else:
# Loop over each GPU device without multiple trials
for device in gpus_per_node:
# Append the command with CUDA_VISIBLE_DEVICES={device} to batch_commands
batch_commands.append(
f"CUDA_VISIBLE_DEVICES={device}, {aiml_path} -n {jobid} &"
)
batch_commands.append("wait")
elif (
"tasks_per_worker" in hyper_config[batch_type]
and hyper_config[batch_type]["tasks_per_worker"] > 1
):
# Warn about the experimental nature of tasks_per_worker
logging.warning(
"The trails_per_job is experimental, be advised that some runs may fail."
)
# Multiple CPU-only workers sharing the node.
logging.warning(
"Check the log and stdout/err files if simulations are dying to see the errors."
"tasks_per_worker without gpus_per_node: launching %d CPU workers in parallel.",
hyper_config[batch_type]["tasks_per_worker"],
)
# Loop over the specified number of trials
for copy in range(hyper_config[batch_type]["tasks_per_worker"]):
# Append the command to batch_commands
batch_commands.append(
f"{aiml_path} -n {jobid} &"
)
# Allow some time between calling instances of run
batch_commands.append("sleep 0.5")
# Wait for all background jobs to finish
for _ in range(hyper_config[batch_type]["tasks_per_worker"]):
batch_commands.append(f"{aiml_path} -n {jobid} &")
batch_commands.append("wait")

else:
# Append the default command to batch_commands
# Single worker — no GPU pinning, no background execution needed.
batch_commands.append(f"{aiml_path} -n {jobid}")

return batch_commands
@@ -305,7 +302,7 @@ def prepare_pbs_launch_script(hyper_config: str, model_config: str) -> List[str]
pbs_options.append(f"#PBS -{arg} {val}")
elif arg in ["o", "e"]:
if val != "/dev/null":
_val = os.path.append(hyper_config["save_path"], val)
_val = os.path.join(hyper_config["save_path"], val)
# info?
pbs_options.append(f"#PBS -{arg} {_val}")
else: