[Bug]: 500 server error when re-applying replica groups service #3676

@r4victor

Description

Steps to reproduce

  1. Have an old replica group service that was submitted before #3573 (Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection).
  2. Re-apply the same service with the same name. The server fails with a 500 error and the following log:
  Top-level `resources` is not allowed when `replicas` is a list. Specify `resources` in each replica group instead. (type=value_error)
__root__
  Missing configuration (type=value_error)

Actual behaviour

The problem is that the old run can no longer pass the replica groups check because the default resources changed in #3573:

resources = values.get("resources")
default_resources = ResourcesSpec()
if resources and resources.dict() != default_resources.dict():
    raise ValueError(
        "Top-level `resources` is not allowed when `replicas` is a list. "
        "Specify `resources` in each replica group instead."
    )
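The failure mode can be reproduced in isolation. The following is a minimal standard-library sketch, not dstack code: `ResourcesBefore`, `ResourcesAfter`, and the `gpu_count` field are hypothetical stand-ins for `ResourcesSpec` before and after its defaults changed, and `check_top_level_resources` only mirrors the comparison shown above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResourcesBefore:
    """Hypothetical stand-in for ResourcesSpec with the old defaults."""
    gpu_count: Optional[int] = None


@dataclass
class ResourcesAfter:
    """Hypothetical stand-in for ResourcesSpec with the new defaults."""
    gpu_count: Optional[int] = 0


def check_top_level_resources(stored: dict, current_defaults: dict) -> None:
    # Mirrors the check above: anything that differs from the *current*
    # defaults is treated as if the user had set it explicitly.
    if stored and stored != current_defaults:
        raise ValueError(
            "Top-level `resources` is not allowed when `replicas` is a list."
        )


# The old run was persisted with the old defaults materialized.
stored = asdict(ResourcesBefore())

# Against the old defaults the stored spec passes.
check_top_level_resources(stored, asdict(ResourcesBefore()))

# Against the new defaults the very same stored spec is rejected.
try:
    check_top_level_resources(stored, asdict(ResourcesAfter()))
    rejected = False
except ValueError:
    rejected = True
```

The user never set top-level `resources` in either case; only the defaults moved.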

Expected behaviour

Although we can handle the error, the check itself is very fragile since it depends on the defaults of ResourcesSpec() staying the same.
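One possible direction, sketched here under assumptions and not necessarily the fix dstack should adopt, is to test whether `resources` was explicitly provided instead of comparing against defaults. The function `validate_replica_groups` and the dictionary keys in the sample configurations are hypothetical and for illustration only.

```python
from typing import Any, Dict


def validate_replica_groups(raw_config: Dict[str, Any]) -> None:
    """Reject top-level `resources` only if the key was explicitly provided.

    `raw_config` is assumed to be the user's configuration before any
    defaults are filled in, so key presence means "set by the user".
    """
    if isinstance(raw_config.get("replicas"), list) and "resources" in raw_config:
        raise ValueError(
            "Top-level `resources` is not allowed when `replicas` is a list. "
            "Specify `resources` in each replica group instead."
        )


# Replica groups without top-level resources: accepted, whatever the defaults are.
groups_only = {
    "type": "service",
    "replicas": [{"name": "a", "resources": {"gpu": 1}}],  # illustrative keys
}
validate_replica_groups(groups_only)

# Explicit top-level resources together with replica groups: rejected.
mixed = dict(groups_only, resources={"gpu": 1})
try:
    validate_replica_groups(mixed)
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

A presence check like this does not depend on what the defaults are. Run specs that were already stored with materialized defaults would still need separate handling when they are loaded.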

dstack version

master

Server logs

  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/routers/runs.py", line 132, in get_plan
    run_plan = await runs.get_plan(
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 347, in get_plan
    current_resource = await get_run_by_name(
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 297, in get_run_by_name
    return run_model_to_run(run_model, return_in_api=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 757, in run_model_to_run
    run_spec = get_run_spec(run_model)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/r4victor/Projects/dstack/dstack/src/dstack/_internal/server/services/runs/__init__.py", line 117, in get_run_spec
    return RunSpec.__response__.parse_raw(run_model.run_spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic/main.py", line 572, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 549, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 364, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for RunSpecResponse
configuration -> ServiceConfigurationResponse -> __root__
  Top-level `resources` is not allowed when `replicas` is a list. Specify `resources` in each replica group instead. (type=value_error)
__root__
  Missing configuration (type=value_error)

Additional information

No response


Labels: bug
