fix: extend LISA trainer to support missing model architectures#968

Open
Bhavyashah20 wants to merge 1 commit into OptimalScale:main from Bhavyashah20:fix/lisa-trainer-missing-architectures

Conversation

@Bhavyashah20

Found this while trying to use LISA with a Gemma2 model. The
class_to_layers_map only covered 7 architectures, so anything else
hit a cryptic AssertionError with no guidance on how to fix it.

Traced the issue through DynamicLayerActivationCallback.__init__
and rewrote the resolution logic to handle modern architectures.
Changes:

- Expand CLASS_TO_LAYERS_MAP from 7 to 20 architectures (Gemma2,
  Phi3, DeepSeek V2/V3, OLMo, Falcon, Cohere, Qwen2-MoE, GPT-NeoX)
- Add _get_layers_attribute() with three-tier resolution: a
  user-supplied lisa_layers_attribute first, then map lookup, then a
  dynamic fallback via getattr introspection for unknown models
- Replace eval() calls with _resolve_layers() using getattr traversal
- Move the user override to highest priority so it can correct map entries
- Replace print() with logger.info() for distributed-training compatibility
- Add a pytest suite with 7 test cases covering map lookup, fallback,
  override, DataParallel unwrapping, and non-default paths (Falcon, GPT-NeoX)

Fixes OptimalScale#862
