
Llama3 like weight init #435

Merged

le1nux merged 8 commits into improve_data_writeout_perf from llama3_like_weight_init
Mar 6, 2026

Conversation

le1nux (Member) commented Mar 4, 2026:

What does this PR do?

This PR ..

General Changes

  • ..

Breaking Changes

  • ..

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

@le1nux le1nux marked this pull request as ready for review March 4, 2026 17:55
@le1nux le1nux requested a review from AbasKhan March 4, 2026 18:52
@le1nux le1nux changed the base branch from main to improve_data_writeout_perf March 4, 2026 19:22
        match_count += 1
        hits[weight_regex] += 1
if match_count == 0:
    logger.warning(f"Parameter {parameter_name} did not match any regex for initialization")
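The matching loop around this warning can be sketched in plain Python. This is a minimal sketch, not the PR's actual code: `initialize_parameters` and `record_init` are hypothetical names, and the recording stub stands in for the real torch init calls applied to parameter tensors.

```python
import logging
import re
from functools import partial

logger = logging.getLogger(__name__)


def record_init(parameter_name, std, log):
    # Stand-in for a torch init call: just record which std would be applied.
    log.append((parameter_name, std))


def initialize_parameters(parameter_names, regex_to_init):
    """Match every parameter name against the regex table; warn on misses."""
    hits = {regex: 0 for regex in regex_to_init}
    for parameter_name in parameter_names:
        match_count = 0
        for weight_regex, init_fn in regex_to_init.items():
            if re.fullmatch(weight_regex, parameter_name):
                init_fn(parameter_name)
                match_count += 1
                hits[weight_regex] += 1
        if match_count == 0:
            logger.warning(f"Parameter {parameter_name} did not match any regex for initialization")
    return hits
```

A parameter such as `lm_head.weight` that matches no regex only triggers the warning; matched parameters are counted per regex in `hits`, which makes it easy to spot patterns that never fire.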
Member commented:
should we add a flag which turns this into an error?

le1nux (Member Author) replied:

Since the norms are initialized within the model factory via reset_parameters, this would always throw an error.

        b=2,
    ),
    # final attention projection in attention block
    r"transformer\.h\.\d+\.attn\.c_proj\.weight": partial(
Collaborator commented:

This corresponds to the following, right? But there you can see that for the out projection std=init_std, which can be initialized differently and defaults to depth_init, because here we pass weight_init_std, which defaults to depth_init in titan. If we don't want depth init, then it matches titan's scaled out_projection logic when depth_init is False.

le1nux (Member Author) replied:

I implemented depth_init to be fully compliant.
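The depth-scaled std discussed above can be sketched as a small helper. This is a sketch under the assumption that the PR follows titan's convention (per-layer scaling when depth_init is on, uniform scaling by total depth otherwise); `depth_init_std` and the base std of 0.02 are hypothetical stand-ins, not names from the PR.

```python
def depth_init_std(layer_id: int, num_layers: int, depth_init: bool, base_std: float = 0.02) -> float:
    """Std for output projections: shrinks with layer index if depth_init,
    otherwise scaled uniformly by the total number of layers."""
    if depth_init:
        return base_std / (2 * (layer_id + 1)) ** 0.5
    return base_std / (2 * num_layers) ** 0.5
```

With depth_init disabled, every layer receives the same std, matching the "scaled out_projection" behavior the reviewer describes.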

def __init__(self, num_layers: int, n_embd: int, bias: bool) -> None:
    super().__init__()

    self.regex_to_init = {
Collaborator commented:
We also need regex patterns for attention_norm, ffn_norm, and the final lm_head_norm, right? Something like:

r"transformer\.h\.\d+\.(attention_norm|ffn_norm)\.weight": nn.init.ones_,
r"transformer\.lm_head_norm\.weight": nn.init.ones_,

le1nux (Member Author) replied Mar 5, 2026:

module.reset_parameters()

We already call this here, and due to recursion it is also called for the RMSNorm:
https://github.com/pytorch/pytorch/blob/65762ca85745d786ab6b20e9cb060242b51e872d/torch/nn/modules/normalization.py#L407
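Why the norms need no regex entry can be sketched without torch. The classes below are hypothetical stand-ins mimicking nn.Module's recursive traversal; the one-fill in reset_parameters mirrors what PyTorch's RMSNorm.reset_parameters does (torch.nn.init.ones_ on its weight).

```python
class SketchModule:
    """Minimal stand-in for nn.Module: children are module-typed attributes."""

    def child_modules(self):
        return [v for v in vars(self).values() if isinstance(v, SketchModule)]


class RMSNormSketch(SketchModule):
    def __init__(self, dim):
        self.weight = [0.0] * dim

    def reset_parameters(self):
        # mirrors torch.nn.init.ones_(self.weight) in PyTorch's RMSNorm
        self.weight = [1.0] * len(self.weight)


class BlockSketch(SketchModule):
    def __init__(self, dim):
        self.attention_norm = RMSNormSketch(dim)
        self.ffn_norm = RMSNormSketch(dim)


def reset_recursively(module):
    """Mimic a recursive module.apply(...): visit children first, then reset
    any module that defines reset_parameters()."""
    for child in module.child_modules():
        reset_recursively(child)
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()
```

Because the recursion reaches every submodule, both norm weights end up filled with ones without any explicit regex entry for them.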

@le1nux le1nux requested review from AbasKhan and BlueCrescent March 5, 2026 21:47
if re.fullmatch(weight_regex, parameter_name):
    init_fn, arg_dict = regex_to_init[weight_regex]
    if arg_dict["std"] is not None and callable(arg_dict["std"]):
        if not depth_init:
Collaborator commented:

Isn't this dead code now? std becomes a callable only when depth_init is True, right? So this check is not needed.

le1nux (Member Author) replied:

I added it as a safety check, but you're right, it's kind of redundant. I'll remove it!

@le1nux le1nux merged commit 6a17097 into improve_data_writeout_perf Mar 6, 2026
3 checks passed
@le1nux le1nux deleted the llama3_like_weight_init branch March 6, 2026 09:42

3 participants