NVFP4 recipe with GEMM via BF16 dequant #518
Open
matthiasdiener wants to merge 101 commits into dev from mdiener/nvfp4-gemm
+594 −64
d954c6d Typo fix (#397) (Micky774)
7b5cf20 ROCm UserBuffers for Comm Overlap (alextmagro)
640f7e8 Copyrights and cleanup (alextmagro)
82faeec test guards (alextmagro)
b6a3ae4 Cleanup and RS flag race condition fix (alextmagro)
9e32d3a Debugging midpoint (alextmagro)
84209ad Cleanup and workspace fix (alextmagro)
c669bd2 Guard layer registration in UB (alextmagro)
8040909 Cleanup of profiling example for rocm (alextmagro)
e375923 Readd example script and update custom_map (alextmagro)
c6bd974 fix typo (alextmagro)
d76aa06 MI300 test skips due to jittery results (alextmagro)
ae979d0 Comment regarding sm_margin performance (alextmagro)
b58cbd1 Variable renamed, pybind fix, tolerance tightening (alextmagro)
e5d7446 Remove git conflict (alextmagro)
7734ce5 Address style and hip/cu specific paths (alextmagro)
c169c75 HIP guards (alextmagro)
80e0aab initial impl (matthiasdiener)
de7863a Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
bda7b13 test update (matthiasdiener)
7ddb539 Update extensions.h (alextmagro)
63c7a48 amax opt (matthiasdiener)
a260459 simplify (matthiasdiener)
3dd8af9 Merge pull request #367 from ROCm/userbuffer_epic (alextmagro)
ab217cb Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
26c5fb7 simplify pt 2 (matthiasdiener)
2087f24 expand test (matthiasdiener)
05cedb7 compute amax from BF16-rounded outputs (matthiasdiener)
67b93a8 TE building over TheRock (#511) (ipanfilo)
465d547 Typo fix (#397) (Micky774)
9fb21f9 Add NVTE_UB_WITH_MPI to rocm build path (alextmagro)
2f66594 Merge pull request #513 from ROCm/ub_mpi_hotfix (alextmagro)
986d8ba NVFP4: hadamard_transform_cast_fusion_columnwise (matthiasdiener)
b339c86 unify hadamard_transform_cast_fusion_columnwise (matthiasdiener)
f74a0ab Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
e9426cd Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-cast_fusion (matthiasdiener)
1d0a70e Rebase onto dev (aris134)
6e3eea5 Enable NVFP4 recipe (matthiasdiener)
9c3dc2f NVFP4 GEMM via BF16 dequant (matthiasdiener)
e3a2502 Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
3a63f32 add explanation to wht16 (matthiasdiener)
35ef81c comment and test (matthiasdiener)
9559131 Merge branch 'amartin/nvfp4-dequant' into mdiener/nvfp4-gemm (matthiasdiener)
e8ff6bd enable use_fused_bulk_alloc (matthiasdiener)
e26ffc8 compute random sign mask on device (matthiasdiener)
c7cc488 CI: enable CI runs on every PR (matthiasdiener)
7c68bd8 Avoid duplicate entry when opening PR (matthiasdiener)
a19dd60 fix stream capture error in GEMM (matthiasdiener)
17d50ee Merge branch 'dev' into mdiener/fp4_hadamard (matthiasdiener)
e32a758 merge errors (matthiasdiener)
4857721 Merge branch 'dev' into mdiener/fp4_hadamard (matthiasdiener)
b243b4c merge (matthiasdiener)
5d39b27 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
141eadc Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
6527004 Merge branch 'dev' into mdiener/fp4_hadamard (matthiasdiener)
ca1aacf change to __builtin_bit_cast (matthiasdiener)
c8e6c72 more fixes (matthiasdiener)
1b0fe3e fix dequant buffer (matthiasdiener)
bc9f0a3 remove copyright header (matthiasdiener)
167c311 fix triton rmsnorm (matthiasdiener)
203ef86 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
4b0550d mi300 fixes (matthiasdiener)
5da621a software fallbacks for SR on gfx942 (matthiasdiener)
9f1851d Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
287708d more gfx942 fixes (matthiasdiener)
81c45c9 ensure columnwise data for dgrad GEMM (matthiasdiener)
b75e066 fix mi350 (matthiasdiener)
5095971 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
739a20d Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
2225c72 replace dequant allocation with workspace (matthiasdiener)
42bf230 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
f269097 enable tests (matthiasdiener)
346beb1 Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
cf2c8f6 address reviewer comments (matthiasdiener)
2772834 minor fixes (matthiasdiener)
26c5cb1 PreRhtAmax optimizations (matthiasdiener)
071aa4b Merge branch 'mdiener/fp4_hadamard' into mdiener/nvfp4-cast_fusion (matthiasdiener)
018d24f use ZeroAmaxKernel (matthiasdiener)
3efd532 Merge remote-tracking branch 'origin/dev' into mdiener/fp4_hadamard (matthiasdiener)
b835818 Merge branch 'mdiener/fp4_hadamard' into mdiener/nvfp4-cast_fusion (matthiasdiener)
f2caca7 Merge branch 'mdiener/nvfp4-cast_fusion' into mdiener/nvfp4-gemm (matthiasdiener)
95518ea Merge branch 'dev' into mdiener/nvfp4-gemm (matthiasdiener)
feff829 undo hadamard_fusion (matthiasdiener)
1ba9474 fixes (matthiasdiener)
e1ba512 cleanups, cleaner mi300 LDS workaround (matthiasdiener)
da093bd more cleanups (matthiasdiener)
75a4738 re-fix null tensor_amax (matthiasdiener)
5315506 minor cleanups (matthiasdiener)
7254ba4 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
a91eaf0 address review comments (matthiasdiener)
29cacef Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
0f53a9d address review comments (matthiasdiener)
2851bd9 Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
3deb3ff Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
a08e8c5 use maxNorm (matthiasdiener)
fae76d3 factor out FP4 staging (matthiasdiener)
81d3cbd Merge remote-tracking branch 'origin/dev' into mdiener/nvfp4-gemm (matthiasdiener)
0afd821 Merge remote-tracking branch 'upstream/dev' into mdiener/nvfp4-gemm (matthiasdiener)
a6f4787 address review comments (matthiasdiener)
b1575bd Merge remote-tracking branch 'upstream/dev' into mdiener/nvfp4-gemm (matthiasdiener)
eae6e95 Merge remote-tracking branch 'upstream/dev' into mdiener/nvfp4-gemm (matthiasdiener)