
feat(ops): implement CausalSoftmax operator with Hygon backend. #34

Open
gongchensu wants to merge 3 commits into InfiniTensor:feat/dev-infra from gongchensu:feat/hygon-causal_softmax


@gongchensu

No description provided.
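The PR has no description, so here is a minimal pure-Python reference sketch of what a causal softmax computes, useful for checking the operator's semantics. It is not the PR's kernel: `causal_softmax_ref` is a hypothetical helper, and it assumes the convention where a `[seq_len, total_len]` score matrix lets row `i` see columns `j <= i + (total_len - seq_len)` (so `total_len > seq_len` covers previously cached keys), with masked positions set to zero.

```python
import math
from typing import List


def causal_softmax_ref(scores: List[List[float]]) -> List[List[float]]:
    """Reference causal softmax over one [seq_len, total_len] score matrix.

    Assumed convention (hypothetical, not taken from the PR): row i may
    attend to columns j <= i + (total_len - seq_len); later columns are
    masked and get probability 0.
    """
    seq_len = len(scores)
    total_len = len(scores[0]) if seq_len else 0
    if total_len < seq_len:
        raise ValueError("total_len must be >= seq_len")
    offset = total_len - seq_len
    out: List[List[float]] = []
    for i, row in enumerate(scores):
        limit = i + offset + 1  # number of visible columns in row i
        visible = row[:limit]
        m = max(visible)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in visible]
        total = sum(exps)
        out.append([e / total for e in exps] + [0.0] * (total_len - limit))
    return out


# Square case: row 0 sees only column 0, row 1 sees columns 0 and 1.
probs = causal_softmax_ref([[1.0, 2.0], [1.0, 2.0]])
```

Subtracting the row maximum before exponentiating keeps the result finite for large scores; a GPU kernel would typically perform the same per-row max and sum reductions.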

@gongchensu gongchensu self-assigned this Mar 25, 2026
@gongchensu gongchensu changed the title from Feat/hygon causal softmax to feat(ops): implement CausalSoftmax operator with Hygon backend. Mar 25, 2026
@gongchensu gongchensu force-pushed the feat/hygon-causal_softmax branch 2 times, most recently from 57c6620 to a553db4 on March 26, 2026 09:03
- Add `WITH_HYGON` build support and a Hygon `Add` backend that reuses the shared CUDA implementation.
- Detect DTK `nvcc` from the Hygon toolkit layout and auto-detect the GPU arch from `rocminfo`.
- Treat Hygon as a CUDA-like backend in shared data type, cast, and kernel helper headers.
- Skip the Hygon `gemm` example for now and ignore `build-*` temporary directories.
- Verified with `pip install -e .[dev]` and `pytest tests/test_add.py`.
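The GPU-arch auto-detection step from this commit can be sketched as follows. These are hypothetical helpers (`parse_gfx_arch` and `detect_gfx_arch` are not from the PR), assuming `rocminfo` is on `PATH` and reports GPU agents with AMD-style `gfx*` ISA names; the DTK `nvcc` lookup is not sketched because the toolkit layout is not given here.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches AMD-style ISA names such as "gfx906" or "gfx90a" in rocminfo output.
_GFX_RE = re.compile(r"\bgfx[0-9a-f]+\b")


def parse_gfx_arch(rocminfo_output: str) -> Optional[str]:
    """Return the first gfx* ISA name found in rocminfo output, or None."""
    match = _GFX_RE.search(rocminfo_output)
    return match.group(0) if match else None


def detect_gfx_arch() -> Optional[str]:
    """Run rocminfo if it is on PATH and parse the GPU arch from its stdout."""
    exe = shutil.which("rocminfo")
    if exe is None:
        return None  # no ROCm/DTK tools available; caller picks a default
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return parse_gfx_arch(result.stdout)
```

Taking the first match assumes all visible GPUs share one arch; a machine with mixed devices would need the full list instead.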
- add a Hygon `Gemm` backend on top of the shared CUDA BLAS path
- use DTK-friendly compute and algo settings for fp32/fp16 gemm
- fall back to `cublasGemmEx` for single-batch Hygon gemm to avoid DTK crashes
- release Hygon cuBLAS handles after each call and re-enable the `gemm` example
- verified with `pip install -e .[dev]`, `pytest tests/test_gemm.py -k cuda`, and `pytest tests/test_gemm.py`
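The single-batch fallback and per-call handle release from this commit can be sketched as plain dispatch logic. Everything here is hypothetical: `choose_gemm_entry` and `per_call_handle` are illustrative names, the strided-batched entry point for `batch_count > 1` is an assumption about the shared CUDA path, and the `create`/`destroy` callables stand in for real handle management such as `cublasCreate`/`cublasDestroy`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


def choose_gemm_entry(batch_count: int) -> str:
    """Pick a BLAS entry point for one gemm call (hypothetical dispatch rule).

    Single-batch calls use the non-batched cublasGemmEx path; larger batches
    keep a strided-batched path, assumed here to be the shared default.
    """
    if batch_count < 1:
        raise ValueError("batch_count must be >= 1")
    return "cublasGemmEx" if batch_count == 1 else "cublasGemmStridedBatchedEx"


@contextmanager
def per_call_handle(create: Callable[[], object],
                    destroy: Callable[[object], None]) -> Iterator[object]:
    """Create a BLAS handle for a single call and always release it afterwards."""
    handle = create()
    try:
        yield handle
    finally:
        destroy(handle)  # runs even if the gemm call raises


# Usage with stand-in callables that just record what happens.
log: List[str] = []
with per_call_handle(lambda: "h0", lambda h: log.append(f"destroy {h}")) as h:
    log.append(f"{choose_gemm_entry(1)} via {h}")
print(log)  # ['cublasGemmEx via h0', 'destroy h0']
```

Using `try`/`finally` guarantees the handle is released even when the call fails, which is the property that matters once handles are no longer cached across calls.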
@gongchensu gongchensu force-pushed the feat/hygon-causal_softmax branch from a553db4 to fe102f7 on March 30, 2026 02:31
@gongchensu gongchensu force-pushed the feat/hygon-causal_softmax branch from fe102f7 to b59d9db on March 31, 2026 06:34