
[WIP] Optimize Paged Attention.#18

Open
yunzhongOvO wants to merge 1 commit into main from paged_attn_opt

@yunzhongOvO
Collaborator

  1. Add more tuning space for AMD GPUs.
  2. Remove unused load/store masks.
  3. Introduce loop unrolling in the GQA scenario.
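
A rough illustration of the three ideas listed above, written in plain Python rather than kernel code. This is a sketch under stated assumptions, not the code in this PR: every name here (`tuning_space`, `needs_mask`, `query_heads_for_kv_head`, and the parameters `block_size`, `num_warps`) is hypothetical, and the sketch assumes that the number of query heads is an exact multiple of the number of KV heads.

```python
from itertools import product
from typing import Dict, List


def tuning_space(block_sizes: List[int], num_warps: List[int]) -> List[Dict[str, int]]:
    """Enumerate candidate configurations as a Cartesian product.

    Widening either input list widens the search space an autotuner can explore.
    Parameter names are hypothetical.
    """
    return [
        {"block_size": b, "num_warps": w}
        for b, w in product(block_sizes, num_warps)
    ]


def needs_mask(extent: int, block: int) -> bool:
    """A bounds mask is only needed when the block size does not divide the extent.

    When this returns False, every block is full and the mask on the
    corresponding load/store is unused.
    """
    return extent % block != 0


def query_heads_for_kv_head(kv_head: int, num_q_heads: int, num_kv_heads: int) -> List[int]:
    """Return the query heads that share one KV head under grouped-query attention.

    Assumes num_q_heads is an exact multiple of num_kv_heads. The group size is
    fixed, so a loop over the returned heads has a known trip count and is a
    candidate for unrolling.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return [kv_head * group_size + i for i in range(group_size)]
```

For example, with 8 query heads and 2 KV heads, `query_heads_for_kv_head(1, 8, 2)` returns `[4, 5, 6, 7]`, and `needs_mask(128, 32)` is `False` because 32 divides 128 evenly.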

@scxiao
Contributor

scxiao commented Aug 20, 2024

Do you have some perf numbers for this version?
