Hello, thanks for your question! Today the best way to run collectives with NKI kernels is using the newly released NKI collectives API here.

For some inspiration and samples, I'd suggest checking out the NKI Kernel Library here.

A nice example is this all_gather for sbuf2sbuf.
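If the semantics of all_gather are unfamiliar, here is a plain-Python sketch of what the collective computes. To be clear, this is not the NKI collectives API and does not run on Neuron hardware; the `all_gather` function and the list-based "ranks" below are hypothetical stand-ins for illustration only. For the real kernel, refer to the linked sample.

```python
# Conceptual simulation only: each "rank" is a list entry, not a Neuron core.
def all_gather(shards):
    """Return, for every rank, the concatenation of all ranks' shards."""
    # Concatenate shards in rank order.
    gathered = [x for shard in shards for x in shard]
    # Every rank receives its own copy of the full result.
    return [list(gathered) for _ in shards]

# Four hypothetical ranks, each holding two elements tagged with its rank id.
shards = [[rank * 10 + i for i in range(2)] for rank in range(4)]
results = all_gather(shards)

for rank, buf in enumerate(results):
    print(f"rank {rank}: {buf}")
```

After the call, every rank holds the same full buffer, which is the property you want when pushing data across cores.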

You should not go through the SPMD grid for this use case. When you want to move data across multiple Neuron cores, regardless of how you have set their logical config, it's better to go through collectives.

Also, when you are using the XLA stack, it's probably easier to keep things scoped within the NxD Model Builder API. You can use that to set the distributed process group.

Answer selected by EmilyWebber