Update tutorials for Neuron SDK 2.29 release (NKI 0.3.0) #122

mrkcath-aws wants to merge 1 commit into main from
Conversation
```python
def softmax_isa(data, axis=(1,)):
```
How is this one different from nl.softmax?
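For reference, whatever the ISA-level decomposition of `softmax_isa` is, the math it should implement is the standard numerically stable softmax. A plain NumPy sketch (not NKI code, just the reference semantics):

```python
import numpy as np

def softmax_reference(data, axis=1):
    """Numerically stable softmax along `axis` (NumPy reference only)."""
    # Subtract the row max so exp() cannot overflow.
    shifted = data - np.max(data, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)
```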
```python
# scores @ V, contract along seqlen_kv
attn_out: nt.tensor[seqlen_q, d_head] = nl.matmul(scores, v_sbuf_t, transpose_x=False)

# nl.matmul with transpose_x=False internally transposes to PSUM which
```
Doesn't that make this language API a bit moot then? Shouldn't the API change if its implementation doesn't work on gen3+?
Given that `nc_matmul` also has an `is_transpose` mode telling the TensorE to transpose the tile in SBUF, I'm wondering if we could switch to that instead.
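For context on the transpose question: an engine that needs its stationary operand pre-transposed can still produce the same contraction, because `scores @ V == (V.T @ scores.T).T`. A plain NumPy sketch with illustrative shapes (not taken from the kernel):

```python
import numpy as np

# Hypothetical shapes, for illustration only.
seqlen_q, seqlen_kv, d_head = 4, 8, 16
rng = np.random.default_rng(0)
scores = rng.random((seqlen_q, seqlen_kv), dtype=np.float32)
v = rng.random((seqlen_kv, d_head), dtype=np.float32)

# scores @ V contracts over seqlen_kv. The same result can be computed
# from pre-transposed operands: (V^T @ scores^T)^T.
attn_out = scores @ v
attn_out_via_t = (v.T @ scores.T).T
assert np.allclose(attn_out, attn_out_via_t)
```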
```python
qk = nl.ndarray((seqlen_q, seqlen_kv), dtype=nl.float32, buffer=nl.psum)
nisa.nc_matmul(dst=qk, stationary=q_sbuf, moving=k_sbuf)
qk_sbuf = nl.ndarray(qk.shape, dtype=nl.float32, buffer=nl.sbuf)
nisa.tensor_copy(dst=qk_sbuf, src=qk)
```
Nit: our docs say `tensor_reduce` and `tensor_scalar` (used by softmax) can take PSUM input. I wonder whether this copy is even necessary.
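Assuming `nc_matmul` contracts along the partition (first) axis, i.e. computes `stationary.T @ moving` (an assumption about the layout here, not stated in the diff), the QK step above is equivalent to this NumPy sketch:

```python
import numpy as np

# Assumed layout: q_sbuf is (d_head, seqlen_q), k_sbuf is (d_head, seqlen_kv);
# contracting along the partition axis gives stationary.T @ moving.
d_head, seqlen_q, seqlen_kv = 16, 4, 8
rng = np.random.default_rng(0)
q_sbuf = rng.random((d_head, seqlen_q), dtype=np.float32)
k_sbuf = rng.random((d_head, seqlen_kv), dtype=np.float32)

qk = q_sbuf.T @ k_sbuf  # shape (seqlen_q, seqlen_kv), like the PSUM tile
assert qk.shape == (seqlen_q, seqlen_kv)
```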
```python
attn_out[...] = nisa.tensor_scalar(data=attn_out_psum, op0=nl.multiply,
                                   operand0=inverse_sum_row, engine=nisa.vector_engine)
nisa.tensor_scalar(dst=attn_out[...], data=attn_out_psum, op0=nl.multiply,
```
Nit: while it works, using `dst=attn_out[...]` instead of `dst=attn_out` reads a bit oddly to me.
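Semantically, this `tensor_scalar` step is a row-wise broadcast multiply by the per-row `1/sum(exp)` factor. A NumPy sketch with made-up shapes:

```python
import numpy as np

# Scale each row of attn_out by that row's inverse exp-sum (row-wise broadcast).
seqlen_q, d_head = 4, 16
rng = np.random.default_rng(0)
attn_out_psum = rng.random((seqlen_q, d_head), dtype=np.float32)
row_sums = rng.random((seqlen_q, 1), dtype=np.float32) + 1.0  # avoid zeros
inverse_sum_row = 1.0 / row_sums

attn_out = attn_out_psum * inverse_sum_row  # (seqlen_q, 1) broadcasts per row
```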
```python
row_max_kv = nl.ndarray((PMAX, num_kv_tiles), dtype=nl.float32, buffer=nl.sbuf)
for i_tile_kv in range(num_kv_tiles):
    qk_sbuf = nl.ndarray((PMAX, FMAX_MOVING), dtype=nl.float32, buffer=nl.sbuf)
    nisa.tensor_copy(dst=qk_sbuf, src=qk_tiles[i_tile_kv])
```
Similar to above: the `tensor_reduce` docs say it accepts PSUM input, so is this copy necessary?
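The tiled row-max pattern this loop computes can be checked against a single full-row max in NumPy (the tile constants are taken from the diff; the data is illustrative):

```python
import numpy as np

# Tiled row-max: reduce each KV tile separately, then reduce across tiles.
# This equals one max over the full concatenated row.
PMAX, FMAX_MOVING, num_kv_tiles = 128, 512, 4
rng = np.random.default_rng(0)
qk_tiles = [rng.random((PMAX, FMAX_MOVING), dtype=np.float32)
            for _ in range(num_kv_tiles)]

row_max_kv = np.stack([t.max(axis=1) for t in qk_tiles], axis=1)  # (PMAX, num_kv_tiles)
row_max = row_max_kv.max(axis=1)

full = np.concatenate(qk_tiles, axis=1)
assert np.allclose(row_max, full.max(axis=1))
```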
```python
exp_row[:, nl.ds(i_tile_kv*FMAX_MOVING, FMAX_MOVING)] = nisa.activation(
for i_tile_kv in range(num_kv_tiles):
    qk_sbuf = nl.ndarray((PMAX, FMAX_MOVING), dtype=nl.float32, buffer=nl.sbuf)
    nisa.tensor_copy(dst=qk_sbuf, src=qk_tiles[i_tile_kv])
```
Same here for `activation`, where the data tile can be an SBUF or PSUM tile.
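The activation step computes `exp(x - row_max)` per KV tile for numerical stability. A NumPy sketch, assuming `row_max` is the max over the row (shown here with a single tile):

```python
import numpy as np

# Per-tile exp(x - row_max): subtracting the row max before exponentiating
# keeps every exp() input <= 0, so the result stays in (0, 1].
PMAX, FMAX_MOVING = 128, 512
rng = np.random.default_rng(0)
qk_tile = rng.random((PMAX, FMAX_MOVING), dtype=np.float32)
row_max = qk_tile.max(axis=1, keepdims=True)

exp_tile = np.exp(qk_tile - row_max)
assert exp_tile.max() <= 1.0 + 1e-6
```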
```diff
@@ -362,9 +386,12 @@ def nki_matmul_fully_optimized_(
 BLOCK_K = TILE_K * TILES_IN_BLOCK_K

 # Verify the size is a multiple of block size
```
I'm waiting on Doug to merge https://github.com/aws-neuron/private-aws-neuron-sdk-staging/pull/2998, but once it is merged I believe we can replace the sample here with the updated version of the `fully_optimized_` kernel (not blocking this PR, just a reminder).
```python
offset_i_x = nl.program_id(0) * 128
offset_i_y = nl.program_id(1) * 512
```
Are we intentionally dropping the LNC stuff here?
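Whatever happens with the LNC handling, the offsets themselves just select one 128 x 512 output tile per program instance on a 2-D launch grid. A tiny sketch of that arithmetic (`tile_offsets` is a hypothetical helper, not in the kernel):

```python
# Each (pid0, pid1) program instance covers a 128 x 512 output tile;
# the offsets are the top-left corner of that tile.
def tile_offsets(pid0, pid1):
    return pid0 * 128, pid1 * 512

# e.g. program (2, 3) starts at row 256, column 1536
assert tile_offsets(2, 3) == (256, 1536)
```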
Issue #, if available:
N/A
Description of changes:
Update samples for NKI 0.3.0
Testing:
Please see detailed unit test requirements in the CONTRIBUTING.md
`nki.baremetal`, `nki.benchmark`

Pull Request Checklist