From 848d0c083bbb54f547f17b7ebb4114684c381fef Mon Sep 17 00:00:00 2001
From: Laurence Tratt
Date: Fri, 13 Mar 2026 09:06:00 +0000
Subject: [PATCH] Insert locations at the end of a FOR loop.

The intuition here is that if we've gone around one iteration of a for
loop, we're more likely to close a "full, proper" iteration, whereas if
we have the location on entry, we're likely to hit the "nothing to do"
case. This is -- from memory! -- the same thing that PyPy does.

There is a trade-off here: it means every time we execute a loop we do
one iteration in the interpreter. Probably because of that, benchmarks
are mixed, but IMHO show a small improvement.

b15:

```
storage/lua/1000         4.95% faster
richards/lua/100         4.27% faster
sieve/lua/3000           2.12% faster
cd/lua/250               1.19% faster
bounce/lua/1500          1.84% slower
knucleotide/lua/         3.97% slower
permute/lua/1000         4.30% slower
```

b16:

```
binarytrees/lua/15       4.93% faster
storage/lua/1000         3.98% faster
queens/lua/1000          3.44% faster
cd/lua/250               2.24% faster
spectralnorm/lua/1000    1.54% faster
json/lua/100             3.04% slower
knucleotide/lua/         5.87% slower
HashIds/lua/6000         6.13% slower
nbody/lua/250000        13.60% slower
```

nbody is very nondeterministic so it can be hard to draw conclusions;
that said, it does seem on b16 to have meaningfully slowed down. On b15,
the slowdown is within the margin of noise, though on the edge of it: I
think it may well have slowed down, but by perhaps 5-8%. So whether this
holds on other machines is a bit unclear.
---
 src/lparser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lparser.c b/src/lparser.c
index 40032ce..512ccf2 100644
--- a/src/lparser.c
+++ b/src/lparser.c
@@ -848,7 +848,7 @@ void ykifyCode(lua_State *L, Proto *f, int num_insts) {
       loc_pc = GETARG_sJ(i) + pc + 2 - 1;
     } else if ((GET_OPCODE(i) == OP_FORLOOP) || (GET_OPCODE(i) == OP_TFORLOOP)) {
       lua_assert(pc - GETARG_Bx(i) + 2 - 1 < pc);
-      loc_pc = pc - GETARG_Bx(i) + 2 - 1;
+      loc_pc = pc;
     } else {
       continue;
     }