From 848d0c083bbb54f547f17b7ebb4114684c381fef Mon Sep 17 00:00:00 2001
From: Laurence Tratt <laurie@tratt.net>
Date: Fri, 13 Mar 2026 09:06:00 +0000
Subject: [PATCH] Insert locations at the end of a FOR loop.

The intuition here is that if we've already gone around one iteration of
a for loop, we're more likely to close a "full, proper" iteration,
whereas if we put the location on entry, we're likely to hit the
"nothing to do" case. This is -- from memory! -- the same thing that
PyPy does.
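
As a rough sketch of what the one-line change does, the two choices of
location can be written as below. The helper names are hypothetical and
not part of the patch: `pc` is assumed to be the index of an
OP_FORLOOP/OP_TFORLOOP instruction and `bx` stands in for GETARG_Bx(i).

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; they are not in lparser.c. */

/* Before: the location goes on an earlier instruction, computed from the
 * FORLOOP's Bx operand (this is the "on entry" placement). */
static int loc_pc_before(int pc, int bx) {
  int target = pc - bx + 2 - 1;
  assert(target < pc); /* mirrors the lua_assert in ykifyCode */
  return target;
}

/* After: the location goes on the FORLOOP/TFORLOOP instruction itself,
 * i.e. at the end of the loop. */
static int loc_pc_after(int pc, int bx) {
  (void)bx; /* the operand no longer affects where the location goes */
  return pc;
}
```

For example, with pc = 10 and bx = 4, the old placement is instruction 7
and the new placement is instruction 10.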

There is a trade-off here: every time we execute a loop, we now do one
iteration in the interpreter. Probably because of that, benchmarks are
mixed, but IMHO show a small improvement overall. On b15:

```
storage/lua/1000        4.95% faster
richards/lua/100        4.27% faster
sieve/lua/3000          2.12% faster
cd/lua/250              1.19% faster
bounce/lua/1500         1.84% slower
knucleotide/lua/        3.97% slower
permute/lua/1000        4.30% slower
```

On b16:

```
binarytrees/lua/15      4.93% faster
storage/lua/1000        3.98% faster
queens/lua/1000         3.44% faster
cd/lua/250              2.24% faster
spectralnorm/lua/1000   1.54% faster
json/lua/100            3.04% slower
knucleotide/lua/        5.87% slower
HashIds/lua/6000        6.13% slower
nbody/lua/250000        13.60% slower
```

nbody is very nondeterministic, so it can be hard to draw conclusions.
That said, on b16 it does seem to have meaningfully slowed down. On b15,
the slowdown is within the margin of noise, though on the edge of it: I
think it may well have slowed down, but by perhaps 5-8%. So whether the
nbody slowdown holds on other machines is a bit unclear.
---
 src/lparser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lparser.c b/src/lparser.c
index 40032ce..512ccf2 100644
--- a/src/lparser.c
+++ b/src/lparser.c
@@ -848,7 +848,7 @@ void ykifyCode(lua_State *L, Proto *f, int num_insts) {
       loc_pc = GETARG_sJ(i) + pc + 2 - 1;
     } else if ((GET_OPCODE(i) == OP_FORLOOP) || (GET_OPCODE(i) == OP_TFORLOOP)) {
       lua_assert(pc - GETARG_Bx(i) + 2 - 1 < pc);
-      loc_pc = pc - GETARG_Bx(i) + 2 - 1;
+      loc_pc = pc;
     } else {
       continue;
     }