Conversation
|
Claude finished @ZanCorDX's task in 1m 39s —— View job PR Review: Better replacement latency
SummaryThis PR increases Issues foundMetrics skew on cancelled tasks ( Stale comment ( Silent drop of cancelled orders ( Typo ( See inline comments for details and suggestions. |
| error!("Sim channel is closed, dropping order"); | ||
| false | ||
| // @Metric | ||
| if let Some(cancel_handle) = self.in_flight_orders.get(&order_id) { |
There was a problem hiding this comment.
When a popped order is not in in_flight_orders (already cancelled), it's silently dropped here. This is correct behavior, but it would benefit from a trace! log for debuggability — cancelled orders vanishing without a trace could make production issues harder to diagnose.
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
|
Claude finished @ZanCorDX's task in 1m 36s —— View job Re-review after update
SummaryThe latest commits addressed 2 of the 4 issues from the previous review:
Remaining issues
No new issues found in the update — the core cancellation mechanism using |
| while let Ok(cancellable_task) = current_sim_context.requests.recv() { | ||
| if let Some(task) = cancellable_task.into_request() { | ||
| let sim_thread_wait_time = last_sim_finished.elapsed(); | ||
| let sim_start = Instant::now(); |
There was a problem hiding this comment.
Still an issue from previous review: when cancellable_task.into_request() returns None (cancelled), last_sim_finished is not updated. The next real task's sim_thread_wait_time will include the time spent dequeuing cancelled tasks, inflating utilization metrics.
| while let Ok(cancellable_task) = current_sim_context.requests.recv() { | |
| if let Some(task) = cancellable_task.into_request() { | |
| let sim_thread_wait_time = last_sim_finished.elapsed(); | |
| let sim_start = Instant::now(); | |
| while let Ok(cancellable_task) = current_sim_context.requests.recv() { | |
| if let Some(task) = cancellable_task.into_request() { | |
| let sim_thread_wait_time = last_sim_finished.elapsed(); | |
| let sim_start = Instant::now(); |
Consider updating last_sim_finished in an else branch when skipping cancelled tasks, or moving the last_sim_finished = Instant::now() to just before into_request() is called.
dvush
left a comment
There was a problem hiding this comment.
The problem that this solves is that our cancellations are handled after simulations so its non-optimal. This makes sure that order is not simulated if its cancelled. We clear all cancelled order in this "pre flights state".
My main question were how do we handle this scenarios:
- make sure that when replacements are really fast, do we simulate anything at all?
- we send O_1, O_2, O_3, ... that replace each other very fast
- will simulation thread take the latest O_k that is available right now when its draining new orders and simulate it?
- it seems like this is handled this way: sim threads will drain channel and skip every cancelled one so it should be able to reach O_k eventually that is not cancelled and process that.
- same as above but how do we cancel things that were already sent to the builder?
- we send O_1, O_2, O_3 very fast but O_1 was simulated and sent to the builder
- O_2, O_3 all cancelled in flight
- Are we able to cancel O_1 in time?
- it seems that we can: because O_1 will not be in the "in_flight_orders" map and in this case we send cancellation to the builder thread directly
1 - O_1, O_2, O_3 into 6 messages O_1, C_1 (cancel),O_2,C_2,O3. 2 - The generation of the O and C are sequential so C_1 will always come between O_1 and O_2. |
|
Claude finished @ZanCorDX's task in 1m 53s —— View job PR Review: Better replacement latency
SummaryRe-reviewed after the latest push (4 commits total). The core cancellation mechanism using Issues found
No other new issues. The two items flagged in the previous review (metrics skew on cancelled tasks, silent drop logging) were discussed and dismissed by the author — no objection to those decisions. |
| /// Orders we got via new_order_sub and are still being processed (they could be inside the SimTree or in the sim queue) | ||
| /// and were not cancelled. | ||
| in_flight_orders: HashSet<OrderId>, | ||
| /// When we remove the order we set the associated bool to false (which is checked before simulating) |
There was a problem hiding this comment.
Incorrect comment: the AtomicBool represents "is cancelled" and is set to true when cancelling (line 273: cancel_handle.store(true, ...)). The comment says "set the associated bool to false" which is the opposite of what happens.
| /// When we remove the order we set the associated bool to false (which is checked before simulating) | |
| /// When we remove the order we set the associated bool to true (cancelled) which is checked before simulating. |
| /// Orders that blocking_consume_next_commands will consume. | ||
| /// A slow algorithm would check approx every 200ms, to fill this batch size it would take | ||
| /// 8192/.2 = 40960 order/sec which is even more than what we see in the whole slot for a busy block. | ||
| const ORDERS_CONSUMED_PER_BATCH: usize = 8192; |
There was a problem hiding this comment.
PR description says this was increased to 4096 but the actual value is 8192. Worth updating the PR description to match.
📝 Summary
Increassed the ORDERS_CONSUMED_PER_BATCH from 1024 to 4096.
Implemented cancellation in orders queued for simlulation.
✅ I have completed the following steps:
make lintmake test