Batch bus-related queries to eliminate N+1 in GetStationsByLineIdList#1446
Batch bus-related queries to eliminate N+1 in GetStationsByLineIdList#1446TinyKitten merged 6 commits intodevfrom
Conversation
When transport_type=RailAndBus with many line_ids, the endpoint suffered from 100+ sequential DB queries for bus stop enrichment. This change batches all bus-related queries before the main loop: 1. Add get_bus_stops_near_stations using LATERAL JOIN + UNNEST to fetch all nearby bus stops in a single query instead of per-station calls 2. Pre-fetch all bus lines in one get_by_station_group_id_vec call 3. Merge bus company IDs into the Phase 2 company fetch 4. Add get_by_station_group_id_vec_no_types to LineRepository to skip unnecessary station_station_types JOIN when skip_types_join=true Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthrough駅グループIDでタイプ情報を付与しないライン取得メソッドと、複数駅座標への一括バス停候補取得メソッドをリポジトリに追加し、ユースケース側でバッチ取得・キャッシュ化してバス候補→バス路線を一括で解決する処理へ置換しています。(DBクエリ追加とユースケースの集約ロジック変更) 変更内容
シーケンス図sequenceDiagram
autonumber
participant Client
participant Interactor as QueryInteractor
participant StationRepo as StationRepository
participant LineRepo as LineRepository
participant DB as Database
Client->>Interactor: update_station_vec_with_attributes(stations, RailAndBus)
activate Interactor
Interactor->>Interactor: 鉄道駅座標から unique_bus_coords を抽出
Interactor->>StationRepo: get_bus_stops_near_stations(coords, limit)
activate StationRepo
StationRepo->>DB: UNNEST coords + LATERAL nearest-by-PostGIS (LIMIT per coord)
DB-->>StationRepo: Vec<(source_g_cd, Station)>
StationRepo-->>Interactor: バッチ候補一覧
deactivate StationRepo
Interactor->>Interactor: 全候補の station_g_cd を集約
Interactor->>LineRepo: get_by_station_group_id_vec_no_types(all_g_cd)
activate LineRepo
LineRepo->>DB: SELECT DISTINCT ON (l.line_cd, s.station_g_cd) (type_cd=NULL, line_group_cd=NULL)
DB-->>LineRepo: Vec<Line>
LineRepo-->>Interactor: all_bus_lines
deactivate LineRepo
Interactor->>Interactor: bus_candidate_cache / bus_lines_by_g_cd を構築
Interactor->>Client: 更新済み stations を返却
deactivate Interactor
推定コードレビュー作業量🎯 4 (複雑) | ⏱️ ~45 分 ウサギの詩
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
The LATERAL JOIN SQL hardcoded transport_type=2 (proto Bus value) instead of using the DB value (TransportType::Bus = 1). Changed to parameterized bind using TransportType::Bus as i32 to match the existing get_by_coordinates pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
stationapi/src/infrastructure/station_repository.rs (1)
1276-1385: このクエリは実 DB の実行計画だけ見てから出したいです。
LATERAL側のORDER BY point(s.lat, s.lon) <-> point(ic.lat, ic.lon)が期待どおりに空間インデックスを使えないと、N+1 を消しても 1 クエリが重くなって相殺されます。Line 1285 以降はEXPLAIN ANALYZEで既存のget_by_coordinates(..., Bus)相当と比較しておくと安全です。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/infrastructure/station_repository.rs` around lines 1276 - 1385, The new get_bus_stops_near_stations function issues a spatial nearest-neighbor ORDER BY using ORDER BY point(s.lat, s.lon) <-> point(ic.lat, ic.lon) (see the ORDER BY line in get_bus_stops_near_stations) which may not use the spatial index and could make this single query expensive; run EXPLAIN ANALYZE on this SQL and compare its plan to the existing get_by_coordinates(..., Bus) query to confirm index usage and plan shape, and if the plan does not use the expected GiST/GiN index, adjust the query to use the same index-friendly expression/operator (or add the proper index) as get_by_coordinates (e.g., use the same point/geometry construction, PostGIS ST_DWithin + <-> or the same operator class) so the planner can perform an indexed nearest-neighbor search instead of a full scan.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 1177-1200: The bug is that converting nearby_bus_g_cds to a
HashSet and collecting bus_stop_by_line_cd via collect::<HashMap<_,_>>() loses
input order and makes the representative stop for a line nondeterministic;
instead preserve the original nearby_bus_stops order and pick the first
candidate per line_cd: iterate nearby_bus_stops in order (not via
nearby_bus_g_cds HashSet), for each station take its station_g_cd once (track
seen g_cds) to fetch lines from bus_lines_by_g_cd and push TransportType::Bus
lines while deduping via seen_bus_line_cds, and when building
bus_stop_by_line_cd insert (line_cd -> station) only if the key is absent so the
first encountered stop wins; update references: nearby_bus_g_cds,
bus_lines_by_g_cd, seen_bus_line_cds, bus_lines, nearby_bus_stops,
bus_stop_by_line_cd accordingly.
- Around line 1020-1025: The branch that switches to
get_stations_by_group_id_vec_no_types /
get_lines_by_station_group_id_vec_no_types when skip_types_join is true loses
the original "pass=1" exclusion in the new line query, so station.lines can
include non-stopping (pass-through) lines; fix by either (A) restoring the same
pass/exclusion condition in the new query implementation in line_repository.rs
used by get_lines_by_station_group_id_vec_no_types (ensure the SQL/filters
replicate the original pass=1 logic), or (B) avoid using the _no_types variants
in this branch and call the standard get_stations_by_group_id_vec /
get_lines_by_station_group_id_vec so the pass filtering is preserved; reference
the functions skip_types_join, get_stations_by_group_id_vec_no_types,
get_lines_by_station_group_id_vec_no_types, get_bus_stops_near_stations and
get_stations_by_line_id_vec when making the change (note the mock at lines
around 2135 delegates _no_types to normal so update tests/mocks accordingly).
---
Nitpick comments:
In `@stationapi/src/infrastructure/station_repository.rs`:
- Around line 1276-1385: The new get_bus_stops_near_stations function issues a
spatial nearest-neighbor ORDER BY using ORDER BY point(s.lat, s.lon) <->
point(ic.lat, ic.lon) (see the ORDER BY line in get_bus_stops_near_stations)
which may not use the spatial index and could make this single query expensive;
run EXPLAIN ANALYZE on this SQL and compare its plan to the existing
get_by_coordinates(..., Bus) query to confirm index usage and plan shape, and if
the plan does not use the expected GiST/GiN index, adjust the query to use the
same index-friendly expression/operator (or add the proper index) as
get_by_coordinates (e.g., use the same point/geometry construction, PostGIS
ST_DWithin + <-> or the same operator class) so the planner can perform an
indexed nearest-neighbor search instead of a full scan.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38817b07-0a29-44f4-92f4-70ad48b86634
📒 Files selected for processing (5)
stationapi/src/domain/repository/line_repository.rsstationapi/src/domain/repository/station_repository.rsstationapi/src/infrastructure/line_repository.rsstationapi/src/infrastructure/station_repository.rsstationapi/src/use_case/interactor/query.rs
- Revert lines query to use get_lines_by_station_group_id_vec (with station_station_types JOIN) to preserve pass=1 exclusion filtering. The _no_types variant incorrectly removed this filter. - Iterate nearby_bus_stops in distance order (not HashSet) when collecting bus lines to preserve deterministic ordering. - Use entry().or_insert() for bus_stop_by_line_cd so the first (closest) bus stop per line_cd wins instead of the last. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
stationapi/src/infrastructure/station_repository.rs (1)
1285-1361: 既存の近傍検索と順序まで揃えるなら、外側の順序を固定したいです。LATERAL 側では距離順ですが、返り値全体には明示的な
ORDER BYがないので、Vec<(u32, Station)>の並びは SQL 上は保証されません。旧来の 1 件ずつの近傍検索と同じ nearest-first を前提にしているなら、source ごとの順序を明示するか、少なくとも実 DB で件数・並びが一致することは確認しておきたいです。順序を明示するならこんな形です
- FROM ( - SELECT unnest($1::int[]) AS source_g_cd, - unnest($2::float8[]) AS lat, - unnest($3::float8[]) AS lon - ) ic, - LATERAL ( + FROM unnest($1::int[], $2::float8[], $3::float8[]) WITH ORDINALITY + AS ic(source_g_cd, lat, lon, ord) + CROSS JOIN LATERAL ( SELECT s.* FROM stations s WHERE s.e_status = 0 AND COALESCE(s.transport_type, 0) = $5 ORDER BY point(s.lat, s.lon) <-> point(ic.lat, ic.lon) LIMIT $4 ) s JOIN lines AS l ON s.line_cd = l.line_cd LEFT JOIN line_aliases AS la ON la.station_cd = s.station_cd - LEFT JOIN aliases AS a ON a.id = la.alias_cd"#; + LEFT JOIN aliases AS a ON a.id = la.alias_cd + ORDER BY ic.ord, point(s.lat, s.lon) <-> point(ic.lat, ic.lon)"#;順序非依存の設計なら、その前提が壊れないように
get_by_coordinates(..., Some(TransportType::Bus))との同値性を integration test か実 DB 比較で固定しておくと安心です。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/infrastructure/station_repository.rs` around lines 1285 - 1361, The SQL in query_str relies on the LATERAL subquery ordering but the outer SELECT has no ORDER BY, so the returned Vec<(u32, Station)> order is not guaranteed; update the query_str to enforce ordering per source (e.g. append "ORDER BY ic.source_g_cd, point(s.lat, s.lon) <-> point(ic.lat, ic.lon)" to the outer query) so results are nearest-first for each source_g_cd, and then add or update an integration test that compares this batched nearest query against the single-source get_by_coordinates(...) behavior (e.g. get_by_coordinates(..., Some(TransportType::Bus))) to verify identical ordering and counts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/infrastructure/station_repository.rs`:
- Around line 1285-1361: The SQL in query_str relies on the LATERAL subquery
ordering but the outer SELECT has no ORDER BY, so the returned Vec<(u32,
Station)> order is not guaranteed; update the query_str to enforce ordering per
source (e.g. append "ORDER BY ic.source_g_cd, point(s.lat, s.lon) <->
point(ic.lat, ic.lon)" to the outer query) so results are nearest-first for each
source_g_cd, and then add or update an integration test that compares this
batched nearest query against the single-source get_by_coordinates(...) behavior
(e.g. get_by_coordinates(..., Some(TransportType::Bus))) to verify identical
ordering and counts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10a2813d-058b-4f37-89a2-8db06e8015c3
📒 Files selected for processing (1)
stationapi/src/infrastructure/station_repository.rs
The LATERAL subquery's ORDER BY does not guarantee result ordering in the outer SELECT. Add explicit ORDER BY ic.source_g_cd, distance to ensure results are nearest-first per source station, which is required for the entry().or_insert() logic that picks the closest bus stop per line. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the LEFT JOIN to station_station_types with a NOT EXISTS subquery that preserves the pass=1 exclusion while avoiding the row multiplication from the JOIN. This gives the performance benefit of skipping the sst JOIN while maintaining correct pass filtering. A station is excluded only when all its sst records with non-null line_group_cd have pass=1 (i.e., no non-pass-through entry exists). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
stationapi/src/use_case/interactor/query.rs (1)
1047-1055: Phase 2 のバス路線取得がまだ広すぎます。
all_bus_station_group_idsを 300m 判定前の候補から組み立てているので、後段で捨てる停留所グループまでall_bus_lines/ 会社補完の対象に入ります。さらにskip_types_join=trueでもここだけget_lines_by_station_group_id_vecを使っているため、最も fan-out するバッチで型JOINを 다시踏みます。半径内の候補に絞ってから group_id を集め、skip_types_join時は_no_types側に寄せた方が、この最適化の効果を取りこぼしにくいです。Also applies to: 1066-1069, 1167-1173
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/interactor/query.rs` around lines 1047 - 1055, The code collects all_bus_station_group_ids from bus_candidate_cache before applying the 300m radius filter, causing unnecessary wide fetches and reintroducing type JOINs; change the logic to first filter bus_candidate_cache for candidates that pass the 300m proximity check, then collect station_g_cd into all_bus_station_group_ids (sort_unstable as before). Additionally, when skip_types_join is true, call the no-types variant (e.g., get_lines_by_station_group_id_vec_no_types) instead of get_lines_by_station_group_id_vec to avoid type JOINs. Apply the same fix to the other occurrences where group ids are gathered (the blocks around the other get_lines_by_station_group_id_vec calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/infrastructure/station_repository.rs`:
- Around line 1359-1361: The JOIN against the lines table is missing the
active-status filter; update the JOIN condition "JOIN lines AS l ON s.line_cd =
l.line_cd" to include "AND l.e_status = 0" so inactive lines are excluded
(mirror the approach used in get_by_station_group_id_vec_no_types); ensure this
change targets the JOIN that uses alias l (leaving the LEFT JOINs for
line_aliases/la and aliases/a unchanged).
---
Nitpick comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 1047-1055: The code collects all_bus_station_group_ids from
bus_candidate_cache before applying the 300m radius filter, causing unnecessary
wide fetches and reintroducing type JOINs; change the logic to first filter
bus_candidate_cache for candidates that pass the 300m proximity check, then
collect station_g_cd into all_bus_station_group_ids (sort_unstable as before).
Additionally, when skip_types_join is true, call the no-types variant (e.g.,
get_lines_by_station_group_id_vec_no_types) instead of
get_lines_by_station_group_id_vec to avoid type JOINs. Apply the same fix to the
other occurrences where group ids are gathered (the blocks around the other
get_lines_by_station_group_id_vec calls).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 82b7dbcd-45aa-49da-8063-57701eb2d5ae
📒 Files selected for processing (3)
stationapi/src/infrastructure/line_repository.rsstationapi/src/infrastructure/station_repository.rsstationapi/src/use_case/interactor/query.rs
…tions The JOIN to the lines table was missing the active-status filter, which could include inactive/discontinued lines in bus stop results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
stationapi/src/infrastructure/station_repository.rs (2)
84-214:BusStopWithSourceRowとStationRowの重複が大きすぎます。
source_g_cd以外がほぼ完全複製なので、列追加や nullable 変更のたびに 2 つの row struct と手書きマッピングを同期し続ける必要があります。ここは共通 projection を切り出すか、少なくともBusStopWithSourceRow -> StationRowに寄せて差分をsource_g_cdだけにしておくと保守しやすいです。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/infrastructure/station_repository.rs` around lines 84 - 214, The BusStopWithSourceRow duplicates almost all fields of StationRow; change BusStopWithSourceRow to contain a single StationRow plus the extra source_g_cd (e.g. struct BusStopWithSourceRow { pub source_g_cd: i32, #[sqlx(flatten)] pub station: StationRow }) so the shared projection is maintained in StationRow and sqlx can map both via flatten; then update the impl From<BusStopWithSourceRow> for Station to convert from row.station (or call station.into()) and remove the manual one-to-one field mapping. Ensure to add #[sqlx::FromRow] to StationRow if not already present and keep only source_g_cd in BusStopWithSourceRow as the delta.
1351-1357: この KNN LATERAL は実行計画だけ確認しておきたいです。
ORDER BY point(s.lat, s.lon) <-> point(ic.lat, ic.lon) LIMIT $4が index を使えないと、入力座標ごとにstationsを広く走査して別のボトルネックになり得ます。EXPLAIN (ANALYZE, BUFFERS)で本番相当データの plan を一度確認してください。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/infrastructure/station_repository.rs` around lines 1351 - 1357, この KNN LATERAL クエリ(LATERAL ブロック内の ORDER BY point(s.lat, s.lon) <-> point(ic.lat, ic.lon) LIMIT $4 を含む部分)について、本番相当データで実行計画を確認してください:対象クエリを station_repository.rs の該当 LATERAL クエリ文字列で EXPLAIN (ANALYZE, BUFFERS) を実行し、各入力座標ごとに stations テーブルがどのようにスキャンされているか(index が使われているか、シーケンシャルスキャンや大量バッファ使用が発生していないか)を確認し、必要なら GiST/GiN や point ベースの空間インデックス追加やクエリリライトを検討してください。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/infrastructure/station_repository.rs`:
- Around line 84-214: The BusStopWithSourceRow duplicates almost all fields of
StationRow; change BusStopWithSourceRow to contain a single StationRow plus the
extra source_g_cd (e.g. struct BusStopWithSourceRow { pub source_g_cd: i32,
#[sqlx(flatten)] pub station: StationRow }) so the shared projection is
maintained in StationRow and sqlx can map both via flatten; then update the impl
From<BusStopWithSourceRow> for Station to convert from row.station (or call
station.into()) and remove the manual one-to-one field mapping. Ensure to add
#[sqlx::FromRow] to StationRow if not already present and keep only source_g_cd
in BusStopWithSourceRow as the delta.
- Around line 1351-1357: この KNN LATERAL クエリ(LATERAL ブロック内の ORDER BY point(s.lat,
s.lon) <-> point(ic.lat, ic.lon) LIMIT $4
を含む部分)について、本番相当データで実行計画を確認してください:対象クエリを station_repository.rs の該当 LATERAL
クエリ文字列で EXPLAIN (ANALYZE, BUFFERS) を実行し、各入力座標ごとに stations
テーブルがどのようにスキャンされているか(index が使われているか、シーケンシャルスキャンや大量バッファ使用が発生していないか)を確認し、必要なら
GiST/GiN や point ベースの空間インデックス追加やクエリリライトを検討してください。
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10f953bf-178b-40b8-ac5f-3fdabe7c49c6
📒 Files selected for processing (1)
stationapi/src/infrastructure/station_repository.rs
Summary
GetStationsByLineIdListにtransport_type=RailAndBus+ 大量のline_idsでリクエストした場合、バス停エンリッチメント処理で 100+ の逐次 DB クエリが発生していたボトルネックを解消LATERAL JOIN+UNNESTによる座標ベースの一括バス停検索クエリを新規追加LineRepositoryにget_by_station_group_id_vec_no_typesを追加し、不要なstation_station_typesJOIN を除去Changes
最適化1: バス停座標検索のバッチ化(最大効果)
get_bus_stops_near_stationsを新規追加。PostgreSQL のLATERAL JOIN+UNNESTで、ユニークな鉄道駅座標ごとの近隣バス停を 1回のクエリ で取得。最適化2: バス路線のバッチ取得
全バス停の
station_g_cdを事前収集し、1回のget_by_station_group_id_vecで全バス路線を取得。ループ内のbus_lines_cacheは不要になり廃止。最適化3: バス会社のバッチ取得
バス路線の
company_cdを Phase 2 の会社取得に統合。ループ内のfind_company_by_id_vec呼び出しを削除。最適化4: Lines クエリの不要 JOIN 除去
skip_types_join=true時にstation_station_typesテーブルへの LEFT JOIN を省略するget_by_station_group_id_vec_no_typesを追加。最適化後のクエリフロー
_no_typesバリアントではpass=1(通過駅)フィルタが外れるため、レスポンスに通過駅の路線が含まれる可能性があるLATERAL JOINのクエリプランが GiST インデックスを効率的に使えるか、実 DB でのEXPLAIN ANALYZE確認を推奨Test plan
SQLX_OFFLINE=true cargo buildビルド成功SQLX_OFFLINE=true cargo clippy警告なしSQLX_OFFLINE=true cargo test全26テストパスcargo fmt --checkフォーマット問題なしGetStationsByLineIdList(transport_type=RailAndBus, 大量 line_ids) のレスポンス時間を計測EXPLAIN ANALYZEで LATERAL JOIN が GiST インデックスを使用していることを確認pass=1フィルタ除去の影響を実データで確認🤖 Generated with Claude Code
Summary by CodeRabbit