Performance: parallel file I/O and optimized reading #44

Open

wojtek wants to merge 6 commits into zeux:master from wojtek:performance-improvements

Conversation


@wojtek wojtek commented Feb 3, 2026

Summary

  • Add parallel file I/O with read-ahead buffering for index building
  • Use Windows FILE_FLAG_SEQUENTIAL_SCAN for better prefetching
  • Pre-allocate file buffers to avoid O(n²) reallocation
  • Minor optimizations: memchr for line finding, unordered_map for regex cache

Benchmark Results (UE5.6.1 Engine - 186k files, 2GB)

Scenario      Baseline  Improved  Speedup
Cold Build    4.47s     2.72s     1.64x (39% faster)
Incremental   1.15s     1.14s     ~1.0x (no change)

Incremental updates only scan file metadata, so no improvement is expected there.

Changes

  1. fileutil_win.cpp - readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN
  2. build.cpp - Parallel read-ahead with multiple reader threads
  3. stringutil.hpp - Use memchr for faster line-end finding
  4. blockingqueue.hpp - notify_one instead of notify_all
  5. project.cpp - unordered_map for regex cache

Use the Windows API directly with the FILE_FLAG_SEQUENTIAL_SCAN hint for better
prefetching when reading files sequentially. This improves I/O throughput
during index building.

The POSIX implementation returns empty to allow fallback to standard file
reading.
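A sketch of that shape (the function name readFileOptimized comes from the PR's change list, but the exact signature and error handling here are assumptions):

```cpp
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Sketch: on Windows, open with FILE_FLAG_SEQUENTIAL_SCAN so the OS
// read-ahead cache prefetches aggressively for a front-to-back read.
// Elsewhere, return an empty vector to signal the caller to fall back
// to the standard file-reading path.
std::vector<char> readFileOptimized(const std::string& path)
{
#ifdef _WIN32
    HANDLE h = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                           nullptr);
    if (h == INVALID_HANDLE_VALUE) return {};

    LARGE_INTEGER size;
    if (!GetFileSizeEx(h, &size)) { CloseHandle(h); return {}; }

    std::vector<char> data(static_cast<size_t>(size.QuadPart));
    DWORD read = 0;
    if (!data.empty() &&
        !ReadFile(h, data.data(), static_cast<DWORD>(data.size()), &read, nullptr))
        data.clear();
    CloseHandle(h);
    return data;
#else
    (void)path;
    return {}; // POSIX: empty result means "use the regular reader"
#endif
}
```

The single ReadFile call keeps the sketch short; files over 4 GB would need a read loop.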

Replace the manual scanning loop with memchr(), which is typically optimized
with SIMD instructions for faster scanning.
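A minimal sketch of what that looks like (the name findLineEnd matches the change described for stringutil.hpp; this exact signature is an assumption):

```cpp
#include <cstring>

// memchr scans for '\n' using a vectorized byte search in most C libraries,
// which beats a hand-written char-by-char loop. Returns the index of the
// line end, or `size` if the buffer contains no newline.
size_t findLineEnd(const char* data, size_t size)
{
    const void* p = memchr(data, '\n', size);
    return p ? static_cast<size_t>(static_cast<const char*>(p) - data) : size;
}
```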

Only one waiting producer needs to be woken when space becomes available,
reducing unnecessary thread wake-ups.
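A stripped-down bounded queue illustrating the idea (qgrep's blockingqueue.hpp differs in detail; this is a sketch, and notify_one is only strictly sufficient when every item occupies one slot):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Minimal bounded queue: pop() wakes at most one blocked producer, since a
// single freed slot can admit only one pending push.
template <typename T>
class BlockingQueue
{
public:
    explicit BlockingQueue(size_t limit): limit(limit) {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lock(mutex);
        notFull.wait(lock, [&]{ return items.size() < limit; });
        items.push_back(std::move(value));
        notEmpty.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex);
        notEmpty.wait(lock, [&]{ return !items.empty(); });
        T value = std::move(items.front());
        items.pop_front();
        notFull.notify_one(); // one slot freed -> wake one producer
        return value;
    }

private:
    std::mutex mutex;
    std::condition_variable notFull, notEmpty;
    std::deque<T> items;
    size_t limit;
};
```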

Replace std::map with std::unordered_map for O(1) average lookup
instead of O(log n) when caching compiled regex patterns.
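A hypothetical cache of this shape, keyed by pattern string (the actual cache in project.cpp may differ):

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Illustrative regex cache: std::unordered_map gives O(1) average lookup
// per query versus std::map's O(log n). Compiling a regex is expensive,
// so repeated patterns are compiled once and reused.
class RegexCache
{
public:
    const std::regex& get(const std::string& pattern)
    {
        auto it = cache.find(pattern);
        if (it == cache.end())
            it = cache.emplace(pattern, std::regex(pattern)).first;
        return it->second;
    }

    size_t size() const { return cache.size(); }

private:
    std::unordered_map<std::string, std::regex> cache;
};
```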

- Pre-allocate the file buffer using a size hint to avoid O(n²) reallocation
- Use readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN on Windows
- Fall back to the standard FileStream for non-Windows or special files
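Illustrating the size-hint idea with a hypothetical helper (names are not from the PR):

```cpp
#include <string>

// Pre-allocating with a size hint avoids repeated grow-and-copy cycles as
// the buffer fills chunk by chunk; without reserve(), each growth step may
// reallocate and copy everything read so far.
std::string readAllChunks(const char* const* chunks, size_t count, size_t sizeHint)
{
    std::string buffer;
    buffer.reserve(sizeHint); // single up-front allocation from the hint
    for (size_t i = 0; i < count; ++i)
        buffer += chunks[i];
    return buffer;
}
```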

Use multiple reader threads to overlap file I/O with processing during
index building. Reader threads read ahead while the main thread consumes
files in order, improving throughput on systems with high I/O latency.

The number of reader threads scales with available CPU cores, and a
sliding window prevents readers from getting too far ahead.
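A simplified sketch of the sliding-window read-ahead, using std::async in place of the PR's dedicated reader threads (all names here are illustrative):

```cpp
#include <cstddef>
#include <deque>
#include <future>
#include <string>
#include <vector>

// At most `window` reads are in flight ahead of the consumer, which still
// receives file contents strictly in path order. `loadFile` stands in for
// the real per-file read; `consume` for index-building work.
template <typename LoadFn, typename ConsumeFn>
void processFiles(const std::vector<std::string>& paths, size_t window,
                  LoadFn loadFile, ConsumeFn consume)
{
    std::deque<std::future<std::string>> inflight;
    size_t next = 0;

    while (next < paths.size() || !inflight.empty())
    {
        // Fill the window: launch reads ahead of the consumer.
        while (next < paths.size() && inflight.size() < window)
            inflight.push_back(
                std::async(std::launch::async, loadFile, paths[next++]));

        // Consume the oldest in-flight read, preserving file order.
        consume(inflight.front().get());
        inflight.pop_front();
    }
}
```

The window bound is what keeps readers from racing arbitrarily far ahead of the consumer and holding too many file contents in memory at once.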

wojtek commented Feb 3, 2026

COLD BUILD:
BASELINE: 7.96s, 4.45s, 4.48s → avg 4.47s (excl. cold cache)
IMPROVED: 2.68s, 2.66s, 2.83s → avg 2.72s
SPEEDUP: 1.64x (39% faster)

INCREMENTAL:
BASELINE: 1.14s, 1.16s, 1.16s → avg 1.15s
IMPROVED: 1.14s, 1.15s, 1.13s → avg 1.14s
SPEEDUP: ~1.0x (no change)

Tested on the following config:

qgrep config for Unreal Engine 5.6.1

path E:/UnrealEngine-5.6.1-release/Engine

include \.(ini)$
include \.(cpp|c|h|hpp|cc|inl)$
include \.(ispc|isph)$
include \.(cs|vb)$
include \.(cmake)$
include \.(java|js|kt|kts|ts|tsx)$
include \.(md|rst|txt)$
include \.(pl|py|pm|rb)$
include \.(rs)$
include \.(usf|ush|hlsl|glsl|cg|fx|cgfx)$
include \.(xml|yml|yaml)$
include \.(uplugin|uproject)$
include \.(sh|bat)$
exclude ^DerivedDataCache/
exclude ^Intermediate/

186,283 files, 2GB input, 456MB index


zeux commented Feb 26, 2026

Is the main improvement due to read-ahead? The other changes (modulo some issues) would be easy to merge but that part is more unwieldy. So I'm wondering which changes are responsible for which speedups.

notify_all -> notify_one change is not always efficient. If queue reaches its maximum size and multiple threads wait on it, removal of a large item should wake all waiting producers, as otherwise the parallelism might be limited.

Changing the order of normalizeEOL & convertUTF8 is incorrect and would break UTF16 files as far as I can tell.

Some of the changes like map -> unordered_map, findLineEnd and sizeHint are no-brainers and could easily be merged if submitted separately.
