Improve initialization speed for HollowPrimaryKeyIndex by allowing parallel hash computation #807
earthconfed wants to merge 1 commit into master from
Conversation
Pull request overview
This PR speeds up HollowPrimaryKeyIndex initialization by computing per-record hash codes in parallel during reindex() for sufficiently large datasets, while keeping hash-table insertion serial to avoid races.
Changes:
- Add a configurable `parallelHashThreshold` system property (default 4096) to gate parallel hash computation.
- Update `reindex()` to use `Arrays.parallelSetAll` for hash-code computation when above the threshold.
- Add unit tests covering correctness for both serial/parallel paths and the threshold boundary.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| hollow/src/main/java/com/netflix/hollow/core/index/HollowPrimaryKeyIndex.java | Adds a parallel hash-computation path in `reindex()` plus a new system-property threshold to enable it. |
| hollow/src/test/java/com/netflix/hollow/core/index/HollowPrimaryKeyIndexTest.java | Adds tests validating lookups for both paths and correct behavior at the threshold boundary. |
```java
int[] ordinalArray = new int[cardinality];
int pos = 0;
for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
    ordinalArray[pos++] = o;
}
int[] hashCodes = new int[cardinality];
Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
for (int i = 0; i < cardinality; i++) {
```
In the parallel branch, the arrays are sized from ordinals.cardinality() and then filled by iterating nextSetBit. If ordinals changes between the cardinality call and the fill loop (notably possible when specificOrdinalsToIndex is caller-provided/mutable), pos can exceed cardinality and throw ArrayIndexOutOfBoundsException, or be less than cardinality leaving default 0 entries that get hashed/inserted incorrectly. Consider collecting ordinals into a dynamically-growing buffer (or growing the array when pos == ordinalArray.length) and then using the final pos for subsequent loops/array sizes.
Suggested change:

```diff
-int[] ordinalArray = new int[cardinality];
-int pos = 0;
-for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
-    ordinalArray[pos++] = o;
-}
-int[] hashCodes = new int[cardinality];
-Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
-for (int i = 0; i < cardinality; i++) {
+// Collect ordinals into a dynamically sized buffer to avoid relying on a potentially stale cardinality.
+List<Integer> ordinalList = new ArrayList<>(cardinality);
+for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
+    ordinalList.add(o);
+}
+int size = ordinalList.size();
+int[] ordinalArray = new int[size];
+for (int i = 0; i < size; i++) {
+    ordinalArray[i] = ordinalList.get(i);
+}
+int[] hashCodes = new int[size];
+Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
+for (int i = 0; i < size; i++) {
```
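The review comment also mentions a second option: growing the `int[]` in place when `pos` reaches its capacity, which avoids boxing every ordinal into a `List<Integer>`. A minimal self-contained sketch of that idea, where `collectOrdinals` is a hypothetical helper and a `boolean[]` stands in for the ordinal `BitSet`:

```java
import java.util.Arrays;

public class OrdinalCollector {

    // Collects set positions into a growable primitive int[] and trims to the
    // actual count, so later arrays can be sized from the result's length
    // rather than from a potentially stale cardinality snapshot.
    static int[] collectOrdinals(boolean[] ordinals) {
        int[] buf = new int[16];                           // small initial capacity
        int pos = 0;
        for (int o = 0; o < ordinals.length; o++) {
            if (!ordinals[o]) continue;
            if (pos == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2);  // grow when full
            }
            buf[pos++] = o;
        }
        return Arrays.copyOf(buf, pos);                    // trim to final count
    }

    public static void main(String[] args) {
        boolean[] ordinals = new boolean[100];
        ordinals[3] = ordinals[42] = ordinals[99] = true;
        System.out.println(Arrays.toString(collectOrdinals(ordinals)));  // [3, 42, 99]
    }
}
```

The trade-off versus the `List<Integer>` suggestion is purely allocation overhead; both variants derive subsequent array sizes from the actual number of ordinals collected rather than from `cardinality()`.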
```java
private static final int PARALLEL_HASH_THRESHOLD =
        Integer.getInteger("com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold", 4096);
```
Note that `Integer.getInteger(...)` does not throw on a malformed value: per its Javadoc, it returns the supplied default when the property is missing, empty, or does not have the correct numeric format, so class initialization is not at risk. The remaining gaps are observability and range checking: a silently ignored typo in the property value is hard to diagnose, and a zero or negative threshold is meaningless. To make this knob safer in production, parse the property explicitly, log a warning before falling back to the default, and clamp the result to a sensible minimum like 1.
Suggested change:

```diff
-private static final int PARALLEL_HASH_THRESHOLD =
-        Integer.getInteger("com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold", 4096);
+private static final int PARALLEL_HASH_THRESHOLD = getParallelHashThreshold();
+
+private static int getParallelHashThreshold() {
+    final int defaultValue = 4096;
+    final String property = System.getProperty(
+            "com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold");
+    if (property == null) {
+        return defaultValue;
+    }
+    try {
+        int value = Integer.parseInt(property.trim());
+        return value < 1 ? 1 : value;
+    } catch (NumberFormatException e) {
+        LOG.log(Level.WARNING,
+                "Invalid value for system property "
+                        + "com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold"
+                        + "=''{0}''; using default " + defaultValue,
+                property);
+        return defaultValue;
+    }
+}
```
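For reference, a quick standalone check of the one-liner's actual failure mode (the property name `demo.parallelHashThreshold` is made up for the demo, not Hollow's real key): `Integer.getInteger` falls back to its default on a malformed value rather than throwing.

```java
public class GetIntegerDemo {

    // Sets a demo system property and reads it back with a default of 4096.
    static int readThreshold(String rawValue) {
        System.setProperty("demo.parallelHashThreshold", rawValue);
        // Integer.getInteger returns the default for missing OR malformed values;
        // it does not propagate NumberFormatException to the caller.
        return Integer.getInteger("demo.parallelHashThreshold", 4096);
    }

    public static void main(String[] args) {
        System.out.println(readThreshold("8192"));          // 8192
        System.out.println(readThreshold("not-a-number"));  // 4096 (silent fallback)
    }
}
```

The silent fallback is exactly why the explicit-parse variant above is worth having: it surfaces a misconfigured property in the logs instead of quietly using 4096.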
## Summary

### Parallel hash computation during reindex
During `reindex()`, record hashes for sufficiently large datasets can be computed in parallel using `Arrays.parallelSetAll` (gated by a configurable system property). Hash insertion into the table remains serial to avoid race conditions.
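The two-phase pattern described above can be sketched in miniature. This is an illustrative stand-in, not Hollow's implementation: `ParallelHashSketch`, `index`, and the `HashMap` (in place of Hollow's actual hash table) are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ParallelHashSketch {

    // Maps each record's hash code to its ordinal (its index in the array).
    static Map<Integer, Integer> index(String[] records) {
        int[] hashCodes = new int[records.length];
        // Parallel phase: each slot is computed independently from read-only
        // input, so Arrays.parallelSetAll is race-free here.
        Arrays.parallelSetAll(hashCodes, i -> records[i].hashCode());

        // Serial phase: the table is not thread-safe, so all inserts stay on
        // the calling thread.
        Map<Integer, Integer> hashToOrdinal = new HashMap<>();
        for (int i = 0; i < records.length; i++) {
            hashToOrdinal.put(hashCodes[i], i);
        }
        return hashToOrdinal;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        Map<Integer, Integer> idx = index(records);
        System.out.println(idx.get("b".hashCode()));  // 1
    }
}
```

Splitting the work this way gets the parallel speedup where it is safe (pure, independent hash computations) while keeping the mutation of shared state single-threaded.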
### Configurable parallel hash threshold
The threshold that gates parallel hashing defaults to 4096 and is tunable at JVM startup:

`-Dcom.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold=<value>`

This follows the same pattern as the existing `allowDeltaUpdate` flag in the same class.
### Test coverage