Skip to content

Improve initialization speed for HollowPrimiaryKeyIndex by allowing parallel hash computation#807

Open
earthconfed wants to merge 1 commit intomasterfrom
primaryKeyIndex
Open

Improve initialization speed for HollowPrimiaryKeyIndex by allowing parallel hash computation#807
earthconfed wants to merge 1 commit intomasterfrom
primaryKeyIndex

Conversation

@earthconfed
Copy link
Copy Markdown

@earthconfed earthconfed commented Mar 28, 2026

Summary

Parallel hash computation during reindex

Parallel hash computation: During reindex(), record hashes can be computed in parallel using Arrays.parallelSetAll for large datasets (configurable via system properties). Hash insertion into the table remains serial to avoid race conditions.

Configurable parallel hash threshold

The threshold that gates parallel hashing defaults to 4096 and is tunable at JVM startup:
-Dcom.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold=
This follows the same pattern as the existing allowDeltaUpdate flag in the same class.

Test coverage

  • testParallelHashMatchesSerialHashForSameLookups — verifies the parallel path (>4096 records) and serial path (<4096 records) both return correct ordinals and that missing keys return -1
  • testParallelHashThresholdBoundary — exercises exactly 4095 records (serial) and 4096 records (parallel) to validate correct behaviour at the boundary

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR speeds up HollowPrimaryKeyIndex initialization by computing per-record hash codes in parallel during reindex() for sufficiently large datasets, while keeping hash-table insertion serial to avoid races.

Changes:

  • Add a configurable parallelHashThreshold system property (default 4096) to gate parallel hash computation.
  • Update reindex() to use Arrays.parallelSetAll for hash-code computation when above the threshold.
  • Add unit tests covering correctness for both serial/parallel paths and the threshold boundary.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
hollow/src/main/java/com/netflix/hollow/core/index/HollowPrimaryKeyIndex.java Adds a parallel hash-computation path in reindex() plus a new system-property threshold to enable it.
hollow/src/test/java/com/netflix/hollow/core/index/HollowPrimaryKeyIndexTest.java Adds tests validating lookups for both paths and correct behavior at the threshold boundary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +543 to +550
int[] ordinalArray = new int[cardinality];
int pos = 0;
for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
ordinalArray[pos++] = o;
}
int[] hashCodes = new int[cardinality];
Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
for (int i = 0; i < cardinality; i++) {
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the parallel branch, the arrays are sized from ordinals.cardinality() and then filled by iterating nextSetBit. If ordinals changes between the cardinality call and the fill loop (notably possible when specificOrdinalsToIndex is caller-provided/mutable), pos can exceed cardinality and throw ArrayIndexOutOfBoundsException, or be less than cardinality leaving default 0 entries that get hashed/inserted incorrectly. Consider collecting ordinals into a dynamically-growing buffer (or growing the array when pos == ordinalArray.length) and then using the final pos for subsequent loops/array sizes.

Suggested change
int[] ordinalArray = new int[cardinality];
int pos = 0;
for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
ordinalArray[pos++] = o;
}
int[] hashCodes = new int[cardinality];
Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
for (int i = 0; i < cardinality; i++) {
// Collect ordinals into a dynamically sized buffer to avoid relying on a potentially stale cardinality.
List<Integer> ordinalList = new ArrayList<>(cardinality);
for (int o = ordinals.nextSetBit(0); o >= 0; o = ordinals.nextSetBit(o + 1)) {
ordinalList.add(o);
}
int size = ordinalList.size();
int[] ordinalArray = new int[size];
for (int i = 0; i < size; i++) {
ordinalArray[i] = ordinalList.get(i);
}
int[] hashCodes = new int[size];
Arrays.parallelSetAll(hashCodes, i -> recordHash(ordinalArray[i]));
for (int i = 0; i < size; i++) {

Copilot uses AI. Check for mistakes.
Comment on lines +464 to +466
private static final int PARALLEL_HASH_THRESHOLD =
Integer.getInteger("com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold", 4096);

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer.getInteger(...) will throw NumberFormatException if the system property is set to a non-integer value, which can fail class initialization (NoClassDefFoundError). To make this knob safer in production, parse the property defensively (try/catch) and fall back to the default (and optionally clamp to a sensible minimum like 1).

Suggested change
private static final int PARALLEL_HASH_THRESHOLD =
Integer.getInteger("com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold", 4096);
private static final int PARALLEL_HASH_THRESHOLD = getParallelHashThreshold();
private static int getParallelHashThreshold() {
final int defaultValue = 4096;
final String property = System.getProperty(
"com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold");
if (property == null) {
return defaultValue;
}
try {
int value = Integer.parseInt(property.trim());
return value < 1 ? 1 : value;
} catch (NumberFormatException e) {
LOG.log(Level.WARNING,
"Invalid value for system property "
+ "com.netflix.hollow.core.index.HollowPrimaryKeyIndex.parallelHashThreshold"
+ "=''{0}''; using default " + defaultValue,
property);
return defaultValue;
}
}

Copilot uses AI. Check for mistakes.
@earthconfed earthconfed requested a review from Sunjeet March 28, 2026 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants