Skip to content

Fix InMemoryRegistrar performance: O(1) lookups, ConcurrentHashMap, cache clearing#2

Open
devin-ai-integration[bot] wants to merge 5 commits intomainfrom
devin/1775083967-fix-inmemoryregistrar-performance
Open

Fix InMemoryRegistrar performance: O(1) lookups, ConcurrentHashMap, cache clearing#2
devin-ai-integration[bot] wants to merge 5 commits intomainfrom
devin/1775083967-fix-inmemoryregistrar-performance

Conversation

@devin-ai-integration
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot commented Apr 1, 2026

Summary

Addresses performance bottlenecks in InMemoryRegistrar that cause OOM and slow validation on large bundles (23k+ products). Four files changed:

InMemoryRegistrar.java

  • Adds childrenByParent index populated in addTarget() so getChildTargets() is O(1) lookup instead of O(n) full scan.
  • Adds targetCountByType map (ConcurrentHashMap<TargetType, AtomicInteger>) and labelCount (AtomicInteger) counter so getTargetCount()/getLabelCount() return cached values instead of iterating all targets. Uses AtomicInteger values to avoid boxed Integer allocations on every increment.
  • Guards index updates with isNew check to prevent duplicate children and inflated counts when addTarget() is called multiple times for the same location. Logs a warning if a location is ever re-registered with a different TargetType (documents the invariant that this should not happen).
  • Replaces HashMap/HashSet with ConcurrentHashMap/ConcurrentHashMap.newKeySet() — including inner Sets in identifierDefinitionsByLid and referencedIdentifiersByLid — and removes synchronized from five read-only methods: hasTarget(), isTargetReferenced(), isIdentifierReferenced(), getTargetCount(), getLabelCount().

ValidationTarget.java

  • Adds clearCache() static method to clear the unbounded cachedTargets static HashMap.

ValidateLauncher.java

  • Calls ValidationTarget.clearCache() after each target validation completes to free memory between runs.

InMemoryRegistrarTest.java (new)

  • 6 unit tests covering multi-level hierarchy child lookups, leaf nodes, duplicate addTarget() deduplication, count-by-type, count inflation prevention, and label counting.

Updates Since Initial Review

Addresses all six points from @jordanpadams' first review, plus two follow-up items:

  1. addTarget() call sites audited: Both call sites in RegisterTargets.java pass the correct direct parent (getParentTarget() at line 41 and targetLocation at line 59).
  2. Multi-level hierarchy test added: testGetChildTargetsMultiLevelHierarchy() builds bundle → collections → products and verifies only direct children are returned at each level.
  3. ArrayList thread-safety documented: Inline comment on childrenByParent field explains that inner ArrayLists are safe because both addTarget() and getChildTargets() are synchronized.
  4. clearCache() timing confirmed safe: BlockingTaskManager.submit() runs tasks synchronously, so monitor.endValidation() guarantees all work is complete before clearCache() is called.
  5. labelCountAtomicInteger: Future-proofed against removal of synchronized from setTargetIsLabel().
  6. Redundant synchronized removed from getTargetCount() and getLabelCount() — they now only read from ConcurrentHashMap and AtomicInteger.
  7. Re-registration type-change guard (follow-up): addTarget() now reads the existing target's type before overwriting and emits LOG.warn if the type differs. The invariant is documented inline.
  8. targetCountByTypeConcurrentHashMap<TargetType, AtomicInteger> (follow-up): Eliminates short-lived boxed Integer allocations from merge() on every addTarget() call. Uses computeIfAbsent(...).incrementAndGet() instead.
  9. Full test suite run: mvn test -Dtest=\!ReferenceIntegrityTest* -Dcucumber.filter.tags='@v3.7.x' → 38 run, 0 failures, BUILD SUCCESS.

Review & Testing Checklist for Human

  • Semantic change in getChildTargets(): The old implementation discovered children by filesystem path-prefix matching (parentLocation + File.separator). The new implementation relies on the parentLocation argument passed to addTarget(). The two RegisterTargets.java call sites were audited and a unit test was added, but other code paths or plugins that call addTarget() may exist — grep for addTarget across the full codebase to confirm.
  • Re-registration with different parent: If addTarget() is called for the same location with a different parentLocation, the childrenByParent index retains only the original parent's entry. The type-change case now logs a warning, but the parent-change case is silent. Verify this never happens in practice.
  • clearCache() in non-CLI contexts: The BlockingTaskManager analysis confirms safety for the CLI path. If validate is used as a library with an async TaskManager, clearCache() could race with in-flight work. Verify that ValidateLauncher is the only entry point that matters.
  • Test with a large bundle (e.g., 23k+ products) to confirm memory usage is reduced and child target resolution still works correctly end-to-end.

Notes

  • The setTargets()/setCollections()/setBundles() public setters can replace the ConcurrentHashMap instances with arbitrary Map implementations — pre-existing issue, not introduced here.
  • Inner Sets in identifierDefinitionsByLid and referencedIdentifiersByLid now use ConcurrentHashMap.newKeySet() instead of HashSet to avoid ConcurrentModificationException from unsynchronized reads in isIdentifierReferenced().

Link to Devin session: https://nasa-jpl-demo.devinenterprise.com/sessions/085fab5912d94d948b9826a7dd8d7b71


Open with Devin

…ntHashMap, cache clearing

- Add parent-child index (childrenByParent) to eliminate O(n) scan in getChildTargets()
- Add count-by-type index (targetCountByType, labelCount) for O(1) getTargetCount()/getLabelCount()
- Replace HashMap/HashSet with ConcurrentHashMap for thread-safe reads without coarse locking
- Remove synchronized from read-only methods hasTarget(), isTargetReferenced(), isIdentifierReferenced()
- Add ValidationTarget.clearCache() and call it from ValidateLauncher after each validation to prevent OOM

Fixes #1569

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
@devin-ai-integration
Copy link
Copy Markdown
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Fixes potential ConcurrentModificationException when isIdentifierReferenced()
reads inner Sets without synchronization while addIdentifierReference() or
setTargetIdentifier() writes to them concurrently.

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
devin-ai-integration[bot]

This comment was marked as resolved.

…ype on duplicate addTarget calls

Guard index updates with an isNew check so that re-registering the same
location does not append duplicate children or over-count target types.
The old code derived counts/children from the targets Map (unique keys),
so duplicates were implicitly handled. The new indexes must be explicitly
guarded.
Copy link
Copy Markdown

@jordanpadams jordanpadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: InMemoryRegistrar Performance Optimization

Good performance work overall — the O(1) child lookup and count caching address real scalability problems for large bundles. A few issues need to be resolved before merge.

Critical: Semantic Change in getChildTargets() Without Test Coverage

The old implementation found children by filesystem path-prefix matching:

targetLocation.startsWith(parentLocation + File.separator)
&& !targetLocation.substring(parentLocation.length()).contains(File.separator)

The new implementation relies entirely on the parentLocation argument passed to addTarget(). These are equivalent only if every caller passes the correct direct parent. If any caller passes a grandparent, null, or an unrelated location, children will be silently omitted — breaking referential integrity checks.

Please: (1) audit all addTarget() call sites to confirm parentLocation is always the immediate parent; (2) add a test that exercises getChildTargets() on a multi-level hierarchy.

Bug: Mutable ArrayList Inside ConcurrentHashMap Is Misleading

childrenByParent.computeIfAbsent(parentLocation, k -> new ArrayList<>()).add(location);

The ArrayList is not thread-safe. It happens to be safe here because both addTarget() and getChildTargets() are synchronized, so the ArrayList is only accessed under the object monitor. But the ConcurrentHashMap wrapper implies lock-free safety, which is misleading. Either document clearly that the ArrayList is only accessed under synchronized, or use a truly concurrent list type.

Concern: clearCache() Timing

validator.validate(monitor, target);
monitor.endValidation();
ValidationTarget.clearCache();  // called immediately after

If validator.validate() spawns background threads that still hold references to cachedTargets entries (e.g., async content validation), those threads will see nulls after clearCache(). Please confirm monitor.endValidation() guarantees all spawned work is complete before returning.

Minor: labelCount Is a Plain int, Inconsistent with ConcurrentHashMap Approach

labelCount is only modified in synchronized methods so visibility is correct today. But it is inconsistent with the rest of the PR — if synchronized is later removed from setTargetIsLabel()/getLabelCount(), this silently becomes a data race. Consider AtomicInteger.

Minor: Redundant synchronized on Methods That Only Read ConcurrentHashMap

getTargetCount() and getLabelCount() are still synchronized but only read from concurrent structures. The synchronization is unnecessary and contradicts the PR's goal of reducing lock contention.

Required Before Merge: Full Test Suite

Only mvn compile was verified locally per the PR description. Please run at minimum:

mvn test -Dtest=\!ReferenceIntegrityTest* -Dcucumber.filter.tags='@v3.7.x'

The getChildTargets() semantic change makes this especially important.

…chronized, document ArrayList thread-safety, add InMemoryRegistrarTest

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
@devin-ai-integration
Copy link
Copy Markdown
Author

All six points from this review have been addressed in commit 937d380. Summary:

1. Semantic Change in getChildTargets() — Audit + Test

Audit: Both addTarget() call sites are in RegisterTargets.java:

  • Line 45: parentLocation comes from getParentTarget(), which is the immediate parent.
  • Line 59: parentLocation is targetLocation (the current directory), and crawler.crawl() returns immediate children.

Both pass the correct direct parent.

Test: Added InMemoryRegistrarTest with 6 tests, including testGetChildTargetsMultiLevelHierarchy() which builds a 3-level hierarchy (bundle → collections → products) and verifies getChildTargets() returns only direct children at each level — not grandchildren.

2. Mutable ArrayList Inside ConcurrentHashMap

Added inline documentation above the childrenByParent field:

// The inner ArrayLists are only accessed under the instance monitor (both addTarget()
// and getChildTargets() are synchronized), so they are thread-safe despite being mutable.

3. clearCache() Timing

Confirmed safe. BlockingTaskManager.submit() (the implementation used in CLI mode) runs tasks synchronously — its comment says "running tasks synchronously". So when monitor.endValidation() returns, all work is complete and clearCache() is safe to call.

4. labelCountAtomicInteger

Changed from private int labelCount = 0 to private AtomicInteger labelCount = new AtomicInteger(0). All reads/writes updated to use .get(), .incrementAndGet(), .decrementAndGet().

5. Redundant synchronized on getTargetCount() and getLabelCount()

Removed synchronized from both methods — they now only read from ConcurrentHashMap and AtomicInteger, which are inherently thread-safe.

6. Full Test Suite

Ran mvn test -Dtest=\!ReferenceIntegrityTest* -Dcucumber.filter.tags='@v3.7.x' locally:

Tests run: 38, Failures: 0, Errors: 0, Skipped: 23
BUILD SUCCESS

All 6 new InMemoryRegistrarTest tests pass, along with all existing tests.

Copy link
Copy Markdown
Author

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment on lines 68 to +83
this.collections.put(location, target);
}

boolean isNew = !this.targets.containsKey(location);
this.targets.put(location, target);

// Only update indexes for genuinely new targets to avoid duplicates
if (isNew) {
// Index parent-child relationship for O(1) child lookups
if (parentLocation != null) {
childrenByParent.computeIfAbsent(parentLocation, k -> new ArrayList<>()).add(location);
}

// Increment count-by-type index
targetCountByType.merge(type, 1, Integer::sum);
}
Copy link
Copy Markdown
Author

@devin-ai-integration devin-ai-integration bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 targetCountByType becomes stale when target is re-added with a different TargetType

In addTarget(), the isNew check at line 71 prevents updating targetCountByType when the same location is added again. However, the target object in the targets map IS replaced unconditionally (line 72), and bundles/collections maps are also updated unconditionally (lines 65-69). If the same location is re-added with a different TargetType, the count for the old type remains inflated and the count for the new type is missing the entry — yet the targets map now has the new target.

While the primary caller RegisterTargets guards against re-addition (InMemoryRegistrar.java:44 via registrar.hasTarget()), addTarget() itself is a public interface method with no such contract, and the bundles/collections maps are still unconditionally updated even when isNew is false — creating an inconsistency where a target can appear in bundles but not be reflected in targetCountByType for BUNDLE.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@jordanpadams jordanpadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: InMemoryRegistrar Performance Optimization

All five issues from the initial review have been addressed. The AtomicInteger for labelCount, removal of redundant synchronized on the count getters, the multi-level hierarchy test, the addTarget() call-site audit, and the clearCache() timing analysis are all solid additions. The test suite passes.

One Remaining Concern: Re-registration With a Different TargetType

The isNew guard correctly prevents duplicate children and inflated counts when addTarget() is called multiple times for the same location. However, it only checks whether the location is new — it does not handle the case where the same location is re-registered with a different TargetType:

boolean isNew = !this.targets.containsKey(location);
this.targets.put(location, target);       // overwrites old target (new type)
if (isNew) {
    targetCountByType.merge(type, 1, Integer::sum);   // only runs for first registration
}

If this ever happens, targets will reflect the new type, but targetCountByType will still count the old type. The PR's own checklist flags this as an open item. The original code was also undefined for this case, so this is pre-existing behavior, but worth confirming: can addTarget() ever be called for the same location with different types? The RegisterTargets.java audit presumably rules this out for the two known call sites, but a brief comment in the code (or an assert type == existingTarget.getType() under a debug flag) would make the assumption explicit and catch future regressions.

Minor: getTargetCount() Reads Boxed Integer From ConcurrentHashMap Without Synchronization

public int getTargetCount(TargetType type) {
    return targetCountByType.getOrDefault(type, 0);
}

ConcurrentHashMap.merge() and getOrDefault() are each individually atomic, but the boxed Integer values mean each write via merge() allocates a new object. This is fine for correctness (ConcurrentHashMap's internal locking ensures visibility), but worth noting if GC pressure ever becomes a concern on very large bundles. LongAdder per type (or ConcurrentHashMap<TargetType, AtomicInteger>) would eliminate the allocation, but this is a minor optimization and not a blocker.

Confirmed Fixed

  • getChildTargets() semantic change audited: both RegisterTargets.java call sites pass the correct direct parent ✓
  • Multi-level hierarchy test (testGetChildTargetsMultiLevelHierarchy) added and passes ✓
  • ArrayList inside ConcurrentHashMap documented as safe under synchronized
  • clearCache() timing confirmed safe: BlockingTaskManager.submit() is synchronous ✓
  • labelCountAtomicInteger
  • Redundant synchronized removed from getTargetCount() and getLabelCount()
  • Full test suite passes ✓

Summary

This is ready to merge once the re-registration concern is either confirmed impossible (with a comment in the code) or handled defensively. Everything else looks good.

@jordanpadams
Copy link
Copy Markdown

Remaining Issue 1: Re-registration of the same location with a different TargetType

The isNew guard prevents duplicate children and inflated counts when addTarget() is called multiple times for the same location. However, it only checks whether the location is new — it does not handle the case where the same location is re-registered with a different TargetType:

boolean isNew = !this.targets.containsKey(location);
this.targets.put(location, target);       // overwrites old target (and its type)
if (isNew) {
    targetCountByType.merge(type, 1, Integer::sum);  // skipped on re-registration
}

If addTarget() is ever called for the same location with a different type, targets will reflect the new type but targetCountByType will still count only the original type — a silent stale count. The two RegisterTargets.java call sites were audited and this presumably never happens, but please add a short comment or assertion in addTarget() making the invariant explicit (e.g., assert !targets.containsKey(location) || targets.get(location).getType() == type : "re-registration with different type") so it's visible and catches regressions.

Remaining Issue 2: Boxed Integer allocations in targetCountByType.merge()

Each call to targetCountByType.merge(type, 1, Integer::sum) allocates a new Integer object for the updated value. For large bundles (23k+ products), this is a steady stream of short-lived allocations under synchronized addTarget(). Not a correctness issue, but a minor GC concern given the PR's performance motivation. Consider ConcurrentHashMap<TargetType, AtomicInteger> with computeIfAbsent(..., k -> new AtomicInteger(0)).incrementAndGet() to avoid the boxing entirely.

…to AtomicInteger to avoid boxing

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
@devin-ai-integration
Copy link
Copy Markdown
Author

Both issues addressed in commit 431439e:

Issue 1: Re-registration with different TargetType

Added a defensive check in addTarget() that reads the existing target's type before the put() overwrites it. If a location is ever re-registered with a different type, it logs a warning:

if (!isNew) {
    ValidationTarget existing = this.targets.get(location);
    if (existing != null && !type.equals(existing.getType())) {
        LOG.warn("addTarget(): location {} re-registered with type {} (was {})",
            location, type, existing.getType());
    }
}

I chose LOG.warn over assert because asserts are disabled by default in production JVMs, and a warning log will surface the issue without crashing validation. The accompanying comment documents the invariant and explains why it holds (both RegisterTargets.java call sites derive type from Utility.getTargetType()).

Issue 2: Boxed Integer allocations

Switched targetCountByType from ConcurrentHashMap<TargetType, Integer> to ConcurrentHashMap<TargetType, AtomicInteger>:

// Before: allocates a new Integer on every merge()
targetCountByType.merge(type, 1, Integer::sum);

// After: in-place increment, no boxing
targetCountByType.computeIfAbsent(type, k -> new AtomicInteger(0)).incrementAndGet();

getTargetCount() updated accordingly to call .get() on the AtomicInteger.

All 6 InMemoryRegistrarTest tests pass.

Copy link
Copy Markdown
Author

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 9 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 setTargets() bypasses new indexes, causing stale getChildTargets(), getTargetCount(), and getLabelCount() results

The PR changes getChildTargets(), getTargetCount(), and getLabelCount() to use precomputed indexes (childrenByParent, targetCountByType, labelCount) instead of scanning the targets map. However, the existing setTargets() method at InMemoryRegistrar.java:338 replaces the targets map without rebuilding these indexes. After a setTargets() call, the indexes become stale: getChildTargets() returns children from the old map, getTargetCount() returns old counts, and getLabelCount() reflects old label state. Before this PR, all three methods scanned the live targets map, so setTargets() was inherently consistent. The same issue applies to setBundles() and setCollections(), though those only affect their own maps. While there are no current callers of setTargets() in the codebase, it is a public method on the TargetRegistrar interface (TargetRegistrar.java:191).

(Refers to lines 338-340)

Prompt for agents
The setTargets() method (and similarly setBundles/setCollections) replaces the targets map but does not rebuild the childrenByParent, targetCountByType, or labelCount indexes that were introduced in this PR. Before the PR, getChildTargets/getTargetCount/getLabelCount scanned the live targets map, so setTargets was inherently consistent. Now they use precomputed indexes that become stale.

Possible approaches:
1. Rebuild all indexes inside setTargets() by iterating the new map (but parent-child info is lost since it's not stored on ValidationTarget).
2. Throw UnsupportedOperationException in setTargets() if there are no callers, or deprecate it.
3. Document that setTargets() must not be called after addTarget() has been used, or that callers must also manually rebuild indexes.

The same issue applies to setCollections() and setBundles(), though those don't affect the child/count indexes directly.

Files: InMemoryRegistrar.java (setTargets at line 338, setBundles at line 328, setCollections at line 318), TargetRegistrar.java interface.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant