112 changes: 71 additions & 41 deletions src/main/java/gov/nasa/pds/tools/validate/InMemoryRegistrar.java
🟡 setTargets() bypasses new indexes, causing stale getChildTargets(), getTargetCount(), and getLabelCount() results

The PR changes getChildTargets(), getTargetCount(), and getLabelCount() to use precomputed indexes (childrenByParent, targetCountByType, labelCount) instead of scanning the targets map. However, the existing setTargets() method at InMemoryRegistrar.java:338 replaces the targets map without rebuilding these indexes. After a setTargets() call, the indexes become stale: getChildTargets() returns children from the old map, getTargetCount() returns old counts, and getLabelCount() reflects old label state. Before this PR, all three methods scanned the live targets map, so setTargets() was inherently consistent. The same issue applies to setBundles() and setCollections(), though those only affect their own maps. While there are no current callers of setTargets() in the codebase, it is a public method on the TargetRegistrar interface (TargetRegistrar.java:191).

(Refers to lines 338-340)

Prompt for agents
The setTargets() method (and similarly setBundles/setCollections) replaces the targets map but does not rebuild the childrenByParent, targetCountByType, or labelCount indexes that were introduced in this PR. Before the PR, getChildTargets/getTargetCount/getLabelCount scanned the live targets map, so setTargets was inherently consistent. Now they use precomputed indexes that become stale.

Possible approaches:
1. Rebuild all indexes inside setTargets() by iterating the new map (but parent-child info is lost since it's not stored on ValidationTarget).
2. Throw UnsupportedOperationException in setTargets() if there are no callers, or deprecate it.
3. Document that setTargets() must not be called after addTarget() has been used, or that callers must also manually rebuild indexes.

The same issue applies to setCollections() and setBundles(), though those don't affect the child/count indexes directly.

Files: InMemoryRegistrar.java (setTargets at line 338, setBundles at line 328, setCollections at line 318), TargetRegistrar.java interface.
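
Approach 1 could look roughly like the sketch below. It is not the project's code: RegistrarSketch, Target, and Kind are hypothetical stand-ins for InMemoryRegistrar, ValidationTarget, and TargetType. Because the parent is not stored on the target, the sketch re-derives it from the location string (everything before the last '/'), which is an assumption that may not hold for every target layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for InMemoryRegistrar; not the project's class.
final class RegistrarSketch {
  enum Kind { BUNDLE, COLLECTION, FILE }

  // Hypothetical stand-in for ValidationTarget.
  static final class Target {
    final String location;
    final Kind kind;
    final boolean label;

    Target(String location, Kind kind, boolean label) {
      this.location = location;
      this.kind = kind;
      this.label = label;
    }
  }

  private Map<String, Target> targets = new ConcurrentHashMap<>();
  private final Map<String, List<String>> childrenByParent = new ConcurrentHashMap<>();
  private final Map<Kind, AtomicInteger> targetCountByType = new ConcurrentHashMap<>();
  private final AtomicInteger labelCount = new AtomicInteger();

  // Replace the map, then rebuild every derived index from it under the same monitor.
  synchronized void setTargets(Map<String, Target> newTargets) {
    targets = new ConcurrentHashMap<>(newTargets);
    childrenByParent.clear();
    targetCountByType.clear();
    labelCount.set(0);
    for (Target t : targets.values()) {
      targetCountByType.computeIfAbsent(t.kind, k -> new AtomicInteger()).incrementAndGet();
      if (t.label) {
        labelCount.incrementAndGet();
      }
      // Assumption: the parent location is the text before the last '/'.
      int slash = t.location.lastIndexOf('/');
      if (slash > 0) {
        childrenByParent
            .computeIfAbsent(t.location.substring(0, slash), k -> new ArrayList<>())
            .add(t.location);
      }
    }
  }

  int getTargetCount(Kind kind) {
    AtomicInteger count = targetCountByType.get(kind);
    return count != null ? count.get() : 0;
  }

  int getLabelCount() {
    return labelCount.get();
  }

  synchronized List<String> getChildLocations(String parentLocation) {
    List<String> children =
        new ArrayList<>(childrenByParent.getOrDefault(parentLocation, Collections.emptyList()));
    Collections.sort(children);
    return children;
  }

  public static void main(String[] args) {
    Map<String, Target> replacement = new HashMap<>();
    replacement.put("file:///data/bundle",
        new Target("file:///data/bundle", Kind.BUNDLE, false));
    replacement.put("file:///data/bundle/a.xml",
        new Target("file:///data/bundle/a.xml", Kind.FILE, true));

    RegistrarSketch registrar = new RegistrarSketch();
    registrar.setTargets(replacement);
    System.out.println(registrar.getTargetCount(Kind.FILE));
    System.out.println(registrar.getLabelCount());
    System.out.println(registrar.getChildLocations("file:///data/bundle"));
  }
}
```

Note that the unsynchronized count getters could observe a partially rebuilt index while setTargets() is running. If that matters, the getters would need the monitor as well, or the indexes could be built in fresh maps and swapped in at the end.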

@@ -16,6 +16,8 @@
import java.io.File;
import java.net.MalformedURLException;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
@@ -28,14 +30,24 @@ public class InMemoryRegistrar implements TargetRegistrar {

private static Logger LOG = LoggerFactory.getLogger(InMemoryRegistrar.class);
private ValidationTarget rootTarget;
- private Map<String, ValidationTarget> targets = new HashMap<>();
- private Map<String, ValidationTarget> collections = new HashMap<>();
- private Map<String, ValidationTarget> bundles = new HashMap<>();
- private Set<String> referencedTargetLocations = new HashSet<>();
- private Map<Identifier, String> identifierDefinitions = new HashMap<>();
- private Map<String, Set<Identifier>> identifierDefinitionsByLid = new HashMap<>();
- private Map<Identifier, String> identifierReferenceLocations = new HashMap<>();
- private Map<String, Set<Identifier>> referencedIdentifiersByLid = new HashMap<>();
+ private Map<String, ValidationTarget> targets = new ConcurrentHashMap<>();
+ private Map<String, ValidationTarget> collections = new ConcurrentHashMap<>();
+ private Map<String, ValidationTarget> bundles = new ConcurrentHashMap<>();
+ private Set<String> referencedTargetLocations = ConcurrentHashMap.newKeySet();
+ private Map<Identifier, String> identifierDefinitions = new ConcurrentHashMap<>();
+ private Map<String, Set<Identifier>> identifierDefinitionsByLid = new ConcurrentHashMap<>();
+ private Map<Identifier, String> identifierReferenceLocations = new ConcurrentHashMap<>();
+ private Map<String, Set<Identifier>> referencedIdentifiersByLid = new ConcurrentHashMap<>();
+
+ // Parent-child index: maps parent location to list of direct child locations.
+ // The inner ArrayLists are only accessed under the instance monitor (both addTarget()
+ // and getChildTargets() are synchronized), so they are thread-safe despite being mutable.
+ private Map<String, List<String>> childrenByParent = new ConcurrentHashMap<>();
+
+ // Count-by-type index: tracks target counts per TargetType.
+ // Uses AtomicInteger to avoid boxed Integer allocations on each merge() call.
+ private Map<TargetType, AtomicInteger> targetCountByType = new ConcurrentHashMap<>();
+ private AtomicInteger labelCount = new AtomicInteger(0);

@Override
public ValidationTarget getRoot() {
@@ -57,7 +69,33 @@ public synchronized void addTarget(String parentLocation, TargetType type, Strin
this.collections.put(location, target);
}

boolean isNew = !this.targets.containsKey(location);

// Only update indexes for genuinely new targets to avoid duplicates.
// Invariant: a location is never re-registered with a different TargetType.
// Both RegisterTargets.java call sites derive the type from the location itself
// (via Utility.getTargetType), so re-registration with a different type cannot
// happen in practice. Log a warning if this invariant is ever violated.
if (!isNew) {
ValidationTarget existing = this.targets.get(location);
if (existing != null && !type.equals(existing.getType())) {
LOG.warn("addTarget(): location {} re-registered with type {} (was {})",
location, type, existing.getType());
}
}

this.targets.put(location, target);

if (isNew) {
// Index parent-child relationship for O(1) child lookups
if (parentLocation != null) {
childrenByParent.computeIfAbsent(parentLocation, k -> new ArrayList<>()).add(location);
}

// Increment count-by-type index
targetCountByType.computeIfAbsent(type, k -> new AtomicInteger(0)).incrementAndGet();
}
Comment on lines 69 to +97
@devin-ai-integration devin-ai-integration bot Apr 2, 2026


🟡 targetCountByType becomes stale when target is re-added with a different TargetType

In addTarget(), the isNew check at line 71 prevents targetCountByType from being updated when the same location is added again. However, the target object in the targets map is replaced unconditionally (line 72), and the bundles/collections maps are also updated unconditionally (lines 65-69). If the same location is re-added with a different TargetType, the count for the old type stays inflated and the count for the new type is never incremented, even though the targets map now holds the new target.

While the primary caller RegisterTargets guards against re-addition (InMemoryRegistrar.java:44 via registrar.hasTarget()), addTarget() itself is a public interface method with no such contract. The bundles/collections maps are also updated unconditionally even when isNew is false, which creates an inconsistency: a target can appear in bundles without being reflected in targetCountByType for BUNDLE.
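
One way to keep the count index consistent on re-registration is to move the count from the old type to the new one. The following is a minimal sketch of that idea only, assuming a hypothetical TypeCountSketch class with a Kind enum standing in for TargetType. It does not model the targets, bundles, or collections maps, which would need equivalent handling.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the count-by-type index only; not the project's class.
final class TypeCountSketch {
  enum Kind { BUNDLE, COLLECTION, FILE }

  // Remembers the type each location was last registered with.
  private final Map<String, Kind> kindByLocation = new HashMap<>();
  private final Map<Kind, AtomicInteger> countByKind = new ConcurrentHashMap<>();

  synchronized void addTarget(String location, Kind kind) {
    Kind previous = kindByLocation.put(location, kind);
    if (previous == kind) {
      return; // plain duplicate: the counts are already correct
    }
    if (previous != null) {
      // Re-registered with a different type: release the old type's count.
      countByKind.get(previous).decrementAndGet();
    }
    countByKind.computeIfAbsent(kind, k -> new AtomicInteger()).incrementAndGet();
  }

  int getTargetCount(Kind kind) {
    AtomicInteger count = countByKind.get(kind);
    return count != null ? count.get() : 0;
  }

  public static void main(String[] args) {
    TypeCountSketch index = new TypeCountSketch();
    index.addTarget("file:///data/bundle/x", Kind.FILE);
    index.addTarget("file:///data/bundle/x", Kind.FILE);   // duplicate, ignored
    index.addTarget("file:///data/bundle/x", Kind.BUNDLE); // type change, count moves
    System.out.println(index.getTargetCount(Kind.FILE));   // 0
    System.out.println(index.getTargetCount(Kind.BUNDLE)); // 1
  }
}
```

Whether a type change should be corrected silently like this, or rejected outright, is a design choice the PR would need to make. The diff currently only logs a warning.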


LOG.debug("addTarget(): location: {}, target: {}", location, target);
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
@@ -67,40 +105,39 @@ public synchronized void addTarget(String parentLocation, TargetType type, Strin

@Override
public synchronized Collection<ValidationTarget> getChildTargets(ValidationTarget parent) {
- List<ValidationTarget> children = new ArrayList<>();
- String parentLocation = parent.getLocation() + File.separator;
-
- for (String targetLocation : targets.keySet()) {
- if (targetLocation.startsWith(parentLocation)
- && !targetLocation.substring(parentLocation.length()).contains(File.separator)) {
- children.add(targets.get(targetLocation));
- }
+ List<String> childLocations = childrenByParent.getOrDefault(parent.getLocation(), Collections.emptyList());
+ List<ValidationTarget> children = new ArrayList<>(childLocations.size());
+ for (String loc : childLocations) {
+ ValidationTarget t = targets.get(loc);
+ if (t != null) children.add(t);
}

Collections.sort(children);
return children;
}

@Override
- public synchronized boolean hasTarget(String targetLocation) {
+ public boolean hasTarget(String targetLocation) {
return targets.containsKey(targetLocation);
}

@Override
- public synchronized int getTargetCount(TargetType type) {
- int count = 0;
-
- for (Map.Entry<String, ValidationTarget> entry : targets.entrySet()) {
- if (entry.getValue().getType() == type) {
- ++count;
- }
- }
- return count;
+ public int getTargetCount(TargetType type) {
+ AtomicInteger count = targetCountByType.get(type);
+ return count != null ? count.get() : 0;
}

@Override
public synchronized void setTargetIsLabel(String location, boolean isLabel) {
- targets.get(location).setLabel(isLabel);
+ ValidationTarget target = targets.get(location);
+ boolean wasLabel = target.isLabel();
+ target.setLabel(isLabel);
+
+ // Update label count index
+ if (isLabel && !wasLabel) {
+ labelCount.incrementAndGet();
+ } else if (!isLabel && wasLabel) {
+ labelCount.decrementAndGet();
+ }

// Labels refer to themselves.
if (isLabel) {
@@ -109,15 +146,8 @@ public synchronized void setTargetIsLabel(String location, boolean isLabel) {
}

@Override
- public synchronized int getLabelCount() {
- int count = 0;
- for (Map.Entry<String, ValidationTarget> entry : targets.entrySet()) {
- if (entry.getValue().isLabel()) {
- ++count;
- }
- }
-
- return count;
+ public int getLabelCount() {
+ return labelCount.get();
}

@Override
@@ -126,7 +156,7 @@ public synchronized void setTargetIdentifier(String location, Identifier identif
LOG.debug("setTargetIdentifier:identifier,location {},{}", identifier, location);
identifierDefinitions.put(identifier, location);

- identifierDefinitionsByLid.computeIfAbsent(identifier.getLid(), x -> new HashSet<>()).add(identifier);
+ identifierDefinitionsByLid.computeIfAbsent(identifier.getLid(), x -> ConcurrentHashMap.newKeySet()).add(identifier);
}

@Override
@@ -135,7 +165,7 @@ public synchronized void addTargetReference(String referenceLocation, String tar
}

@Override
- public synchronized boolean isTargetReferenced(String location) {
+ public boolean isTargetReferenced(String location) {
return referencedTargetLocations.contains(location);
}

@@ -145,11 +175,11 @@ public synchronized void addIdentifierReference(String referenceLocation, Identi

String lid = identifier.getLid();

- referencedIdentifiersByLid.computeIfAbsent(identifier.getLid(), x -> new HashSet<>()).add(identifier);
+ referencedIdentifiersByLid.computeIfAbsent(identifier.getLid(), x -> ConcurrentHashMap.newKeySet()).add(identifier);
}

@Override
- public synchronized boolean isIdentifierReferenced(Identifier identifier, boolean orNearNeighbor) {
+ public boolean isIdentifierReferenced(Identifier identifier, boolean orNearNeighbor) {
boolean result = identifierReferenceLocations.containsKey(identifier);
if (!result && orNearNeighbor) {
for (Identifier id : this.referencedIdentifiersByLid.getOrDefault(identifier.getLid(), Collections.emptySet())) {
@@ -209,4 +209,13 @@ public boolean equals(Object obj) {
public URL getUrl() {
return url;
}

/**
* Clears the static cache of ValidationTargets to free memory between
* validation runs. Without this, cachedTargets grows unboundedly across
* runs and can cause OOM on large bundles.
*/
public static void clearCache() {
cachedTargets.clear();
}
}
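
The diff calls clearCache() after a validation run completes. The sketch below shows the same idea with try/finally, so the cache is also released when a run throws. TargetCacheSketch, lookup(), and runOnce() are hypothetical names for illustration, and the diff does not establish whether ValidateLauncher needs this variant.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a class with a static cache, such as ValidationTarget.
final class TargetCacheSketch {
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  // Returns the cached entry for a location, creating it on first use.
  static Object lookup(String location) {
    return CACHE.computeIfAbsent(location, k -> new Object());
  }

  static void clearCache() {
    CACHE.clear();
  }

  static int size() {
    return CACHE.size();
  }

  // Runs one validation pass and always clears the cache afterwards.
  static void runOnce(Runnable validation) {
    try {
      validation.run();
    } finally {
      clearCache();
    }
  }

  public static void main(String[] args) {
    runOnce(() -> {
      lookup("file:///data/bundle/a.xml");
      lookup("file:///data/bundle/b.xml");
      System.out.println("cached during run: " + size());
    });
    System.out.println("cached after run: " + size());
  }
}
```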
4 changes: 4 additions & 0 deletions src/main/java/gov/nasa/pds/validate/ValidateLauncher.java
@@ -110,6 +110,7 @@
import gov.nasa.pds.tools.validate.ValidateProblemHandler;
import gov.nasa.pds.tools.validate.ValidationProblem;
import gov.nasa.pds.tools.validate.ValidationResourceManager;
import gov.nasa.pds.tools.validate.ValidationTarget;
import gov.nasa.pds.tools.validate.rule.pds4.SchemaValidator;
import gov.nasa.pds.validate.checksum.ChecksumManifest;
import gov.nasa.pds.validate.commandline.options.ConfigKey;
@@ -1526,6 +1527,9 @@ public boolean doValidation(Map<URL, String> checksumManifest) throws Exception
validator.validate(monitor, target);
monitor.endValidation();

// Free cached ValidationTargets to prevent OOM on large bundles
ValidationTarget.clearCache();

if (validationRule != null) {
// If the rule is pds4.label, clear out the list of Information Model Versions
// except the first element.
150 changes: 150 additions & 0 deletions src/test/java/gov/nasa/pds/tools/validate/InMemoryRegistrarTest.java
@@ -0,0 +1,150 @@
package gov.nasa.pds.tools.validate;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

/**
* Unit tests for {@link InMemoryRegistrar}, focusing on the parent-child index
* and count-by-type caching introduced to eliminate O(n) scans.
*/
class InMemoryRegistrarTest {

private InMemoryRegistrar registrar;

// Use file:// URLs to match the format used in production
private static final String BUNDLE = "file:///data/bundle";
private static final String COLLECTION_A = "file:///data/bundle/collection_a";
private static final String COLLECTION_B = "file:///data/bundle/collection_b";
private static final String PRODUCT_1 = "file:///data/bundle/collection_a/product1.xml";
private static final String PRODUCT_2 = "file:///data/bundle/collection_a/product2.xml";
private static final String PRODUCT_3 = "file:///data/bundle/collection_b/product3.xml";

@BeforeEach
void setUp() {
registrar = new InMemoryRegistrar();
ValidationTarget.clearCache();
}

@Test
void testGetChildTargetsMultiLevelHierarchy() {
// Build a 3-level hierarchy: bundle -> collections -> products
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.COLLECTION, COLLECTION_A);
registrar.addTarget(BUNDLE, TargetType.COLLECTION, COLLECTION_B);
registrar.addTarget(COLLECTION_A, TargetType.FILE, PRODUCT_1);
registrar.addTarget(COLLECTION_A, TargetType.FILE, PRODUCT_2);
registrar.addTarget(COLLECTION_B, TargetType.FILE, PRODUCT_3);

// Bundle's children should be only the two collections (not grandchildren)
ValidationTarget bundleTarget = registrar.getRoot();
Collection<ValidationTarget> bundleChildren = registrar.getChildTargets(bundleTarget);
List<String> bundleChildLocations = bundleChildren.stream()
.map(ValidationTarget::getLocation)
.collect(Collectors.toList());

assertEquals(2, bundleChildLocations.size(),
"Bundle should have exactly 2 direct children (collections)");
assertTrue(bundleChildLocations.contains(COLLECTION_A));
assertTrue(bundleChildLocations.contains(COLLECTION_B));

// Collection A's children should be product1 and product2
ValidationTarget collATarget = registrar.getTargets().get(COLLECTION_A);
Collection<ValidationTarget> collAChildren = registrar.getChildTargets(collATarget);
List<String> collAChildLocations = collAChildren.stream()
.map(ValidationTarget::getLocation)
.collect(Collectors.toList());

assertEquals(2, collAChildLocations.size(),
"Collection A should have exactly 2 direct children (products)");
assertTrue(collAChildLocations.contains(PRODUCT_1));
assertTrue(collAChildLocations.contains(PRODUCT_2));

// Collection B's children should be only product3
ValidationTarget collBTarget = registrar.getTargets().get(COLLECTION_B);
Collection<ValidationTarget> collBChildren = registrar.getChildTargets(collBTarget);
List<String> collBChildLocations = collBChildren.stream()
.map(ValidationTarget::getLocation)
.collect(Collectors.toList());

assertEquals(1, collBChildLocations.size(),
"Collection B should have exactly 1 direct child");
assertTrue(collBChildLocations.contains(PRODUCT_3));
}

@Test
void testGetChildTargetsLeafNodeReturnsEmpty() {
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1);

// Leaf node should have no children
ValidationTarget product = registrar.getTargets().get(PRODUCT_1);
Collection<ValidationTarget> children = registrar.getChildTargets(product);
assertTrue(children.isEmpty(), "Leaf node should have no children");
}

@Test
void testDuplicateAddTargetDoesNotCreateDuplicateChildren() {
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1);
// Add the same target again
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1);

ValidationTarget bundleTarget = registrar.getRoot();
Collection<ValidationTarget> children = registrar.getChildTargets(bundleTarget);
assertEquals(1, children.size(),
"Duplicate addTarget should not create duplicate children");
}

@Test
void testTargetCountByType() {
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.COLLECTION, COLLECTION_A);
registrar.addTarget(BUNDLE, TargetType.COLLECTION, COLLECTION_B);
registrar.addTarget(COLLECTION_A, TargetType.FILE, PRODUCT_1);
registrar.addTarget(COLLECTION_A, TargetType.FILE, PRODUCT_2);
registrar.addTarget(COLLECTION_B, TargetType.FILE, PRODUCT_3);

assertEquals(1, registrar.getTargetCount(TargetType.BUNDLE));
assertEquals(2, registrar.getTargetCount(TargetType.COLLECTION));
assertEquals(3, registrar.getTargetCount(TargetType.FILE));
assertEquals(0, registrar.getTargetCount(TargetType.DIRECTORY));
}

@Test
void testDuplicateAddTargetDoesNotInflateCount() {
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1); // duplicate

assertEquals(1, registrar.getTargetCount(TargetType.BUNDLE));
assertEquals(1, registrar.getTargetCount(TargetType.FILE),
"Duplicate addTarget should not inflate count");
}

@Test
void testLabelCount() {
registrar.addTarget(null, TargetType.BUNDLE, BUNDLE);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_1);
registrar.addTarget(BUNDLE, TargetType.FILE, PRODUCT_2);

assertEquals(0, registrar.getLabelCount());

registrar.setTargetIsLabel(PRODUCT_1, true);
assertEquals(1, registrar.getLabelCount());

registrar.setTargetIsLabel(PRODUCT_2, true);
assertEquals(2, registrar.getLabelCount());

// Setting the same target as label again should not double-count
registrar.setTargetIsLabel(PRODUCT_1, true);
assertEquals(2, registrar.getLabelCount(),
"Re-setting same label should not increase count");
}
}
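
The tests above exercise only the increment path of the label-count index. The following minimal sketch models the same increment/decrement rule in isolation so the decrement path can be checked. LabelCountSketch is a hypothetical class, not part of the PR. An equivalent JUnit case against InMemoryRegistrar would presumably call setTargetIsLabel(PRODUCT_1, false) and assert that the count drops.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the label-count index only; not the project's class.
final class LabelCountSketch {
  private final Map<String, Boolean> labelByLocation = new HashMap<>();
  private final AtomicInteger labelCount = new AtomicInteger();

  synchronized void setTargetIsLabel(String location, boolean isLabel) {
    Boolean previous = labelByLocation.put(location, isLabel);
    boolean wasLabel = previous != null && previous;
    // Adjust the count only when the label state actually changes.
    if (isLabel && !wasLabel) {
      labelCount.incrementAndGet();
    } else if (!isLabel && wasLabel) {
      labelCount.decrementAndGet();
    }
  }

  int getLabelCount() {
    return labelCount.get();
  }

  public static void main(String[] args) {
    LabelCountSketch index = new LabelCountSketch();
    index.setTargetIsLabel("product1.xml", true);
    index.setTargetIsLabel("product1.xml", true);  // repeated set, no change
    index.setTargetIsLabel("product1.xml", false); // unset, count drops
    index.setTargetIsLabel("product1.xml", false); // repeated unset, no change
    System.out.println(index.getLabelCount());     // 0
  }
}
```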