Skip to content

Address PR review comments on schematron Templates cache#4

Merged
jordanpadams merged 7 commits intomainfrom
devin/1775084795-address-pr-review-comments
Apr 2, 2026
Merged

Address PR review comments on schematron Templates cache#4
jordanpadams merged 7 commits intomainfrom
devin/1775084795-address-pr-review-comments

Conversation

@devin-ai-integration
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot commented Apr 1, 2026

Summary

Follow-up to the initial schematron Templates caching PR (#1572), addressing all five review comments from @jordanpadams plus four additional comments from the second review round.

Changes to SchematronTransformer.java:

No behavioral changes to validation output — same errors, same warnings, same SVRL.

Updates since last revision

  • Inline comment added documenting that handler is intentionally unused on cache hit (per reviewer's non-blocking observation)

Review & Testing Checklist for Human

  • ProblemHandler ignored on cache hit: The handler parameter is only used during compileSchematron() on a cache miss. On a hit, no error listener is set on the returned Transformer. An inline comment now documents this, but verify the behavior is acceptable for all callers — particularly fetch() which passes a handler through to transform(String, ProblemHandler).
  • ThreadLocal<MessageDigest> lifecycle: The ThreadLocal is a static field, so MessageDigest instances survive across SchematronTransformer instances on the same thread. Verify digest.reset() before each use is sufficient (it should be — digest() also resets internal state after returning).
  • @Tag("integration") not excluded by CI: Confirm the Maven Surefire/Failsafe configuration does not filter out integration-tagged tests. CI passed on Java 17 and 21, suggesting they do run — but worth verifying the surefire config explicitly.
  • End-to-end validation: Run mvn test -Dtest=SchematronTransformerTest to confirm the 5 tests pass locally, then run a representative bundle validation (e.g. a bundle with multiple labels sharing the same schematron) and diff the output against a pre-cache baseline to confirm identical results.

Notes

  • The diff against the merge base includes only SchematronTransformer.java and SchematronTransformerTest.java changes.
  • 38 pre-existing Cucumber test failures (in pre.3.6.x.feature) are unrelated to this change and were present before this PR.
  • CI passed on both Java 17 and Java 21.

Link to Devin session: https://nasa-jpl-demo.devinenterprise.com/sessions/e8943ca2e9de4879856766ebf367604c


Open with Devin

jordanpadams and others added 4 commits April 1, 2026 23:06
Intercept non-existent file targets in doValidation() before running the
validator. Records a MISSING_REFERENCED_FILE error (PRODUCT category) directly
to the report so the product shows FAIL, the error is counted in the summary,
and the exit code is non-zero.

Previously, LocationValidator recorded the error as NO_PRODUCTS_FOUND
(ProblemCategory.EXECUTION), which Report.record() explicitly excluded from
error counts, causing the product to show PASS with 0 errors in the summary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a HashMap<String, Templates> cache to SchematronTransformer so that
the expensive ISO schematron XSLT compilation is performed only once per
unique schematron source string. Subsequent calls to transform(String)
return a new Transformer from the cached Templates object.

- Extract compilation logic into private compileSchematron() returning Templates
- Add cache lookup in transform(String, ProblemHandler)
- Add clearCache() method (naturally reset when LabelValidator.clear() creates a new instance)
- Add debug logging for cache hits/misses

Fixes: NASA-PDS#1565
Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
1. Use ConcurrentHashMap instead of HashMap for thread-safety (NASA-PDS#1566, NASA-PDS#1567)
2. Key cache on SHA-256 hash of source string to reduce memory overhead
3. Make clearCache() package-private; add cacheSize() for testing
4. Document Source-path cache bypass design decision
5. Add SchematronTransformerTest with cache behavior assertions

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
@devin-ai-integration
Copy link
Copy Markdown
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Author

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
Copy link
Copy Markdown

@jordanpadams jordanpadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: Looks good — all 5 review comments addressed. @devin-ai-integration The implementation is solid and the test suite is a meaningful addition. A few notes below, but nothing blocking.


What's addressed well

  • ConcurrentHashMap — correct swap, no other changes needed.
  • clearCache() → package-private — right call. The lifecycle comment is helpful.
  • cacheSize() for test observability — clean pattern, avoids exposing the map itself.
  • Source-path comment — documents the design decision clearly at the right place in the code.
  • Test suite — 5 focused tests covering population, hit, distinctness, clear, and re-population. Each test is independent (fresh instance via @BeforeEach). Good.

Comments

1. SHA-256 computed with a new MessageDigest instance per call — should use ThreadLocal

MessageDigest.getInstance("SHA-256") acquires a provider lookup on every invocation. For bundles with thousands of labels this is called thousands of times. Fix with a ThreadLocal<MessageDigest>:

private static final ThreadLocal<MessageDigest> SHA256 = ThreadLocal.withInitial(() -> {
    try { return MessageDigest.getInstance("SHA-256"); }
    catch (NoSuchAlgorithmException e) { throw new RuntimeException(e); }
});

private static String sha256(String input) {
    MessageDigest digest = SHA256.get();
    digest.reset();
    byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
    ...
}

Minor given the schematron count is small in practice, but worth addressing.

2. String.format("%02x", b) in the hex loop — use HexFormat (Java 17+)

The current hex encoding calls String.format 32 times per hash. Since the project targets Java 17, HexFormat is available and cleaner:

return HexFormat.of().formatHex(digest.digest(input.getBytes(StandardCharsets.UTF_8)));

3. Document the intentional check-then-act race on ConcurrentHashMap

The PR description flags this correctly — computeIfAbsent is the wrong tool here because it holds a bucket lock during a long XSLT compilation. The double-compilation race is the right tradeoff (functionally correct, self-healing after warmup). But this should be documented in a code comment so future maintainers don't "fix" it:

// Intentional check-then-act: a concurrent miss may compile twice, but
// both results are equivalent and the race is self-healing after warmup.
// computeIfAbsent is avoided because it holds a lock during compilation.
Templates templates = cachedTemplates.get(key);

4. Tests require the full Saxon XSLT pipeline — add a note or @Tag

transform(MINIMAL_SCHEMATRON) runs the full multi-stage ISO schematron → XSLT compilation, which requires Saxon-HE on the classpath. These are effectively integration tests, will be slow on first run, and will fail without Saxon. Worth adding a class-level comment noting this dependency, or a @Tag("integration") annotation to allow selective exclusion.


Summary

Comment Severity
MessageDigest.getInstance per call — use ThreadLocal Minor
String.format hex loop — use HexFormat.of() Cosmetic
Document the intentional check-then-act race Recommended
Tests require Saxon on classpath — add note or @Tag Minor

1. ThreadLocal<MessageDigest> instead of per-call getInstance (avoids provider lookup)
2. HexFormat.of().formatHex() instead of String.format loop (Java 17+)
3. Document intentional check-then-act race on ConcurrentHashMap
4. @tag("integration") and class-level Javadoc noting Saxon-HE dependency on tests

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
Copy link
Copy Markdown

@jordanpadams jordanpadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: LGTM — all review comments addressed. Approving. @devin-ai-integration Good work across all three rounds. The implementation is clean, correct, and well-tested.


What's been resolved since last review

  • ThreadLocal<MessageDigest> — provider lookup eliminated, digest.reset() called correctly before each use.
  • HexFormat.of().formatHex()String.format loop gone, Java 17 API used cleanly.
  • Check-then-act race documented — the inline comment above the cachedTemplates.get(key) call explains the intent clearly and will prevent a well-meaning future refactor to computeIfAbsent.
  • @Tag("integration") + class-level Javadoc — Saxon-HE dependency is now visible to anyone reading the test class.

One remaining observation (non-blocking)

The ProblemHandler is silently ignored on cache hits — the handler passed to transform(String, ProblemHandler) is only used during compileSchematron() on a miss. On a hit, no error listener is set on the returned Transformer. The PR description's checklist flags this correctly: since compilation already succeeded on the first call, there are no transform-time compilation errors to re-report. This is acceptable behavior, but worth a brief inline comment for clarity:

// handler is intentionally unused on cache hit — compilation errors
// would only occur during the first call (cache miss), not on reuse.
Templates templates = cachedTemplates.get(key);

Minor — does not need to block merge.


Summary

All four comments from the previous review have been addressed correctly. The SchematronTransformer is now thread-ready, memory-efficient, well-documented, and covered by a solid test suite. Ready to merge.

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
@devin-ai-integration
Copy link
Copy Markdown
Author

Thanks for the thorough reviews across all three rounds! I've added the inline comment about handler being intentionally unused on cache hits (e84ab06). All CI checks pass on Java 17 and 21. Ready to merge whenever you are.

@jordanpadams jordanpadams merged commit 29999d0 into main Apr 2, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants