Address PR review comments on schematron Templates cache #4
jordanpadams merged 7 commits into main
Intercept non-existent file targets in doValidation() before running the validator. Records a MISSING_REFERENCED_FILE error (PRODUCT category) directly to the report so the product shows FAIL, the error is counted in the summary, and the exit code is non-zero. Previously, LocationValidator recorded the error as NO_PRODUCTS_FOUND (ProblemCategory.EXECUTION), which Report.record() explicitly excluded from error counts, causing the product to show PASS with 0 errors in the summary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a HashMap<String, Templates> cache to SchematronTransformer so that the expensive ISO schematron XSLT compilation is performed only once per unique schematron source string. Subsequent calls to transform(String) return a new Transformer from the cached Templates object.

- Extract compilation logic into private compileSchematron() returning Templates
- Add cache lookup in transform(String, ProblemHandler)
- Add clearCache() method (naturally reset when LabelValidator.clear() creates a new instance)
- Add debug logging for cache hits/misses

Fixes: NASA-PDS#1565
Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
1. Use ConcurrentHashMap instead of HashMap for thread-safety (NASA-PDS#1566, NASA-PDS#1567)
2. Key cache on SHA-256 hash of source string to reduce memory overhead
3. Make clearCache() package-private; add cacheSize() for testing
4. Document Source-path cache bypass design decision
5. Add SchematronTransformerTest with cache behavior assertions

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
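The caching idea in the commits above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the class `TemplatesCacheSketch`, the `SAMPLE_XSLT` constant, and the `compilations()` counter are hypothetical, and a plain XSLT stylesheet compiled with the JDK's default `TransformerFactory` stands in for the multi-stage ISO schematron compilation.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stand-in for the cache described above; not the project's code. */
class TemplatesCacheSketch {
  /** Tiny stylesheet used only so this sketch has something to compile. */
  static final String SAMPLE_XSLT =
      "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"/\"><out/></xsl:template></xsl:stylesheet>";

  private final Map<String, Templates> cachedTemplates = new ConcurrentHashMap<>();
  private final AtomicInteger compilations = new AtomicInteger(); // illustration only

  /** Returns a new Transformer; the stylesheet is compiled only on a cache miss. */
  Transformer transform(String stylesheet) {
    String key = sha256(stylesheet); // short fixed-size key instead of the full source
    Templates templates = cachedTemplates.get(key);
    if (templates == null) {
      templates = compile(stylesheet);
      cachedTemplates.put(key, templates);
    }
    try {
      return templates.newTransformer();
    } catch (TransformerConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Assumption: a single-stage XSLT compile replaces the real schematron pipeline. */
  private Templates compile(String stylesheet) {
    compilations.incrementAndGet();
    try {
      return TransformerFactory.newInstance()
          .newTemplates(new StreamSource(new StringReader(stylesheet)));
    } catch (TransformerConfigurationException e) {
      throw new IllegalArgumentException("stylesheet did not compile", e);
    }
  }

  private static String sha256(String input) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      return HexFormat.of().formatHex(digest.digest(input.getBytes(StandardCharsets.UTF_8)));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  int cacheSize() { return cachedTemplates.size(); }

  int compilations() { return compilations.get(); }

  void clearCache() { cachedTemplates.clear(); }
}
```

Calling `transform(SAMPLE_XSLT)` twice on one instance should compile once and hand back two distinct `Transformer` objects, which is the behavior the cache tests in this PR are described as asserting.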
Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
jordanpadams left a comment
Overall: Looks good — all 5 review comments addressed. @devin-ai-integration The implementation is solid and the test suite is a meaningful addition. A few notes below, but nothing blocking.
What's addressed well
- ConcurrentHashMap: correct swap, no other changes needed.
- clearCache() → package-private: right call. The lifecycle comment is helpful.
- cacheSize() for test observability: clean pattern, avoids exposing the map itself.
- Source-path comment: documents the design decision clearly at the right place in the code.
- Test suite: 5 focused tests covering population, hit, distinctness, clear, and re-population. Each test is independent (fresh instance via @BeforeEach). Good.
Comments
1. SHA-256 computed with a new MessageDigest instance per call — should use ThreadLocal
MessageDigest.getInstance("SHA-256") performs a provider lookup on every invocation. For bundles with thousands of labels, it is called thousands of times. Fix with a ThreadLocal<MessageDigest>:

    private static final ThreadLocal<MessageDigest> SHA256 = ThreadLocal.withInitial(() -> {
        try { return MessageDigest.getInstance("SHA-256"); }
        catch (NoSuchAlgorithmException e) { throw new RuntimeException(e); }
    });

    private static String sha256(String input) {
        MessageDigest digest = SHA256.get();
        digest.reset();
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
        ...
    }

Minor given the schematron count is small in practice, but worth addressing.
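A self-contained version of that helper might look like the sketch below. The class name `Sha256KeySketch` is hypothetical, and the hex step left elided in the snippet is filled in here with `HexFormat` (Java 17+), which is an assumption about how the method is finished rather than the project's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper class; only the ThreadLocal pattern comes from the review. */
class Sha256KeySketch {
  // One MessageDigest per thread, created lazily on that thread's first use.
  private static final ThreadLocal<MessageDigest> SHA256 = ThreadLocal.withInitial(() -> {
    try {
      return MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // Every compliant Java platform ships SHA-256, so this is not expected.
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  });

  /** Returns the lowercase hex SHA-256 of the UTF-8 bytes of input. */
  static String sha256(String input) {
    MessageDigest digest = SHA256.get();
    digest.reset(); // defensive: discard anything left from an interrupted earlier use
    byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
    return HexFormat.of().formatHex(hash); // assumption: hex step uses HexFormat
  }
}
```

The result is always 64 hex characters, so the cache key has a fixed size regardless of how large the schematron source is.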
2. String.format("%02x", b) in the hex loop — use HexFormat (Java 17+)
The current hex encoding calls String.format 32 times per hash. Since the project targets Java 17, HexFormat is available and cleaner:
    return HexFormat.of().formatHex(digest.digest(input.getBytes(StandardCharsets.UTF_8)));

3. Document the intentional check-then-act race on ConcurrentHashMap
The PR description flags this correctly — computeIfAbsent is the wrong tool here because it holds a bucket lock during a long XSLT compilation. The double-compilation race is the right tradeoff (functionally correct, self-healing after warmup). But this should be documented in a code comment so future maintainers don't "fix" it:
    // Intentional check-then-act: a concurrent miss may compile twice, but
    // both results are equivalent and the race is self-healing after warmup.
    // computeIfAbsent is avoided because it holds a lock during compilation.
    Templates templates = cachedTemplates.get(key);

4. Tests require the full Saxon XSLT pipeline: add a note or @Tag
transform(MINIMAL_SCHEMATRON) runs the full multi-stage ISO schematron → XSLT compilation, which requires Saxon-HE on the classpath. These are effectively integration tests, will be slow on first run, and will fail without Saxon. Worth adding a class-level comment noting this dependency, or a @Tag("integration") annotation to allow selective exclusion.
Summary
| Comment | Severity |
|---|---|
| MessageDigest.getInstance per call: use ThreadLocal | Minor |
| String.format hex loop: use HexFormat.of() | Cosmetic |
| Document the intentional check-then-act race | Recommended |
| Tests require Saxon on classpath: add note or @Tag | Minor |
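The check-then-act tradeoff from comment 3 can be illustrated with a small concurrent sketch. Everything here is hypothetical (`CheckThenActSketch`, `slowCompile`, `runConcurrently`), and a 50 ms sleep stands in for the long XSLT compilation; the point is only that concurrent misses may compile more than once, yet the cache ends up with a single entry and later lookups do not compile again.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demonstration of a tolerated check-then-act race. */
class CheckThenActSketch {
  final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
  final AtomicInteger compilations = new AtomicInteger();

  /** Stand-in for an expensive compile; the sleep is an assumption for illustration. */
  String slowCompile(String source) {
    compilations.incrementAndGet();
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "compiled:" + source;
  }

  /** get-then-put without locking: a concurrent miss may compile the same source twice. */
  String lookup(String source) {
    String value = cache.get(source);
    if (value == null) {
      value = slowCompile(source); // no map lock is held while this runs
      cache.put(source, value); // last writer wins; all candidates are equivalent
    }
    return value;
  }

  /** Starts the given number of threads at once, all looking up the same key. */
  static CheckThenActSketch runConcurrently(int threads) {
    CheckThenActSketch sketch = new CheckThenActSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        sketch.lookup("rule");
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return sketch;
  }
}
```

With `computeIfAbsent` the compile count would be exactly one, but other updates that hash to the same bin could block for the duration of the compile; the version above trades a bounded amount of duplicate work during warmup for never blocking on the map.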
1. ThreadLocal<MessageDigest> instead of per-call getInstance (avoids provider lookup)
2. HexFormat.of().formatHex() instead of String.format loop (Java 17+)
3. Document intentional check-then-act race on ConcurrentHashMap
4. @Tag("integration") and class-level Javadoc noting Saxon-HE dependency on tests

Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
jordanpadams left a comment
Overall: LGTM — all review comments addressed. Approving. @devin-ai-integration Good work across all three rounds. The implementation is clean, correct, and well-tested.
What's been resolved since last review
- ✅ ThreadLocal<MessageDigest>: provider lookup eliminated, digest.reset() called correctly before each use.
- ✅ HexFormat.of().formatHex(): String.format loop gone, Java 17 API used cleanly.
- ✅ Check-then-act race documented: the inline comment above the cachedTemplates.get(key) call explains the intent clearly and will prevent a well-meaning future refactor to computeIfAbsent.
- ✅ @Tag("integration") + class-level Javadoc: Saxon-HE dependency is now visible to anyone reading the test class.
One remaining observation (non-blocking)
The ProblemHandler is silently ignored on cache hits — the handler passed to transform(String, ProblemHandler) is only used during compileSchematron() on a miss. On a hit, no error listener is set on the returned Transformer. The PR description's checklist flags this correctly: since compilation already succeeded on the first call, there are no transform-time compilation errors to re-report. This is acceptable behavior, but worth a brief inline comment for clarity:
    // handler is intentionally unused on cache hit: compilation errors
    // would only occur during the first call (cache miss), not on reuse.
    Templates templates = cachedTemplates.get(key);

Minor, does not need to block merge.
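To make the observation concrete, here is a small sketch of a cache whose handler is consulted only on a miss. The names `HandlerOnMissSketch` and `ProblemSink` are hypothetical and do not correspond to the project's `ProblemHandler`, and the compile step that emits a non-fatal warning is an assumption made purely so that there is something to report.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration; ProblemSink is not the project's ProblemHandler. */
class HandlerOnMissSketch {
  /** Minimal callback standing in for an error/warning handler. */
  interface ProblemSink {
    void report(String message);
  }

  private final Map<String, String> cache = new ConcurrentHashMap<>();

  String transform(String source, ProblemSink handler) {
    String compiled = cache.get(source);
    if (compiled != null) {
      // handler is intentionally unused on a cache hit: nothing is compiled here.
      return compiled;
    }
    compiled = compile(source, handler);
    cache.put(source, compiled);
    return compiled;
  }

  /** Assumption: compilation can emit a non-fatal warning through the handler. */
  private String compile(String source, ProblemSink handler) {
    if (source.contains("deprecated")) {
      handler.report("warning: deprecated construct in " + source);
    }
    return "compiled:" + source;
  }
}
```

In this sketch only the first caller's handler ever sees the warning; a second caller passing a different handler for the same source receives the cached result and no report, which is the behavior worth confirming is acceptable for every caller.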
Summary
All four comments from the previous review have been addressed correctly. The SchematronTransformer is now thread-ready, memory-efficient, well-documented, and covered by a solid test suite. Ready to merge.
Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
Thanks for the thorough reviews across all three rounds! I've added the inline comment about |
Summary
Follow-up to the initial schematron Templates caching PR (#1572), addressing all five review comments from @jordanpadams plus four additional comments from the second review round.

Changes to SchematronTransformer.java:
- HashMap → ConcurrentHashMap for forward-looking thread-safety (parallel validation work in NASA-PDS/validate#1566, "As a PDS data engineer, I want label validation to run in parallel within a single JVM so that I can fully utilize available CPU cores without spawning separate processes", and NASA-PDS/validate#1567, "As a PDS developer, I want static mutable utility singletons converted to instance-based classes so that parallel label validation becomes possible")
- ThreadLocal<MessageDigest> to avoid getInstance("SHA-256") provider lookup on every call
- HexFormat.of().formatHex() (Java 17+) replacing the String.format hex loop
- clearCache() made package-private with comment explaining lifecycle; cacheSize() added for test observability
- transform(Source, …) bypass documenting that LabelValidator only uses the String-based overload
- computeIfAbsent avoided because it holds a bucket lock during long XSLT compilation
- handler unused on cache hit: compilation errors only occur on the first call (cache miss)

SchematronTransformerTest (5 JUnit 5 tests, @Tag("integration")) asserting cache population, hit, distinct entries, clear, and re-population; class-level Javadoc notes Saxon-HE classpath dependency

No behavioral changes to validation output: same errors, same warnings, same SVRL.
Updates since last revision
- handler is intentionally unused on cache hit (per reviewer's non-blocking observation)

Review & Testing Checklist for Human
- ProblemHandler ignored on cache hit: The handler parameter is only used during compileSchematron() on a cache miss. On a hit, no error listener is set on the returned Transformer. An inline comment now documents this, but verify the behavior is acceptable for all callers, particularly fetch(), which passes a handler through to transform(String, ProblemHandler).
- ThreadLocal<MessageDigest> lifecycle: The ThreadLocal is a static field, so MessageDigest instances survive across SchematronTransformer instances on the same thread. Verify digest.reset() before each use is sufficient (it should be, since digest() also resets internal state after returning).
- @Tag("integration") not excluded by CI: Confirm the Maven Surefire/Failsafe configuration does not filter out integration-tagged tests. CI passed on Java 17 and 21, suggesting they do run, but it is worth verifying the surefire config explicitly.
- Run mvn test -Dtest=SchematronTransformerTest to confirm the 5 tests pass locally, then run a representative bundle validation (e.g. a bundle with multiple labels sharing the same schematron) and diff the output against a pre-cache baseline to confirm identical results.

Notes
- [...] SchematronTransformer.java and SchematronTransformerTest.java changes.
- [...] pre.3.6.x.feature) are unrelated to this change and were present before this PR.

Link to Devin session: https://nasa-jpl-demo.devinenterprise.com/sessions/e8943ca2e9de4879856766ebf367604c
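The ThreadLocal<MessageDigest> lifecycle item in the checklist can be checked in isolation with a short sketch like the one below. The class `DigestReuseSketch` and its method names are hypothetical; the check relies only on the documented `java.security.MessageDigest` behavior that `digest()` resets the instance after completing and that `reset()` discards any pending input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical check that a reused MessageDigest carries no state between hashes. */
class DigestReuseSketch {
  private static MessageDigest newSha256() {
    try {
      return MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  private static byte[] utf8(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  /** True if hashing on a previously used instance matches hashing on a fresh one. */
  static boolean reuseAfterDigestIsClean() {
    MessageDigest reused = newSha256();
    reused.digest(utf8("first label")); // completes a hash; instance resets itself
    byte[] fromReused = reused.digest(utf8("second label"));
    byte[] fromFresh = newSha256().digest(utf8("second label"));
    return Arrays.equals(fromReused, fromFresh);
  }

  /** True if reset() discards input that was fed in but never finalized. */
  static boolean resetDiscardsPartialInput() {
    MessageDigest reused = newSha256();
    reused.update(utf8("abandoned partial input")); // simulate an interrupted use
    reused.reset();
    byte[] fromReused = reused.digest(utf8("second label"));
    byte[] fromFresh = newSha256().digest(utf8("second label"));
    return Arrays.equals(fromReused, fromFresh);
  }
}
```

Both methods are expected to return true, which suggests the explicit `reset()` mainly guards against a prior use that called `update()` without reaching `digest()`.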