refactor: remove synchronized bottleneck from LabelValidator using ThreadLocal#3
Conversation
…readLocal Wrap per-thread mutable parsing state (cachedParser, cachedValidatorHandler, docBuilder, cachedSchematron, cachedLSResolver, validatingSchema, saxParserFactory, schemaFactory) in a ThreadLocal<ParserState> holder so each thread gets its own instances. This allows removing synchronized from parseAndValidate(), validate(ProblemHandler, URL), validate(ProblemHandler, File), and validate(ProblemHandler, URL, String). Also change filesProcessed to AtomicLong and totalTimeElapsed to DoubleAdder for lock-free concurrent updates, and update clear() to call threadState.remove(). Resolves NASA-PDS#1566 Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
jordanpadams
left a comment
There was a problem hiding this comment.
Review: Remove synchronized Bottleneck from LabelValidator via ThreadLocal
The approach is sound — moving per-thread mutable state into ThreadLocal<ParserState> is the right pattern for this kind of singleton parser. However there are real concurrency bugs that must be fixed before this can safely land.
Bug (Must Fix): cachedLabelSchematrons HashMap Is Now a Data Race
loadLabelSchematrons() does containsKey + put on the shared cachedLabelSchematrons (a plain HashMap) from inside parseAndValidate(), which is no longer synchronized. When useLabelSchematron is true and multiple threads validate concurrently, this is an unsynchronized read-write to a shared mutable HashMap — a classic data race that can cause ConcurrentModificationException or silent corruption.
Fix: change cachedLabelSchematrons to a ConcurrentHashMap and replace the containsKey + put pattern with computeIfAbsent().
Bug (Must Fix): setCachedLSResourceResolver() Now Scoped to Calling Thread Only
public void setCachedLSResourceResolver(CachedLSResourceResolver resolver) {
this.threadState.get().cachedLSResolver = resolver; // only sets THIS thread's state
}Any caller that sets a custom resolver and then triggers validation from a different thread will have its resolver silently ignored. The other threads will use whatever resolver their ParserState was initialized with. This is a silent behavior change from the previous global assignment. Either document clearly that this method only affects the calling thread, or find a way to propagate it to all threads (e.g., store as a shared instance field that overrides the per-thread default in createParserIfNeeded).
Concern: Memory Leak in Long-Lived Thread Pools
threadState is an instance field, not static. When a LabelValidator is discarded but the threads that used it are kept alive (e.g., in an executor thread pool), those threads' ThreadLocal entries hold ParserState objects that will not be collected until the thread dies or threadState.remove() is called on that thread. Since clear() only removes the calling thread's state, threads that performed validation but never called clear() will leak ParserState for the lifetime of the thread.
This is especially relevant if LabelValidator instances are created and discarded per-validation-run while threads are reused. Consider adding a try-finally in parseAndValidate() that calls threadState.remove() after each label is processed, or document the lifecycle contract.
Concern: Shared Configuration Fields Read Without Synchronization
Fields like resolver, useLabelSchema, skipProductValidation, userSchematronTransformers, configurations, userSchemaFiles, userSchematronFiles, and cachedEntityResolver are all read inside parseAndValidate() without any synchronization. The PR assumes they are only written during single-threaded setup. This assumption must be verified and documented — if any caller writes to these fields while validation is running, the results are undefined.
Concern: LabelUtil Static Methods Called from Unsynchronized Context
LabelUtil.setLocation() and LabelUtil.registerIMVersion() are static methods called from the now-unsynchronized parseAndValidate(). Per the CLAUDE.md for this project, LabelUtil maintains static state. If these methods are not internally thread-safe, concurrent validation will corrupt that state. Please verify.
Minor: ParserState Constructor Wraps ParserConfigurationException in RuntimeException
} catch (ParserConfigurationException e) {
throw new RuntimeException("Failed to initialise per-thread ParserState", e);
}The old code threw ParserConfigurationException (checked), which callers could handle. This is now a RuntimeException that will crash the validation thread without an informative diagnostic if parser initialization fails. This is an acceptable tradeoff for ThreadLocal.withInitial, but it should be noted in the Javadoc.
Missing: Concurrent Integration Test
The existing test suite runs sequentially. The entire value of this change is multi-threaded throughput. Before relying on this in production, add a test that validates multiple labels from a thread pool concurrently and verifies both correctness and the absence of ConcurrentModificationException. Without this, the cachedLabelSchematrons race will not be caught by CI.
Summary
The ThreadLocal refactor is the right direction, but the cachedLabelSchematrons race and setCachedLSResourceResolver behavior change are real bugs that will cause failures in production use. Please address those two before merge.
- Change cachedLabelSchematrons to ConcurrentHashMap + computeIfAbsent() - Make CachedLSResourceResolver.handler ThreadLocal to fix setProblemHandler race - Add sharedCachedLSResolver field with propagation in createParserIfNeeded - Add AtomicLong generation counter so clear() invalidates all threads' ParserState - Initialize skipProductValidation in clear() - Add comprehensive thread-safety Javadoc (class-level + setCachedLSResourceResolver) - Document that LabelUtil static methods are internally synchronized - Add LabelValidatorConcurrencyTest with local schemas (no network I/O) Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
jordanpadams
left a comment
There was a problem hiding this comment.
Re-Review: Remove synchronized Bottleneck from LabelValidator via ThreadLocal
All six concurrency bugs from the initial review have been addressed. The generation counter approach for clear() is particularly well-designed. The concurrent test is a solid addition. A few remaining items below.
Remaining Bug: CachedLSResourceResolver.cachedEntities Is Still a Plain HashMap
This is explicitly flagged in the PR's own checklist, and it's now more urgent than before. With sharedCachedLSResolver in place, a single CachedLSResourceResolver instance is now explicitly shared across threads via ps.cachedLSResolver = sharedCachedLSResolver. When multiple threads call resolveResource() concurrently on a shared instance, they hit unsynchronized containsKey + put on the internal cachedEntities HashMap — a real data race:
// CachedLSResourceResolver.resolveResource() — called concurrently on shared instance:
if (!cachedEntities.containsKey(systemId)) { // ← unsynchronized read
...
cachedEntities.put(systemId, entity); // ← unsynchronized write
}This was pre-existing but was previously hidden behind synchronized on parseAndValidate. Now it is exposed. Change cachedEntities to ConcurrentHashMap and merge the containsKey + put into computeIfAbsent.
Minor Bug: Dead Code in loadLabelSchematrons Catch Block
After the computeIfAbsent refactor, TransformerException is always wrapped in a RuntimeException inside the lambda. The outer catch (TransformerException te) block is therefore unreachable — all TransformerException instances now arrive as RuntimeException via the RuntimeException re branch. This is dead code and should be removed to avoid confusion.
} catch (RuntimeException re) {
// Unwrap exceptions from computeIfAbsent lambda
...
} catch (TransformerException te) { // ← dead, never reached
...
}Minor: Concurrent Test Coverage Is Still Narrow
The new LabelValidatorConcurrencyTest runs with schematronCheck=false and skipProductValidation=true, so it exercises none of the schematron loading, loadLabelSchematrons, or CachedLSResourceResolver.resolveResource() code paths under concurrency. For production confidence, a follow-on test with schematronCheck=true against a local schematron file would be valuable — especially once cachedEntities is fixed.
Confirmed Fixed
cachedLabelSchematrons→ConcurrentHashMap+computeIfAbsent+volatile✓CachedLSResourceResolver.handler→ThreadLocal<ProblemHandler>✓setCachedLSResourceResolver()→volatile sharedCachedLSResolverpropagated increateParserIfNeeded✓clear()one-thread limitation →AtomicLong configGenerationcounter with stale-state detection inparseAndValidate✓skipProductValidationnot reset inclear()→ fixed ✓- Thread-safety contract documented in class-level Javadoc ✓
- Concurrent integration test added ✓
Summary
Fix CachedLSResourceResolver.cachedEntities (the one remaining real data race), remove the dead catch (TransformerException) block, and this is ready to merge. Good work addressing all the original issues.
- CachedLSResourceResolver.cachedEntities: HashMap -> ConcurrentHashMap + putIfAbsent - Remove dead catch (TransformerException) block in loadLabelSchematrons (now wrapped in RuntimeException via computeIfAbsent lambda) - Add concurrentParseAndValidate_withSchematron test exercising schematron loading and CachedLSResourceResolver under concurrent access Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
…el-validator' into devin/1775083777-threadlocal-label-validator Co-Authored-By: jordan.h.padams <jordan.h.padams@jpl.nasa.gov>
Remaining Issue 1:
|
|
Both issues are already fixed in the latest push (
Also added |
Summary
Removes the
synchronizedkeyword fromparseAndValidate()and allvalidate()overloads inLabelValidator, which was serializing all label validation in multi-threaded runs becauseLabelValidatoris used as a singleton viaValidationResourceManager.The per-call mutable parsing state (
cachedParser,cachedValidatorHandler,docBuilder,cachedSchematron,cachedLSResolver,validatingSchema, plus the non-thread-safeSAXParserFactoryandSchemaFactory) is moved into a private static inner classParserState, held in aThreadLocal<ParserState>. Each thread now gets its own isolated parser instances.filesProcessedis changed toAtomicLongandtotalTimeElapsedtoDoubleAdderfor lock-free concurrent accumulation.Resolves NASA-PDS#1566
Updates since last revision
Round 2 — Six concurrency bugs from initial review
cachedLabelSchematronsdata race — Changed fromHashMaptoConcurrentHashMap; replacedcontainsKey+putwithcomputeIfAbsent(). The field is nowvolatileso thatclear()'s replacement is visible to all threads.resolver.setProblemHandler()race —CachedLSResourceResolver.handleris now aThreadLocal<ProblemHandler>so concurrent callers don't overwrite each other's handler.setCachedLSResourceResolver()thread scoping — Reverted to a sharedvolatileinstance field (sharedCachedLSResolver) that is propagated to each thread'sParserStateduringcreateParserIfNeeded.clear()only resetting one thread — Added anAtomicLong configGenerationcounter.clear()increments it; eachParserStatestores the generation it was created for.parseAndValidate()checks for a mismatch and discards stale state.skipProductValidationnot initialized inclear()— AddedskipProductValidation = falseinclear().LabelUtil.setLocation()andregisterIMVersion()are internallysynchronized.LabelValidatorConcurrencyTestwith two tests:concurrentParseAndValidate_sameLabel(4 threads × 3 iterations) andclearInvalidatesAllThreads(verifies generation counter). Uses local schemas via XML catalog rewriting to avoid network I/O.Round 3 — Three remaining issues from re-review
CachedLSResourceResolver.cachedEntitiesdata race — Changed fromHashMaptoConcurrentHashMap; replacedcontainsKey/putwithget()+putIfAbsent()pattern. Multiple threads may redundantly fetch the same resource, but only one wins theputIfAbsent; no data corruption.catch (TransformerException)block — Removed unreachable catch inloadLabelSchematrons. After thecomputeIfAbsentlambda wraps checked exceptions inRuntimeException, the originalcatch (TransformerException te)was dead code. Thecatch (RuntimeException re)block now unwrapsTransformerExceptionfrom the cause and handles other runtime exceptions with a descriptive message.concurrentParseAndValidate_withSchematron()test (4 threads × 2 iterations) that enables label-schematron validation, exercisingloadLabelSchematrons/cachedLabelSchematronsconcurrentcomputeIfAbsentand the sharedCachedLSResourceResolverunder concurrent access.Review & Testing Checklist for Human
computeIfAbsentnull-safety inloadLabelSchematrons.ConcurrentHashMap.computeIfAbsent()throwsNullPointerExceptionif the mapping function returnsnull. IfschematronTransformer.fetch()can ever returnnull, this will throw at runtime instead of caching a missing entry. Verify thatfetch()always returns a non-nullStringon success, or add a null guard.RuntimeExceptionunwrapping inloadLabelSchematrons. Thecatch (RuntimeException re)block inspectsre.getCause()to distinguishTransformerExceptionfrom other failures. If a genuine (non-wrapping)RuntimeExceptionis thrown bycomputeIfAbsentinternals or byfetch()itself, the cause may benull, which the code handles but may produce a less informative error message. Worth a quick read of the catch block logic.ParserStateconstructor failure surface.ParserConfigurationExceptionis wrapped in an uncheckedRuntimeExceptionsinceThreadLocal.withInitialcannot throw checked exceptions. Verify this is acceptable for your error-handling strategy.findTestLabel()returnsnullwhengithub71/ELE_MOM.xmlis absent, and tests early-return without assertion failure. In CI this means the tests pass vacuously if the resource path changes. Consider whetherAssumptions.assumeTrue()orfail()is more appropriate.Suggested manual test plan: Run a real multi-threaded validation (e.g.,
validate -t /path/to/bundle/ --threads 4) against a bundle with schematron-validated labels and compare results (error/warning counts) against a single-threaded run to confirm identical output.Notes
pre.3.6.x.featureare unrelated to this change (count mismatches present onmain).SchematronTransformeris still shared across threads but appears stateless per-call (creates freshTransformerFactoryinstances internally), so this should be safe.resolver,useLabelSchema,configurations, etc.) are read without synchronization inparseAndValidate. The class-level Javadoc now documents the contract that these are setup-time only.Link to Devin session: https://nasa-jpl-demo.devinenterprise.com/sessions/27f58bf067b14671aade159c64df6ab0
Requested by: @jordanpadams