dgigafox (Contributor) commented on Apr 20, 2026
- Move ports dataset to pre-parsed ETF loaded at app start
- Add a CI check that the ETF is up to date with the ports CSV source
- Other datasets are still loaded at compile time
Loading the ~116k-row UN/LOCODE port list at compile time forced dependents to spend seconds expanding struct literals into AST and serializing them into BEAM files on every fresh deps compile. Move the port list to a pre-parsed, compressed ETF shipped in priv/data/, loaded once into :persistent_term at application start (~79 ms). Cold compile of this library drops from ~7 s to ~0.7 s.

The four small datasets (countries, functions, statuses, subdivisions) remain compile-time module attributes; they're tiny and don't impact compile time.

Add a mix ports.gen_etf task to regenerate priv/data/ports.etf from the source CSVs whenever they're updated.
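A minimal sketch of the load-once pattern described above, assuming nothing about the PR's actual code: the module name PortsLoaderSketch, the :persistent_term key, the temp-file path, and the sample row are all hypothetical. In a real application, load!/1 would be called from the Application start/2 callback with a path under priv/data/.

```elixir
defmodule PortsLoaderSketch do
  # Hypothetical key; the PR does not show the real one.
  @key {__MODULE__, :ports}

  # Serialize a term to compressed ETF, roughly what a generator task would do.
  def write_etf!(path, term) do
    File.write!(path, :erlang.term_to_binary(term, [:compressed]))
  end

  # Read the ETF once and publish the decoded term in :persistent_term.
  def load!(path) do
    ports = path |> File.read!() |> :erlang.binary_to_term([:safe])
    :persistent_term.put(@key, ports)
    :ok
  end

  # Reads are cheap: :persistent_term.get/1 does not copy the term.
  def all, do: :persistent_term.get(@key)
end

# Usage with made-up sample data written to a temp file.
path = Path.join(System.tmp_dir!(), "ports_loader_sketch.etf")
PortsLoaderSketch.write_etf!(path, [%{code: "NLRTM", name: "Rotterdam"}])
:ok = PortsLoaderSketch.load!(path)
IO.inspect(PortsLoaderSketch.all())
```

The trade-off is that the data only exists after the loader has run, so callers that read it earlier need a clear failure mode.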
CI now fails if priv/data/ports.etf drifts from the source CSVs, catching cases where a contributor updates a CSV without regenerating the ETF. The check compares decoded terms rather than raw bytes so it isn't sensitive to serialization determinism.
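The term-level comparison can be sketched as follows. This is an illustration only: EtfDriftSketch and its check/2 function are hypothetical names, and the generated rows stand in for the parsed CSVs. The point is that two binaries encoding the same term need not be byte-identical (for example, with different compression settings), while their decoded terms still compare equal.

```elixir
defmodule EtfDriftSketch do
  # Returns :ok when the stored ETF decodes to the freshly built term,
  # {:error, :stale} otherwise. Raw bytes are never compared.
  def check(etf_binary, fresh_term) when is_binary(etf_binary) do
    if :erlang.binary_to_term(etf_binary, [:safe]) == fresh_term do
      :ok
    else
      {:error, :stale}
    end
  end
end

# Made-up rows standing in for the parsed CSV source.
ports = for i <- 1..1000, do: %{code: "PORT#{i}", name: "Sample port"}

# Same term, two encodings; a byte-for-byte comparison could flag these
# as different even though nothing in the data changed.
plain = :erlang.term_to_binary(ports)
compressed = :erlang.term_to_binary(ports, [:compressed])

IO.inspect(EtfDriftSketch.check(plain, ports))
IO.inspect(EtfDriftSketch.check(compressed, ports))
IO.inspect(EtfDriftSketch.check(compressed, tl(ports)))
```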
Code Review
This pull request transitions the loading of port data from compile-time to runtime using :persistent_term. It introduces a new application module to handle data loading on startup, a Mix task to generate the binary ETF file from source CSVs, and updates the Ports.all/0 function to retrieve data from persistent storage. Feedback suggests improving error handling in the Mix task when the ETF file is missing and providing a more descriptive error message if Ports.all/0 is called before the application has initialized.
The ETF is a build artifact we produce, but decoding with the bare :erlang.binary_to_term can still exhaust the atom table or produce function/reference terms if the file is ever tampered with. non_executable_binary_to_term/2 layers a recursive rejection of function and reference terms on top of the :safe option, giving real hardening rather than silencing Sobelow's Misc.BinToTerm check.
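The hardening described in this commit can be approximated as below. This is a self-contained sketch under stated assumptions, not the function the PR uses: SafeEtfSketch.decode!/1 is a hypothetical name, and the rejection rules simply follow the description above (decode with :safe, then walk the term and refuse functions and references).

```elixir
defmodule SafeEtfSketch do
  # Decode with :safe (no new atoms are created), then walk the decoded
  # term and raise if a function or reference appears anywhere inside it.
  def decode!(binary) when is_binary(binary) do
    term = :erlang.binary_to_term(binary, [:safe])
    :ok = check(term)
    term
  end

  defp check(term) when is_function(term) or is_reference(term) do
    raise ArgumentError, "unsafe term in ETF: #{inspect(term)}"
  end

  # Covers proper and improper lists: the tail is checked like any term.
  defp check([head | tail]) do
    check(head)
    check(tail)
  end

  defp check(tuple) when is_tuple(tuple), do: tuple |> Tuple.to_list() |> check()

  # Map.to_list/1 also works for structs, which are not Enumerable.
  defp check(map) when is_map(map), do: map |> Map.to_list() |> check()

  defp check(_other), do: :ok
end

# Plain data decodes normally.
good = :erlang.term_to_binary([%{code: "NLRTM"}])
IO.inspect(SafeEtfSketch.decode!(good))

# A function nested inside the data makes decode!/1 raise ArgumentError.
bad = :erlang.term_to_binary([%{callback: &String.upcase/1}])

try do
  SafeEtfSketch.decode!(bad)
rescue
  e in ArgumentError -> IO.puts("rejected: " <> Exception.message(e))
end
```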
Addresses PR feedback on #85:
- mix ports.gen_etf --check now reports a missing ETF with the same actionable "run mix ports.gen_etf" message instead of a generic File.Error.
- Ports.all/0 raises a named error when the :ports application has not been started, instead of a bare ArgumentError from :persistent_term.get/1.
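One way to get a named error for the not-started case is sketched below. The names PortsSketch and PortsSketch.NotStartedError, the key, and the message text are hypothetical; the PR does not show its actual error module. The sketch relies on :persistent_term.get/2, which returns a default instead of raising when the key is absent.

```elixir
defmodule PortsSketch.NotStartedError do
  defexception message:
                 "port data is not loaded; start the :ports application before calling all/0"
end

defmodule PortsSketch do
  # Hypothetical key and sentinel, chosen for this sketch only.
  @key {__MODULE__, :ports}
  @missing :__ports_sketch_missing__

  def all do
    # get/2 with a sentinel avoids the bare ArgumentError raised by get/1.
    case :persistent_term.get(@key, @missing) do
      @missing -> raise PortsSketch.NotStartedError
      ports -> ports
    end
  end

  # Stand-in for what the application start callback would do.
  def put(ports), do: :persistent_term.put(@key, ports)
end

# Calling all/0 before any data is stored surfaces the descriptive error.
try do
  PortsSketch.all()
rescue
  e in PortsSketch.NotStartedError -> IO.puts(Exception.message(e))
end
```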