Proposal for V2: Static Huffman as extremely efficient compression procedure #1926

dkriesel · 2026-03-05T09:16:47Z

The general todos state that LZW compression is to be introduced for MeshCore payloads. I did not find a discussion of this elsewhere; apologies if I missed it. Because I find LZW quite heavyweight, here is a very minimal approach.

Proposal:

Static Huffman: Both sides share a hardcoded frequency table for message atoms (e.g., bytes or UTF-8 characters), built once from a general language corpus.
Aside from this, it is the standard Huffman procedure: each byte gets a variable-length bit code. Frequent bytes (space, 'e', 'a') get 3-4 bits; rare ones get longer codes. (Info on Huffman coding itself: https://en.wikipedia.org/wiki/Huffman_coding).
That's it. No state, no learning, no header.
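
A minimal sketch of the idea in Python, for illustration only. The toy corpus, the function names, and the choice to hand the valid bit count to the decoder are all assumptions made for this example; a real table would be built offline from a proper corpus and hardcoded on both sides, and a real wire format would still need some way to tell padding bits from data.

```python
import heapq
from collections import Counter

# Assumption: this toy corpus stands in for the shared "general language
# corpus". A real table would be built offline and hardcoded on both sides.
CORPUS = b"the quick brown fox jumps over the lazy dog and then eats a late tea at the sea"


def build_code_table(corpus):
    """Return {byte value: bit string} for a Huffman code over all 256 bytes."""
    # Start every byte at count 1 so that any message stays encodable.
    freq = Counter({b: 1 for b in range(256)})
    freq.update(corpus)
    # Heap entries are (weight, unique tie-breaker, {byte: code so far}).
    heap = [(w, b, {b: ""}) for b, w in freq.items()]
    heapq.heapify(heap)
    tie = 256
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {b: "0" + c for b, c in left.items()}
        merged.update({b: "1" + c for b, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


TABLE = build_code_table(CORPUS)                   # encoder side
DECODE = {code: b for b, code in TABLE.items()}    # decoder side


def encoded_bits(msg):
    """Exact encoded size in bits, usable for a live fill indicator."""
    return sum(len(TABLE[b]) for b in msg)


def encode(msg):
    """Return (packed bytes, number of valid bits)."""
    bits = "".join(TABLE[b] for b in msg)
    padded = bits + "0" * (-len(bits) % 8)
    packed = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return packed, len(bits)


def decode(packed, nbits):
    """Invert encode(); nbits tells the decoder where the padding starts."""
    bits = "".join(format(b, "08b") for b in packed)[:nbits]
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in DECODE:          # prefix-free code: first match is the symbol
            out.append(DECODE[cur])
            cur = ""
    if cur:
        raise ValueError("stream ends in the middle of a code")
    return bytes(out)
```

Because the table is fixed, `encoded_bits` gives the exact compressed size while the user is still typing, without running the encoder.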

Benefits / Comparison: static Huffman vs. LZW

LZW starts empty. It builds its dictionary from the message itself, so the first few dozen bytes are emitted at 9+ bits per symbol with no compression at all. On a LoRa frame, it is practically still in warm-up when the message ends.
Static Huffman compresses from byte one. The table is known in advance, so every byte immediately maps to its precomputed code. No ramp-up. A live view of the message fill percentage is also easy, since the encoded size is just the sum of the per-byte code lengths.
LZW can expand short messages, which is somewhat counterintuitive. The dictionary indices plus the growing code width can make the output larger than the input, and this is more likely the smaller the message is (so, the MeshCore use case).
Memory: LZW needs a growing dictionary (typically a hash table). Huffman decoding needs only a fixed ~256-entry lookup table or a small tree, which is a better fit for ultra-low-end hardware.
LZW shines at 1 KB+, where it finds repeated substrings. Chat messages of 50-200 bytes rarely contain enough repetition for LZW to pay off.
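
The warm-up effect can be checked with a small counting sketch. It assumes a textbook LZW encoder with a 256-entry starting dictionary and a fixed 9-bit code width (no clear or stop codes); real variants differ in the details, so treat the numbers as illustrative.

```python
def lzw_code_count(msg):
    """Count the codes a textbook LZW encoder would emit for msg."""
    dictionary = {bytes([i]) for i in range(256)}
    codes = 0
    phrase = b""
    for byte in msg:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate          # keep extending the current match
        else:
            codes += 1                  # emit the code for `phrase`
            dictionary.add(candidate)   # learn the new phrase
            phrase = bytes([byte])
    if phrase:
        codes += 1                      # flush the final phrase
    return codes


def lzw_bits(msg, code_width=9):
    """Output size in bits, assuming a fixed code width."""
    return lzw_code_count(msg) * code_width
```

For a message without repeated pairs, such as `b"abcdefgh"`, every input byte becomes one 9-bit code, so 64 input bits turn into 72 output bits. Only once phrases start repeating, as in `b"abababab"` (5 codes, 45 bits), does the output shrink.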

Expected compression on typical chat messages (roughly!): Huffman ~30%, LZW ~0-10% (sometimes negative). YMMV though; count this as rough voodoo speculation.

Future / Thoughts

The approach is extensible. Above, only the simplest case was sketched: very generic Huffman lookup tables. In the most general case these work on bytes (probably less efficient, but content-independent); probably more practical here, they work on character level, with a cross-language character frequency table for UTF-8 text messages.

We could, however, also add different frequency tables for different language clusters (clusters rather than single languages, because there are groups of languages whose character frequencies are quite compatible, e.g., Western ones), or even for other kinds of data: different hardcoded frequency tables, like code pages. I imagine a specific "Western" code page alongside the completely generic one. We need to be aware, however, that any new code page is a new message type, and that such code pages are carved in stone once created, so some caution is required.
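
One way a sender might choose between code pages is to price the message under each table and take the cheapest. The sketch below is hypothetical throughout: the page IDs, the two toy code-length tables, and the function name are invented for illustration, and the lengths are merely chosen to be valid prefix-code lengths, not derived from any corpus.

```python
GENERIC, WESTERN = 0, 1   # hypothetical code page IDs

# Code-length tables (bits per byte value). GENERIC is a flat 8-bit code.
# WESTERN gives space and a-z 5 bits each and every other byte 11 bits,
# which satisfies the Kraft inequality: 27/32 + 229/2048 <= 1.
_cheap = set(b" abcdefghijklmnopqrstuvwxyz")
PAGES = {
    GENERIC: {b: 8 for b in range(256)},
    WESTERN: {b: (5 if b in _cheap else 11) for b in range(256)},
}


def cost_bits(msg, page_id):
    """Encoded size of msg in bits under the given code page."""
    lengths = PAGES[page_id]
    return sum(lengths[b] for b in msg)


def pick_code_page(msg):
    """Return the ID of the cheapest code page (lowest ID wins ties)."""
    return min(PAGES, key=lambda pid: (cost_bits(msg, pid), pid))
```

With these toy tables, `b"hello world"` costs 55 bits on the Western page against 88 on the generic one, while a three-byte CJK character costs 33 bits against 24, so the generic page wins there.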

Challenge

Backwards compatibility with uncompressed messages.

Note: I accidentally posted as issue before; sorry. After posting the idea here I will close the issue.

gjelsoe · 2026-03-06T11:54:01Z

I did play around with Unishox2 and applied compression only when it made sense. The other issue, as a user pointed out, is that encryption happens in blocks of 16 bytes, so we won't be saving that much. At best we can send longer messages within the limit we have now.
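
To illustrate the block-size point, here is a small arithmetic sketch. It assumes the payload is padded up to a multiple of the 16-byte cipher block and ignores any MAC or header bytes; both are simplifying assumptions, not taken from the MeshCore source.

```python
BLOCK = 16  # AES block size in bytes


def padded_len(n, block=BLOCK):
    """Length after rounding n bytes up to a whole number of blocks."""
    return -(-n // block) * block


def bytes_saved_on_air(raw_len, compressed_len):
    """Savings that survive block padding."""
    return padded_len(raw_len) - padded_len(compressed_len)
```

Under these assumptions, shrinking a 50-byte message to 35 bytes saves one block (64 down to 48 bytes), while shrinking 40 bytes to 34 saves nothing, because both sizes round up to 48.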

According to AI, our best option would be to change the encryption method from AES to something like Ascon-128a before compression gives any useful results; some effort has already gone into that in #1450.

gjelsoe · 2026-03-08T13:53:10Z

I've added the work I did here #1959

robekl · 2026-03-09T05:02:42Z

I think compression of the text can be very beneficial. I don't know much about the details of the compression tech being considered, but I asked the AI about some things and it had a lot to say.

LZW Message Compression Roadmap Item

The roadmap includes Core + Apps: support for LZW message compression. The roadmap itself does not include a detailed specification, so this document interprets that item in terms of likely intent, implementation shape, risks, and a phased delivery plan.

Intent

The likely intent is straightforward:

reduce LoRa airtime for chat payloads
fit longer text into the current packet budget
reduce fragmentation pressure if multipart text is added later
improve delivery probability under congestion by shrinking packets

Why this fits MeshCore:

text payloads are small but airtime-expensive on LoRa
current packet payload is capped at 184 bytes in v1 (MAX_PACKET_PAYLOAD), defined in src/MeshCore.h
text messages are carried as encrypted payloads under PAYLOAD_TYPE_TXT_MSG and PAYLOAD_TYPE_GRP_TXT, described in docs/payloads.md and docs/packet_format.md
user-facing text is one of the few payload classes where compression is both useful and semantically safe

Roadmap Interpretation

A reasonable interpretation of this roadmap item is:

compression applies to message text, not all packet types
compression happens before encryption and after message framing
decompression happens after decryption and before UI or app handling
support must exist in both firmware and companion apps

That suggests the initial scope is probably:

direct text messages
possibly group text messages later
possibly signed text messages later
probably not CLI payloads in the first iteration
not binary datagrams, adverts, ACKs, control packets, or request/response payloads in the first iteration

Current Protocol Constraints

Relevant constraints in the current implementation:

Packet payload budget is hard-capped: src/MeshCore.h
Payload version bits exist in the header, but v2+ are still reserved: src/Packet.h
Plain text messages currently use TXT_TYPE_PLAIN, TXT_TYPE_CLI_DATA, and TXT_TYPE_SIGNED_PLAIN: src/helpers/TxtDataHelpers.h
Message handling in src/helpers/BaseChatMesh.cpp assumes decrypted text is directly null-terminated C string data
MAX_TEXT_LEN is already constrained around cipher block size and packet budget: src/helpers/BaseChatMesh.h

So compression is not just a local optimization. It changes:

message encoding format
receive-side parsing
app/firmware compatibility negotiation
bounds checking and fallback behavior
possibly ACK hash semantics if hashes depend on message bytes

Uncertainties

The roadmap item leaves several important questions open.

Compression scope

only TXT_MSG?
also GRP_TXT?
also signed text?
also app-side synced message storage/history?

Negotiation model

must compression only be used if both endpoints support it?
can group messages use compression if some listeners do not support it?
does a repeater need awareness, or can it stay payload-agnostic?

Framing choice

new txt_type value inside existing text payload
new payload version
new payload type
feature bit in companion protocol or contact capabilities

LZW variant

fixed-width codes?
dictionary reset rules?
max dictionary size?
byte-packing format?
deterministic decoder behavior across firmware and apps?

Value threshold

many short chat messages will not compress well
compressed output may be larger than input
decision rule needs to be explicit

Backward compatibility

what happens if an older app or firmware receives compressed text?
should sender fall back automatically?
can mixed-version meshes coexist without operator confusion?

Where The Complexity Is

The hard part is not the compressor itself. The hard parts are protocol and compatibility.

Wire compatibility
Current parsing assumes decrypted text payloads look like:

timestamp (4 bytes)
txt_type + attempt (1 byte)
remaining bytes are text

That logic is in src/helpers/BaseChatMesh.cpp. If compressed text is introduced, receive code can no longer assume &data[5] is printable text.
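
As a rough illustration of that layout, the sketch below parses a decrypted text body in Python. The little-endian timestamp and the function name are assumptions for this example; the split of the second field into type (upper 6 bits) and attempt (lower 2 bits) follows the wire-format notes later in this post.

```python
import struct


def parse_text_payload(data):
    """Split a decrypted text body into (timestamp, txt_type, attempt, text)."""
    if len(data) < 5:
        raise ValueError("payload shorter than the 5-byte header")
    (timestamp,) = struct.unpack_from("<I", data, 0)  # assumed little-endian
    flags = data[4]
    txt_type = flags >> 2       # message class
    attempt = flags & 0x03      # retry counter
    # Today everything after the header is treated as a C string.
    text = data[5:].split(b"\x00", 1)[0]
    return timestamp, txt_type, attempt, text
```

Once a compressed type exists, the last step is exactly what breaks: the bytes after the header may contain zero bytes and are not printable, so they must be routed through a decoder before being treated as text.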

App interoperability
The roadmap explicitly says Core + Apps. If firmware sends compressed messages but companion libraries and clients do not decode them, the feature is not usable.

Embedded implementation constraints
MeshCore is optimized for low overhead:

no heap allocation during runtime is preferred
packet buffers are fixed-size
CPU/RAM budgets vary across boards

A naive LZW implementation can still be too heavy if:

dictionary is large
decompression uses dynamic structures
code packing is branch-heavy or RAM-heavy

Small-message economics
LoRa chat messages are often short. Compression only helps when:

the text is long enough
repetitive enough
framing overhead does not erase savings

For many short messages, compression is not worth it.

Group compatibility
For direct messages, capability negotiation can be per-contact.
For group messages, compatibility is harder:

a channel may include mixed app versions
compression must either be channel-configured, sender-conditional, or forbidden unless all listeners are known-capable

Signed messages
TXT_TYPE_SIGNED_PLAIN currently includes sender prefix plus text. If compression is added, the signed content must be defined carefully:

sign original plaintext?
sign compressed bytes?
display canonical decompressed text?

That needs a precise rule.

LZW-Specific Concerns

LZW is plausible, but it is not obviously the only or best option.

Pros:

classic dictionary compressor
no external model needed
deterministic
can work with repetitive natural language

Cons:

more implementation detail than simpler encodings
code packing matters
not ideal for very short strings
dictionary reset rules can get messy
UTF-8 handling must be explicit

Important detail:

LZW should operate on bytes, not characters
that is fine for UTF-8 if both sides treat the stream as opaque bytes
compression ratio for short multilingual text may still be mediocre

Alternatives

These are worth considering before locking on LZW.

No compression, just multipart text
Pros:

simpler
robust
keeps protocol semantics clean

Cons:

does not reduce airtime
only solves message length, not congestion

Static phrase or token compression
Pros:

tiny decoder
very predictable

Cons:

language-specific
brittle
poor generality

Heatshrink-style LZSS
Pros:

designed for embedded systems
small decoder footprint
simpler bounded memory story than full LZW in many cases

Cons:

different tradeoffs than the roadmap wording
still requires protocol and app support

Deflate or miniz-style compression
Pros:

standard
good compression

Cons:

too heavy for this kind of firmware target

Adaptive “compress only larger text, otherwise raw”
This is not an algorithm alternative, but it is likely necessary regardless of algorithm.

A practical interpretation of the roadmap item is that the system may still choose raw text for many messages, even after compression support exists.

Extensibility

The clean way to make this extensible is to separate:

payload type
payload version
text encoding mode

A good long-term shape would be:

keep PAYLOAD_TYPE_TXT_MSG and PAYLOAD_TYPE_GRP_TXT
define a small encoding field inside decrypted text payload
reserve values like:
- 0: raw UTF-8
- 1: LZW
- 2: future LZSS or heatshrink
- 3: reserved

That avoids spending whole payload types on encoding variants.
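
A sketch of how such an encoding field could drive decoding, in Python for brevity. The enum and registry names are hypothetical; only the raw decoder is filled in, and anything else is rejected explicitly rather than guessed at.

```python
from enum import IntEnum


class TextEncoding(IntEnum):
    RAW_UTF8 = 0
    LZW = 1
    LZSS = 2        # "future LZSS or heatshrink"
    RESERVED = 3


# Hypothetical decoder registry: this build only understands raw text.
DECODERS = {
    TextEncoding.RAW_UTF8: lambda body: body.decode("utf-8"),
}


def decode_text(encoding_value, body):
    """Dispatch on the encoding field; fail loudly on anything unsupported."""
    try:
        encoding = TextEncoding(encoding_value)
    except ValueError:
        raise ValueError(f"unknown text encoding {encoding_value}") from None
    decoder = DECODERS.get(encoding)
    if decoder is None:
        raise NotImplementedError(f"{encoding.name} is not supported by this build")
    return decoder(body)
```

Adding a codec later means registering one more entry; the payload types stay untouched.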

If broader packet-version work lands later, payload versioning could absorb this. For incremental delivery, a text-level encoding field is simpler.

Recommended Design

A narrow first phase is the safest interpretation of the roadmap item:

direct text only
group text optional later
no CLI compression initially
no signed-text compression initially
sender compresses only when receiver capability is known
sender compresses only if output is materially smaller

Concrete framing idea:

keep existing packet type
define a new text encoding mode or text subtype
compressed body includes:
- timestamp 4 bytes
- txt_type + attempt 1 byte
- encoding 1 byte, or fold encoding into txt_type
- orig_len 1 byte or 2 bytes
- compressed bytes...

Using a separate encoding byte is more extensible. Using a new txt_type is simpler to implement in the short term.

Phased Implementation Plan

Phase 0: Protocol Decision

Output:

short spec doc for compressed text framing
compatibility matrix
exact fallback rules

Decisions required:

which payloads are in scope
exact wire format
exact LZW variant
compression threshold
capability negotiation method

Suggested decisions:

scope: direct text only
negotiation: per-contact capability
threshold: compress only if it saves at least 8 bytes
fallback: raw text if unsupported or not smaller
signed/group/CLI: deferred

Phase 1: Codec Library In Firmware

Add:

src/helpers/Compression.h/.cpp or similar
fixed-memory LZW encode/decode over byte arrays
no heap
hard output/input bounds
deterministic error codes

Requirements:

encoder returns “not worth compressing”
decoder rejects malformed streams
bounded worst-case runtime
tests for round-trip and malformed input
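
To make these requirements concrete, here is a reference-style sketch in Python rather than firmware C++. The variant is an assumption chosen for simplicity (fixed 9-bit codes, dictionary capped at 512 entries, no clear or stop codes), and `MAX_TEXT` is a hypothetical stand-in for the real text limit. The decoder takes the original length as an input, as in the framing ideas above.

```python
MAX_TEXT = 160                 # hypothetical cap, stands in for MAX_TEXT_LEN
CODE_BITS = 9
MAX_CODES = 1 << CODE_BITS     # dictionary never grows past 512 entries


def lzw_encode(msg):
    """Return packed codes, or None when compression is not worth it."""
    if len(msg) > MAX_TEXT:
        raise ValueError("input exceeds text limit")
    table = {bytes([i]): i for i in range(256)}
    codes, phrase = [], b""
    for byte in msg:
        cand = phrase + bytes([byte])
        if cand in table:
            phrase = cand
        else:
            codes.append(table[phrase])
            if len(table) < MAX_CODES:
                table[cand] = len(table)
            phrase = bytes([byte])
    if phrase:
        codes.append(table[phrase])
    bits = "".join(format(c, "09b") for c in codes)
    bits += "0" * (-len(bits) % 8)
    packed = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    return packed if len(packed) < len(msg) else None


def lzw_decode(packed, orig_len):
    """Decode exactly orig_len bytes; raise ValueError on malformed input."""
    if orig_len > MAX_TEXT:
        raise ValueError("declared length exceeds text limit")
    bits = "".join(format(b, "08b") for b in packed)
    table = [bytes([i]) for i in range(256)]
    out, prev, pos = bytearray(), None, 0
    while len(out) < orig_len:          # at most orig_len iterations
        if pos + CODE_BITS > len(bits):
            raise ValueError("truncated stream")
        code = int(bits[pos:pos + CODE_BITS], 2)
        pos += CODE_BITS
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]     # code refers to the entry being built
        else:
            raise ValueError("invalid code")
        if prev is not None and len(table) < MAX_CODES:
            table.append(prev + entry[:1])
        out += entry
        prev = entry
    if len(out) != orig_len:
        raise ValueError("decoded length does not match declared length")
    return bytes(out)
```

A firmware version would replace the Python dictionary and list with fixed-size arrays, but the bounds and rejection rules would stay the same.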

This phase should be standalone and not yet wired into packets.

Phase 2: Direct-Message Send And Receive Support

Modify:

text composition path in src/helpers/BaseChatMesh.h and src/helpers/BaseChatMesh.cpp
message parsing path in src/helpers/BaseChatMesh.cpp

Behavior:

sender decides raw vs compressed
receiver detects encoding and inflates before calling onMessageRecv
ACK hash rule stays based on canonical plaintext, not compressed bytes

That keeps ACK semantics stable even if compression choice changes.

Phase 3: Capability Negotiation

Need a way to know whether the remote endpoint supports compressed text.

Options:

companion protocol version flag during app-device handshake
feature bit on advert appdata
contact capability bit learned from direct interaction
manual config override for testing

Recommended:

app/device handshake for companion links
feature bit in advert or contact metadata for peer capability across the mesh

This phase turns the feature from experimental to usable.

Phase 4: App Support

Update:

mobile apps
web app
meshcore.js
meshcore_py

Requirements:

decode compressed text
encode compressed text only when peer or device supports it
preserve history and export semantics as plaintext at the UI boundary

Phase 5: Optional Group Text Support

Only after compatibility strategy is proven.

Additional rules needed:

when is a channel considered compression-safe?
if unknown recipients exist, does sender force raw?
is compression channel-configurable?

A conservative approach is:

do not auto-enable for groups initially
require explicit channel capability or sender setting

Phase 6: Optional Signed Text And Multipart

After the base path is stable:

define compressed signed text semantics
optionally combine with multipart for long messages
revisit whether broader payload compression is worth it

Suggested Wire-Format Sketch

One conservative approach inside decrypted text body:

For raw text:

[timestamp:4][flags:1][utf8 bytes...]

For compressed text:

[timestamp:4][flags:1][orig_len:1][lzw bytes...]

Where:

flags >> 2 still identifies message class
a new class means “compressed plain text”
lower 2 bits still carry attempt number

That is easy to wire into current parsing.

If extensibility matters more, use:

[timestamp:4][flags:1][encoding:1][orig_len:1][data...]

Then:

TXT_TYPE_PLAIN can remain semantic type
encoding=0 raw
encoding=1 lzw

This is cleaner, but it changes parsing more broadly.
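
The extensible layout can be sketched as a pack/unpack pair. The little-endian timestamp, the constant names, and the function names are assumptions for illustration; the field order and widths follow the layout given above.

```python
import struct

ENC_RAW, ENC_LZW = 0, 1        # values from the encoding list above
_HEADER = "<IBBB"              # timestamp, flags, encoding, orig_len
_HEADER_LEN = struct.calcsize(_HEADER)   # 7 bytes


def pack_text_body(timestamp, txt_type, attempt, encoding, orig_len, data):
    """Build [timestamp:4][flags:1][encoding:1][orig_len:1][data...]."""
    if not (0 <= txt_type < 64 and 0 <= attempt < 4):
        raise ValueError("txt_type or attempt out of range")
    if not (0 <= encoding < 256 and 0 <= orig_len < 256):
        raise ValueError("encoding or orig_len does not fit in one byte")
    flags = (txt_type << 2) | attempt
    return struct.pack(_HEADER, timestamp, flags, encoding, orig_len) + data


def unpack_text_body(body):
    """Split a decrypted body into its header fields and the data bytes."""
    if len(body) < _HEADER_LEN:
        raise ValueError("body shorter than header")
    timestamp, flags, encoding, orig_len = struct.unpack_from(_HEADER, body)
    return {
        "timestamp": timestamp,
        "txt_type": flags >> 2,
        "attempt": flags & 0x03,
        "encoding": encoding,
        "orig_len": orig_len,
        "data": body[_HEADER_LEN:],
    }
```

The cost of this variant is visible in `_HEADER_LEN`: two extra bytes on every message, raw or compressed, compared with the current 5-byte header.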

Decision Rules Worth Adopting

only compress direct plain-text messages in the first iteration
never compress CLI payloads in the first iteration
do not compress if compressed size is not at least 8 bytes smaller
decompress into fixed stack or static buffer capped by current text limits
ACK based on decompressed canonical plaintext
sender falls back to raw automatically if remote capability is unknown
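
The sender-side rules above fit in a few lines. This sketch is hypothetical: the function name, the tri-state capability argument (`True`, `False`, or `None` for unknown), and the convention that the compressor returns `None` when it gives up are assumptions for illustration.

```python
MIN_SAVING = 8   # bytes; threshold from the rules above


def choose_encoding(text, is_direct_plain, peer_supports, compress):
    """Return ("raw", text) or ("lzw", compressed_bytes).

    peer_supports is True, False, or None (unknown).
    compress(text) returns bytes, or None if it declines to compress.
    """
    if not is_direct_plain:
        return "raw", text              # CLI, group, signed: not in scope yet
    if peer_supports is not True:
        return "raw", text              # unknown capability falls back to raw
    compressed = compress(text)
    if compressed is None or len(text) - len(compressed) < MIN_SAVING:
        return "raw", text              # not materially smaller
    return "lzw", compressed
```

Keeping the decision in one function makes the fallback path the default: compression has to earn its place on every message.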

Main Risks

mixed-version ecosystem causes unreadable messages
compression overhead adds code complexity without enough airtime savings
short-message workloads see little benefit
malformed compressed payload handling becomes a new attack or failure surface
signed-message semantics become inconsistent if deferred poorly

Bottom Line

This roadmap item makes sense, but only if treated as a protocol compatibility project, not just a firmware optimization. The actual complexity is in:

negotiation
wire format
app rollout
bounded embedded decoding

A sensible interpretation of the roadmap is to deliver this in phases, starting with direct plain-text messages and explicit fallback behavior, then expanding only after interoperability is proven.
