Unishox2 Text Compression for MeshCore #1959
gjelsoe started this conversation in Show and tell
Feature proposal: USE_LZW
Memory Footprint
On a Heltec V3 running Companion Radio BLE firmware v1.14.0, the overhead is minimal: 384 bytes of additional RAM and less than 9 KB of additional Flash. The Flash increase covers the Unishox2 library and the compression/decompression code. The RAM increase comes primarily from the stack buffers used during compression and decompression.
Overview
This proposal adds optional Unishox2 text compression to MeshCore's private messaging layer. When both the sender and the recipient run compression-capable firmware, messages are compressed before encryption, reducing payload size by 25–40% for typical natural-language messages. Nodes without compression support are unaffected: the feature is strictly opt-in and backward compatible.
Why Unishox2
Standard compression algorithms such as LZW or DEFLATE require hundreds of bytes of input before producing any meaningful reduction in size. MeshCore messages are capped at 160 bytes, making these algorithms unsuitable. Unishox2 is specifically designed for short Unicode strings and delivers positive compression ratios starting from around 40 characters. It handles ASCII, accented characters, and emoji (multi-byte UTF-8) correctly, and its C implementation (unishox2.c/unishox2.h) is small enough to run comfortably on ESP32, nRF52, RP2040, and STM32 targets.
Measured Compression Results
Example results observed during development:
Messages shorter than 40 characters, or messages with high-entropy content such as base64 strings or coordinates, are automatically sent uncompressed.
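A minimal sketch of such a pre-compression check, written in C to match the Unishox2 library. It assumes the three heuristics described under New files below (minimum length counted in Unicode codepoints, byte entropy, share of alphabetic characters). The function names and the entropy and letter-ratio thresholds are hypothetical placeholders, not values from MeshCompression.cpp; only the 40-character minimum comes from this proposal. A small hand-written log2 keeps the sketch free of libm.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL thresholds: only the 40-character minimum comes from the
   proposal; the other two cut-offs are illustrative placeholders. */
#define MIN_CODEPOINTS   40
#define MAX_ENTROPY_BITS 4.5   /* bits per byte */
#define MIN_ALPHA_RATIO  0.6   /* ASCII letters / codepoints */

/* A UTF-8 continuation byte looks like 10xxxxxx; every other byte starts a
   new codepoint, so multi-byte characters (accents, emoji) count once. */
static size_t count_codepoints(const char *s, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80) n++;
    return n;
}

/* Binary logarithm without libm: normalise x into [1, 2), then extract
   fractional bits by repeated squaring (about 20 bits of precision). */
static double log2_small(double x) {
    double r = 0.0, bit = 0.5;
    while (x >= 2.0) { x /= 2.0; r += 1.0; }
    while (x < 1.0)  { x *= 2.0; r -= 1.0; }
    for (int i = 0; i < 20; i++, bit /= 2.0) {
        x *= x;
        if (x >= 2.0) { x /= 2.0; r += bit; }
    }
    return r;
}

/* Shannon entropy of the byte histogram, in bits per byte. */
static double byte_entropy(const char *s, size_t len) {
    size_t freq[256] = {0};
    double h = 0.0;
    for (size_t i = 0; i < len; i++) freq[(unsigned char)s[i]]++;
    for (int b = 0; b < 256; b++) {
        if (freq[b] == 0) continue;
        double p = (double)freq[b] / (double)len;
        h -= p * log2_small(p);
    }
    return h;
}

/* Returns true if the message looks like natural-language text worth
   passing to Unishox2. Hypothetical name and signature. */
bool should_compress(const char *msg, size_t len) {
    size_t cps = count_codepoints(msg, len);
    if (cps < MIN_CODEPOINTS) return false;                      /* too short */
    if (byte_entropy(msg, len) > MAX_ENTROPY_BITS) return false; /* base64, keys */
    size_t letters = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)msg[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) letters++;
    }
    /* Non-ASCII letters are not counted as letters in this sketch. */
    return (double)letters / (double)cps >= MIN_ALPHA_RATIO;     /* digits, coordinates */
}
```

With these placeholder values, "the cat and the dog sat on the mat and had a nap at noon" (56 characters) passes, a 64-character base64 alphabet fails the entropy test, and a 45-digit number fails the letter-ratio test. Because length is counted in codepoints, a string of twelve 4-byte emoji counts as 12 characters rather than 48 bytes and stays below the minimum.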
Implementation
Build flag
The entire feature is gated behind a single build flag. No code paths are affected in builds without it:
build_flags = -DUSE_LZW
New files
src/helpers/MeshCompression.h/.cpp: Wrapper around Unishox2 with compression heuristics. The heuristics reject messages that are too short, that have high character entropy, or that have a low ratio of alphabetic characters. Multi-byte UTF-8 sequences (emoji etc.) are handled correctly by counting Unicode codepoints rather than raw bytes.
lib/unishox2/unishox2.c and unishox2.h: The Unishox2 library by Siara Creations (Apache 2.0). It is placed in the /lib directory so PlatformIO compiles it automatically without any changes to build_src_filter. Source: https://github.com/siara-cc/Unishox2
Modified files
src/helpers/TxtDataHelpers.h: Adds TXT_FLAG_COMPRESSED (0x80), bit 7 of payload byte[4]. This bit is currently unused by the existing attempt number and TXT_TYPE fields.
src/helpers/AdvertDataHelpers.h: Adds ADV_CAP_LZW (0x0001), advertised in FEAT2. FEAT2 is currently unused.
src/helpers/BaseChatMesh.cpp: Four small #ifdef USE_LZW blocks are added:
- createSelfAdvert(): advertises ADV_CAP_LZW in FEAT2 so other nodes can detect the capability.
- populateContactFromAdvert() and the update block in onAdvertRecv(): set or clear CONTACT_FLAG_SUPPORTS_LZW on a contact whenever an advertisement is received, so the flag stays current as nodes update firmware.
- composeMsgPacket(): compresses the message if the recipient has CONTACT_FLAG_SUPPORTS_LZW set. A 2-byte length prefix is prepended to the compressed data so the receiver can determine the exact compressed length independently of AES block padding.
- onPeerDataRecv(): decompresses the message if TXT_FLAG_COMPRESSED is set in byte[4]. The ACK hash is calculated over the raw compressed payload, using the 2-byte length prefix to determine its exact length, so it matches the sender's hash exactly.
Payload Format
Uncompressed message (unchanged from original):
Compressed message:
The 2-byte length prefix is necessary because AES-128 pads the encrypted payload up to the next 16-byte block boundary. Without the prefix, the receiver cannot tell where the compressed data ends and the padding begins, so the ACK hash it computes would not match the sender's.
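The framing can be sketched in C as follows. This is a minimal illustration, not the code in BaseChatMesh.cpp: it assumes a 5-byte header (4-byte timestamp plus the flags byte[4]) in front of the length prefix, little-endian byte order for the timestamp and the prefix, and zero bytes as block padding; the function names are hypothetical. Since Unishox2 output is binary and may itself end in 0x00, trailing zeros cannot simply be stripped, which is why an explicit length is carried.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TXT_FLAG_COMPRESSED 0x80  /* bit 7 of byte[4], per the proposal */
#define HDR_LEN    5              /* ASSUMED: 4-byte timestamp + flags byte */
#define PREFIX_LEN 2              /* compressed-length prefix */
#define AES_BLOCK  16

/* Sender side: build [timestamp][flags|0x80][len lo][len hi][data] and
   zero-pad to a 16-byte multiple (assumed padding scheme).
   Returns the padded length, or 0 if out is too small. */
size_t frame_compressed(uint8_t *out, size_t out_cap, uint32_t timestamp,
                        uint8_t flags, const uint8_t *data, uint16_t clen) {
    size_t used = HDR_LEN + PREFIX_LEN + (size_t)clen;
    size_t padded = (used + AES_BLOCK - 1) / AES_BLOCK * AES_BLOCK;
    if (padded > out_cap) return 0;
    memset(out, 0, padded);
    out[0] = (uint8_t)timestamp;          /* little-endian (assumed) */
    out[1] = (uint8_t)(timestamp >> 8);
    out[2] = (uint8_t)(timestamp >> 16);
    out[3] = (uint8_t)(timestamp >> 24);
    out[4] = (uint8_t)(flags | TXT_FLAG_COMPRESSED);
    out[5] = (uint8_t)(clen & 0xFF);      /* length prefix, little-endian (assumed) */
    out[6] = (uint8_t)(clen >> 8);
    memcpy(out + HDR_LEN + PREFIX_LEN, data, clen);
    return padded;
}

/* Receiver side: given the decrypted, still-padded buffer, locate exactly
   the compressed bytes. Returns false for uncompressed or malformed input. */
bool unframe_compressed(const uint8_t *buf, size_t buf_len,
                        const uint8_t **data, uint16_t *clen) {
    if (buf_len < HDR_LEN + PREFIX_LEN || !(buf[4] & TXT_FLAG_COMPRESSED))
        return false;
    uint16_t n = (uint16_t)(buf[5] | (buf[6] << 8));
    if ((size_t)HDR_LEN + PREFIX_LEN + n > buf_len)
        return false;                     /* prefix points past the buffer */
    *data = buf + HDR_LEN + PREFIX_LEN;
    *clen = n;
    return true;
}
```

For example, 4 compressed bytes framed this way occupy 11 meaningful bytes inside one 16-byte block, and unframe_compressed() returns exactly those 4 bytes even when the last of them is 0x00.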
Capability Negotiation
Nodes announce compression support by setting ADV_CAP_LZW (0x0001) in the FEAT2 field of their advertisement packet. When a node receives an advertisement from a peer, it sets or clears CONTACT_FLAG_SUPPORTS_LZW (bit 4 of ContactInfo::flags) accordingly. A node will only send compressed messages to contacts that have this flag set.
Compression activates automatically once both nodes have exchanged advertisements; no manual configuration is required. If a node downgrades to non-LZW firmware, its advertisement will no longer carry ADV_CAP_LZW, and the peer will clear the flag at the next advertisement.
Backward Compatibility
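Both the capability negotiation and the guarantees in this section come down to two bit operations, sketched here in C. Contact is a hypothetical stand-in for ContactInfo that keeps only its flags byte, and the function names are illustrative; the bit values (ADV_CAP_LZW = 0x0001 in FEAT2, bit 4 of ContactInfo::flags) are the ones given in this proposal. Clearing the bit on every advertisement that lacks ADV_CAP_LZW is what makes a firmware downgrade safe, and checking it before compressing is what keeps non-LZW nodes unaffected.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADV_CAP_LZW               0x0001u    /* advertised in FEAT2 */
#define CONTACT_FLAG_SUPPORTS_LZW (1u << 4)  /* bit 4 of ContactInfo::flags */

/* HYPOTHETICAL stand-in for ContactInfo; only the flags byte matters here. */
typedef struct {
    uint8_t flags;
} Contact;

/* Run for every received advertisement: the flag mirrors the peer's latest
   FEAT2, and all other flag bits are left untouched. */
void update_lzw_capability(Contact *c, uint16_t feat2) {
    if (feat2 & ADV_CAP_LZW)
        c->flags |= CONTACT_FLAG_SUPPORTS_LZW;
    else
        c->flags &= (uint8_t)~CONTACT_FLAG_SUPPORTS_LZW;
}

/* Sender-side gate: compress only for capable peers, and only when the
   message itself passed the compression heuristics. */
bool use_compression(const Contact *c, bool message_compressible) {
    return (c->flags & CONTACT_FLAG_SUPPORTS_LZW) && message_compressible;
}
```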
Nodes built without USE_LZW are completely unaffected. A compression-capable node will never send compressed data to a node that has not advertised ADV_CAP_LZW. Messages from non-LZW nodes are received and ACK'd normally by LZW-capable nodes. Mixed networks of LZW and non-LZW nodes operate without any issues.
Note on ContactInfo::flags Bit Allocation
ContactInfo::flags is a uint8_t managed directly by the companion app via CMD_ADD_UPDATE_CONTACT over the serial protocol; it is not derived from FEAT1 or FEAT2 in the advertisement packet. The companion app uses the following bits:
FEAT1 and FEAT2 in the advertisement packet are separate fields used only as a transport mechanism; their values are not stored directly in ContactInfo::flags. This proposal uses FEAT2 to advertise LZW capability and stores the result in bit 4 of ContactInfo::flags, which is currently unused by the companion app.
Future Work
If a stream cipher such as ChaCha20-Poly1305 or Ascon is adopted in a future MeshCore release, the 2-byte length prefix will no longer be necessary, since a stream cipher adds no block padding and the ciphertext is exactly as long as the plaintext plus a fixed overhead. The compression layer itself requires no changes; it remains independent of the encryption layer. Combining Unishox2 compression with a stream cipher would yield consistent 25–40% payload savings on typical natural-language messages over 40 characters.
The following table compares encrypted payload sizes across four configurations. Stream cipher overhead is 8 bytes (4-byte counter + 4-byte tag, as proposed in PR #1450 and PR #1677). AES overhead is a 2-byte HMAC plus padding up to the next multiple of 16 bytes.
The combination of Stream + LZW consistently outperforms AES alone and eliminates the unpredictability caused by AES block padding.
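The size arithmetic behind this comparison follows directly from the overheads stated above: a 2-byte MAC plus padding to a multiple of 16 for AES, 8 bytes for a stream cipher, and the 2-byte length prefix only in the AES + LZW case. In this C sketch, n is the number of bytes handed to the cipher; the function names are hypothetical, and the example inputs below are illustrative, not measured results.

```c
#include <stddef.h>

#define AES_MAC_LEN     2   /* 2-byte HMAC */
#define AES_BLOCK_LEN   16
#define LEN_PREFIX_LEN  2   /* only needed because of block padding */
#define STREAM_OVERHEAD 8   /* 4-byte counter + 4-byte tag */

/* Round n up to the next multiple of the AES block size. */
static size_t pad_to_block(size_t n) {
    return (n + AES_BLOCK_LEN - 1) / AES_BLOCK_LEN * AES_BLOCK_LEN;
}

/* AES, uncompressed: MAC + padded plaintext. */
size_t aes_size(size_t n) { return AES_MAC_LEN + pad_to_block(n); }

/* AES + LZW: the 2-byte prefix is encrypted together with the compressed data. */
size_t aes_lzw_size(size_t n_compressed) {
    return AES_MAC_LEN + pad_to_block(n_compressed + LEN_PREFIX_LEN);
}

/* Stream cipher, with or without LZW: no padding and no prefix, so pass the
   compressed length to get the Stream + LZW figure. */
size_t stream_size(size_t n) { return n + STREAM_OVERHEAD; }
```

For a hypothetical 100-byte plaintext that compresses to 65 bytes, this gives 114 bytes for AES, 82 for AES + LZW, 108 for a stream cipher, and 73 for Stream + LZW. The padding effect is also visible: under AES + LZW, 62 compressed bytes encrypt to 66 bytes, while 63 encrypt to 82.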
Two open PRs are strong candidates for this combination:
Both eliminate AES block padding and would make the compression savings fully predictable on every message.
Credits
This implementation was developed with assistance from Claude (Anthropic).
Source Code
The full implementation is available at:
https://github.com/gjelsoe/MeshCore/tree/LZW-Messages