BIP Draft: Formosa — Themed mnemonic sentences for generating deterministic keys#2108
Yuri-SVB wants to merge 11 commits into bitcoin:master
Proposes mnemonic *sentences* instead of words as a forwards- and backwards-compatible expansion of BIP39, itself submitted as a Bitcoin Improvement Proposal.
murchandamus
left a comment
Hi Yuri, thank you for your submission. I see that your proposal was posted to the mailing list in 2023. Since then, we deployed BIP3 as a new BIP Process, so there are a few formatting changes that would be needed to the preamble. I would also suggest that you add a link to the prior discussion to the Discussion header.
At first glance, your document appears to be missing a Specification, a Rationale, and a Backwards Compatibility section. Please refer to BIP3 for more information.
Hi @Yuri-SVB, I haven’t given this document a full review yet, because the initial submission has some formatting issues. If you are still working on this, please update your submission to meet the formatting requirements.
Hello, Murchandamus. Thank you for your attention, and thank you for remembering my earlier attempt from 3 years ago!
Co-authored-by: Mark "Murch" Erhardt <murch@murch.one>
Satisfying requirement of title in fewer than 50 characters.
Hello, Murch!
Hey Yuri,
Hello, Murch. No problem.
This already reads pretty well, although the specification could be presented in a more technical manner. It seems a bit light on the Rationale. It would be preferable if there were a Backwards Compatibility section instead of the mention in the Abstract.
I think an example of a Formosa-encoded seed could help illustrate what you are trying to do; I was firmly expecting to see one until I got to the end.
Restructure the draft to follow BIP-3 conventions and resolve the issues raised by reviewers in bitcoin#2108:
- Introduce explicit Specification section with a Terminology subsection that distinguishes 'word', 'category', 'theme', 'sentence' and 'mnemonic' / 'mnemonic story', removing the ambiguity of using 'sentence' at two different scales.
- Replace the unclear 'if the category is led by another category' wording with an explicit LED_BY field description and a step-by-step algorithm that covers both the leaderless and led cases.
- Reflow the theme-property list (previously a/b/c/d/e split by an intervening paragraph) into a single numbered list so it renders as a list rather than as code blocks.
- Add a dedicated Rationale section covering the 33-bit sentence size, themed sentences, free-form theme schema, the LED_BY mechanism, the re-encoding-through-BIP-39 design, and why custom themes are discouraged.
- Add a dedicated Backwards Compatibility section describing compatibility at the mnemonic, entropy, and seed levels.
- Add a worked Example section showing a 128-bit entropy being encoded into a 4-sentence mnemonic story under a small illustrative theme, including bit splitting, FILLING_ORDER vs NATURAL_ORDER, and the LED_BY lookup.
- Tighten the Abstract and Motivation; clarify that BIP-39 is itself a Formosa theme.
Reviewer on PR bitcoin#2108 asked for no abbreviations in table labels. Replace:
- ENT / CS / S / MS column headers with 'Initial entropy bits', 'Checksum bits', 'Total bits', 'Number of sentences', 'Mnemonic words (6-word theme)' and 'Mnemonic words (BIP-0039)'.
- 'List size / Bits / Chars to identify / Density (bits/char)' with 'Wordlist size / Bits per word / Characters to identify / Density (bits per character)'.
- ADJ. with ADJECTIVE in the example bit-assignment diagram, and the surrounding narrative ENT/MS uses with the spelled-out forms.
The accompanying formulas now use the expanded names too, so the algorithm description and the table column headers stay consistent.
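The relationship between the renamed columns can be sketched in a few lines. This is a minimal illustration, not text from the draft: it assumes the BIP-0039 checksum rule (one checksum bit per 32 bits of initial entropy), the 33-bit sentence size mentioned in this PR, and a 6-word theme. The function name `formosa_sizes` and the dictionary keys are hypothetical.

```python
def formosa_sizes(initial_entropy_bits: int, words_per_sentence: int = 6) -> dict:
    """Derive the table columns from the initial entropy size.

    Assumptions (not taken verbatim from the draft): the checksum follows
    BIP-0039 (entropy / 32 bits) and every sentence encodes exactly 33 bits.
    """
    if initial_entropy_bits <= 0 or initial_entropy_bits % 32 != 0:
        raise ValueError("initial entropy must be a positive multiple of 32 bits")

    checksum_bits = initial_entropy_bits // 32
    total_bits = initial_entropy_bits + checksum_bits  # always a multiple of 33
    number_of_sentences = total_bits // 33

    return {
        "initial_entropy_bits": initial_entropy_bits,
        "checksum_bits": checksum_bits,
        "total_bits": total_bits,
        "number_of_sentences": number_of_sentences,
        "mnemonic_words_theme": number_of_sentences * words_per_sentence,
        "mnemonic_words_bip39": total_bits // 11,  # BIP-0039 uses 11 bits per word
    }


if __name__ == "__main__":
    for entropy in (128, 160, 192, 224, 256):
        print(formosa_sizes(entropy))
```

For 128 bits of initial entropy this gives 4 sentences, which agrees with the 4-sentence mnemonic story described in the worked Example commit.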
Replace the previous hypothetical 5-category example with one that mirrors the medieval_fantasy theme actually shipped at https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes, including:
- the real 6 categories with their actual BIT_LENGTHs (VERB=5, SUBJECT=6, OBJECT=6, ADJECTIVE=5, WILDCARD=6, PLACE=5, summing to 33);
- the real FILLING_ORDER and NATURAL_ORDER;
- the real lead tree (VERB → SUBJECT; SUBJECT → OBJECT and WILDCARD; OBJECT → ADJECTIVE; WILDCARD → PLACE), showing that a single leader can have several dependent categories;
- a 33-bit block whose decoded indices (28, 32, 63, 27, 46, 29) pick existing words and existing sub-list entries: VERB[28]=unveil, SUBJECT_under_unveil[32]=king, OBJECT_under_king[63]=wine, ADJECTIVE_under_wine[27]=sweet, WILDCARD_under_king[46]=queen, PLACE_under_queen[29]=throne_room, yielding the sentence 'king unveil sweet wine queen throne_room'.
This keeps the worked example faithful to the reference implementation rather than to a fabricated theme, so that anyone can reproduce the encoding by parsing medieval_fantasy.json.
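The bit splitting described in this commit can be sketched as follows. The bit lengths, indices and words are the ones quoted in the commit message. Everything else is an assumption made for illustration: the filling order is taken to be the order in which the message lists the categories, the natural order is inferred from the example sentence, and the first category is taken to occupy the most significant bits. The real values live in medieval_fantasy.json, and the function names are hypothetical.

```python
# Bit lengths as quoted in the commit message. Using this listing order as the
# filling order is an ASSUMPTION; the authoritative order is in the theme file.
BIT_LENGTHS = [
    ("VERB", 5), ("SUBJECT", 6), ("OBJECT", 6),
    ("ADJECTIVE", 5), ("WILDCARD", 6), ("PLACE", 5),
]

# Natural (reading) order, INFERRED from the example sentence
# 'king unveil sweet wine queen throne_room'.
NATURAL_ORDER = ["SUBJECT", "VERB", "ADJECTIVE", "OBJECT", "WILDCARD", "PLACE"]

SENTENCE_BITS = sum(bits for _, bits in BIT_LENGTHS)  # 33


def pack_indices(indices: dict) -> int:
    """Concatenate per-category indices into one 33-bit block (first field = MSBs)."""
    block = 0
    for name, bits in BIT_LENGTHS:
        index = indices[name]
        if not 0 <= index < (1 << bits):
            raise ValueError(f"{name} index {index} does not fit in {bits} bits")
        block = (block << bits) | index
    return block


def split_block(block: int) -> dict:
    """Inverse of pack_indices: slice a 33-bit block into per-category indices."""
    if not 0 <= block < (1 << SENTENCE_BITS):
        raise ValueError("block must fit in 33 bits")
    remaining = SENTENCE_BITS
    indices = {}
    for name, bits in BIT_LENGTHS:
        remaining -= bits
        indices[name] = (block >> remaining) & ((1 << bits) - 1)
    return indices


# Indices and words quoted in the commit message. A real decoder would look the
# words up in the theme's (led) sub-lists instead of this fixed mapping.
EXAMPLE_INDICES = {"VERB": 28, "SUBJECT": 32, "OBJECT": 63,
                   "ADJECTIVE": 27, "WILDCARD": 46, "PLACE": 29}
EXAMPLE_WORDS = {"VERB": "unveil", "SUBJECT": "king", "OBJECT": "wine",
                 "ADJECTIVE": "sweet", "WILDCARD": "queen", "PLACE": "throne_room"}

block = pack_indices(EXAMPLE_INDICES)
decoded = split_block(block)
sentence = " ".join(EXAMPLE_WORDS[category] for category in NATURAL_ORDER)

if __name__ == "__main__":
    print(f"{block:033b}", decoded, sentence)
```

The sketch covers only the fixed-width slicing and the reordering from filling order to natural order; the LED_BY lookup that selects which sub-list each index points into is not modelled here.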
Add a paragraph to the LED_BY rationale clarifying that a Formosa theme behaves as a primitive language model (next-word predictor): each LED_BY relation skews the conditional distribution over the next word so that probability mass falls only on the 2^BIT_LENGTH words compatible with the already-chosen leader, and zero elsewhere. The theme designer plays the role of training data, hand-curating which combinations are semantically coherent. This framing makes explicit why themes produce sentences that 'sound right' while still covering all 2^33 bit patterns of a sentence.
…oncake) which builds on this property by rendering each Formosa category as an on-screen table whose rows and columns are permuted per input session. Combined with the randomized-indexation property, an attacker watching only the screen still learns nothing without also recovering the press sequence.
Add a Rationale paragraph explaining a further benefit of splitting the vocabulary into several short wordlists (32-128 entries each): such tables fit on a mobile-device screen and admit input via on-screen lookup, which a single 2048-word list does not. The randomized indexation:
- defeats pure key-logging (keystrokes alone don't reveal words; the attacker also needs the session permutation),
- raises the bar for shoulder surfing (as with key-logging, only the keys AND the session's permutation together suffice; either alone is uninformative).
This gives an operational, security-focused argument for the many-small-lists design that complements the existing memorization and information-density arguments.
Formosa: document Mooncake's volume-key input on mobile
Add a paragraph to the Mooncake rationale describing the proposed mobile input mechanism: reuse of the volume-up / volume-down keys as a two-button binary selector. Because every Formosa category is sized 2^BIT_LENGTH and the on-screen table is laid out in rows, sub-rows and columns whose counts are powers of two, narrowing to a single cell takes exactly BIT_LENGTH presses (5 for a 32-entry category, 6 for 64, 7 for 128). The per-category press count is invariant, and therefore uninformative, and equals the number of bits of entropy encoded; the 'one bit per press' bound matches the existing side-channel argument.
Add three concrete reasons why volume-key input on mobile resists visual shoulder surfing better than an on-screen keyboard:
- Subtler input motions: a single finger pressing a side rocker, much harder to read from a distance than multi-finger taps on a glass keyboard.
- Easy occlusion with the second hand: both volume keys are on one edge of the device, so the free hand (or the holding hand's thumb) can cover them without obscuring the screen for the user.
- Pocket input via headphone volume buttons: because the protocol is purely binary, headphone volume controls are sufficient, letting the user keep the buttons in a pocket while operating it by feel and removing the input motion from the observer's field of view entirely.
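The two-button selection over a per-session permuted table can be sketched as below. This is a hedged illustration of the idea in the commit messages, not Mooncake's actual code: the halving layout, the press encoding (0 for volume-down, 1 for volume-up), the placeholder wordlist and all function names are assumptions.

```python
import random


def session_table(words: list, rng: random.Random) -> list:
    """Return a fresh per-session permutation of a category's wordlist.

    ASSUMPTION: a real implementation would use a cryptographically secure
    source (for example secrets.SystemRandom) instead of a seeded generator.
    """
    table = list(words)
    rng.shuffle(table)
    return table


def select_position(presses: list, bit_length: int) -> int:
    """Narrow 2**bit_length cells down to one, halving the range per press.

    Each press is 0 (hypothetically volume-down: keep the first half) or
    1 (hypothetically volume-up: keep the second half).
    """
    if len(presses) != bit_length:
        raise ValueError("exactly BIT_LENGTH presses are required")
    low, high = 0, 1 << bit_length
    for press in presses:
        middle = (low + high) // 2
        if press == 0:
            high = middle
        elif press == 1:
            low = middle
        else:
            raise ValueError("each press must be 0 or 1")
    return low


# Placeholder 32-entry category (BIT_LENGTH = 5); not a real Formosa wordlist.
BIT_LENGTH = 5
CATEGORY = [f"word_{i:02d}" for i in range(1 << BIT_LENGTH)]

presses = [1, 0, 1, 1, 0]  # always five presses, whichever word is chosen
position = select_position(presses, BIT_LENGTH)

# The same press sequence lands on the same cell, but the word shown in that
# cell depends on the session's permutation.
table_a = session_table(CATEGORY, random.Random(1))
table_b = session_table(CATEGORY, random.Random(2))

if __name__ == "__main__":
    print(position, table_a[position], table_b[position])
```

Because the press sequence only identifies a cell, an observer needs both the presses and the session's table to recover the word, which is the property the rationale paragraphs rely on.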
Fixed typo from "dektop" to "desktop".
Fixed agreement of number from "Those of a mobile device" to "Those of mobile devices".
murchandamus
left a comment
Good improvements, this reads great. I’m gonna look into a number assignment.
It would probably be good if some wallet developers that have worked with BIP39 reviewed it, too.
Substituted triple hyphen for — Co-authored-by: Murch <murch@murch.one>
Updated title to mention Formosa and be more self-explanatory. Co-authored-by: Murch <murch@murch.one>
Do you have someone in mind? Would you like me to invite a wallet developer?