mirror of https://github.com/zcash/zips.git
471 lines
20 KiB
ReStructuredText
471 lines
20 KiB
ReStructuredText
::
|
|
|
|
ZIP: 173
|
|
Title: Bech32 Format
|
|
Author: Daira Hopwood <daira@electriccoin.co>
|
|
Credits: Pieter Wuille <pieter.wuille@gmail.com>
|
|
Greg Maxwell <greg@xiph.org>
|
|
Rusty Russell
|
|
Mark Friedenbach
|
|
Status: Final
|
|
Category: Standards / Wallet
|
|
Created: 2018-06-13
|
|
License: MIT
|
|
|
|
|
|
Terminology
|
|
===========
|
|
|
|
The key words "MUST", "MUST NOT", "SHOULD", and "SHOULD NOT" in this document are
|
|
to be interpreted as described in RFC 2119. [#RFC2119]_
|
|
|
|
The term "network upgrade" in this document is to be interpreted as described in
|
|
ZIP 200. [#zip-0200]_
|
|
|
|
The term "Sapling" in this document is to be interpreted as described in ZIP 205.
|
|
[#zip-0205]_
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
This document proposes a checksummed base32 format, "Bech32", and a standard for
|
|
Sapling addresses and keys using it.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
Since launch, Zcash has relied on Base58 addresses with a truncated double-SHA256
|
|
checksum, inherited from Bitcoin. They were part of the original Bitcoin software
|
|
and their scope was extended in BIP 13 [#bip-0013]_ for P2SH (Pay-to-Script-Hash)
|
|
addresses. However, both the character set and the checksum algorithm have
|
|
limitations:
|
|
|
|
* Base58 needs a lot of space in QR codes, as it cannot use the
|
|
''alphanumeric mode''.
|
|
|
|
* The mixed case in Base58 makes it inconvenient to reliably write down, type on
|
|
mobile keyboards, or read out loud.
|
|
|
|
* The double-SHA256 checksum is slow and has no error-detection guarantees.
|
|
|
|
* Most of the research on error-detecting codes only applies to character-set
|
|
sizes that are a `prime power <https://en.wikipedia.org/wiki/Prime_power>`_,
|
|
which 58 is not.
|
|
|
|
* Base58 decoding is complicated and relatively slow.
|
|
|
|
To address these issues, Bitcoin adopted a new encoding called Bech32 [#bip-0173]_,
|
|
for use with address types added by its Segregated Witness proposal. Zcash does
|
|
not implement Segregated Witness, but it reuses Bech32 with address and key types
|
|
introduced by the Sapling network upgrade [#zip-0205]_.
|
|
|
|
Since the description of Bech32 in [#bip-0173]_ is partially entangled with
|
|
Segregated Witness address formats, we have found it clearer to write this ZIP to
|
|
specify Bech32 separately. This specification should be read in conjunction with
|
|
section 5.6 ("Encodings of Addresses and Keys") of the Zcash Sapling protocol
|
|
specification [#protocol]_, and with ZIP 32 which specifies additional key types
|
|
for support of shielded hierarchical deterministic wallets [#zip-0032]_.
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
We describe the general checksummed base32 format called ''Bech32''. Its use for
|
|
Sapling payment addresses and keys is defined in the Sapling protocol specification
|
|
[#protocol]_.
|
|
|
|
A Bech32 string consists of:
|
|
|
|
* The *human-readable part*, which is intended to convey the type of data, or
|
|
anything else that is relevant to the reader. This part MUST contain 1 to 83
|
|
US-ASCII characters, with each character having a value in the range [33-126].
|
|
HRP validity may be further restricted by specific applications.
|
|
|
|
* The *separator*, which is always "1". In case "1" is allowed inside the
|
|
human-readable part, the last one in the string is the separator.
|
|
|
|
* The *data part*, which is at least 6 characters long and only consists of
|
|
alphanumeric characters excluding "1", "b", "i", and "o".
|
|
|
|
+-----+---+---+---+---+---+---+---+---+
|
|
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|
|
+=====+===+===+===+===+===+===+===+===+
|
|
| +0 | q | p | z | r | y | 9 | x | 8 |
|
|
+-----+---+---+---+---+---+---+---+---+
|
|
| +8 | g | f | 2 | t | v | d | w | 0 |
|
|
+-----+---+---+---+---+---+---+---+---+
|
|
| +16 | s | 3 | j | n | 5 | 4 | k | h |
|
|
+-----+---+---+---+---+---+---+---+---+
|
|
| +24 | c | e | 6 | m | u | a | 7 | l |
|
|
+-----+---+---+---+---+---+---+---+---+
|
|
|
|
|
|
Checksum
|
|
--------
|
|
|
|
The last six characters of the data part form a checksum and contain no information.
|
|
Valid strings MUST pass the criteria for validity specified by the Python3 code
|
|
snippet below. The checksum is defined so that the function ``bech32_verify_checksum``
|
|
returns true when its arguments are:
|
|
|
|
* ``hrp``: the human-readable part as a string;
|
|
* ``data``: the data part as a list of integers representing the characters after
|
|
conversion using the table above.
|
|
|
|
::
|
|
|
|
def bech32_polymod(values):
|
|
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
|
|
chk = 1
|
|
for v in values:
|
|
b = (chk >> 25)
|
|
chk = (chk & 0x1ffffff) << 5 ^ v
|
|
for i in range(5):
|
|
chk ^= GEN[i] if ((b >> i) & 1) else 0
|
|
return chk
|
|
|
|
def bech32_hrp_expand(s):
|
|
return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]
|
|
|
|
def bech32_verify_checksum(hrp, data):
|
|
return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1
|
|
|
|
|
|
This implements a `BCH code <https://en.wikipedia.org/wiki/BCH_code>`_;
|
|
in the case where the encoded string is at most 90 characters, this code
|
|
guarantees detection of *any error affecting at most 4 characters* and has
|
|
less than a 1 in 10\ :sup:`9` chance of failing to detect more errors.
|
|
More details about the properties can be found in the Checksum Design section.
|
|
The human-readable part is processed by first feeding the higher 3 bits of each
|
|
character's US-ASCII value into the checksum calculation followed by a zero
|
|
and then the lower 5 bits of each US-ASCII value.
|
|
|
|
To construct a valid checksum given the human-readable part and (non-checksum)
|
|
values of the data-part characters, the code below can be used:
|
|
|
|
::
|
|
|
|
def bech32_create_checksum(hrp, data):
|
|
values = bech32_hrp_expand(hrp) + data
|
|
polymod = bech32_polymod(values + [0,0,0,0,0,0]) ^ 1
|
|
return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
|
|
|
|
|
|
Error correction
|
|
''''''''''''''''
|
|
|
|
One of the properties of these BCH codes is that they can be used for error
|
|
correction. An unfortunate side effect of error correction is that it erodes
|
|
error detection: correction changes invalid inputs into valid inputs, but if
|
|
more than a few errors were made then the valid input may not be the correct
|
|
input. Use of an incorrect but valid input can cause funds to be lost
|
|
irrecoverably. Because of this, implementations SHOULD NOT implement correction
|
|
beyond potentially suggesting to the user where in the string an error might be
|
|
found, without suggesting the correction to make.
|
|
|
|
Uppercase/lowercase
|
|
'''''''''''''''''''
|
|
|
|
The lowercase form is used when determining a character's value for checksum
|
|
purposes.
|
|
|
|
Encoders MUST always output an all-lowercase Bech32 string. If an uppercase
|
|
version of the encoding result is desired (e.g. for presentation purposes, or
|
|
QR code use), then an uppercasing procedure can be performed external to the
|
|
encoding process.
|
|
|
|
Decoders MUST NOT accept strings where some characters are uppercase and some
|
|
are lowercase (such strings are referred to as mixed-case strings).
|
|
|
|
For presentation, lowercase is usually preferable, but inside QR codes
|
|
uppercase SHOULD be used, as those permit the use of `alphanumeric mode
|
|
<https://www.thonky.com/qr-code-tutorial/alphanumeric-mode-encoding>`_,
|
|
which is 45% more compact than the `byte mode
|
|
<https://www.thonky.com/qr-code-tutorial/byte-mode-encoding>`_ that would
|
|
otherwise be used.
|
|
|
|
Encoding
|
|
''''''''
|
|
|
|
* Start with the bits of the raw encoding of the appropriate address or key
|
|
type, most significant bit per byte first.
|
|
|
|
* Re-arrange those bits into groups of 5, and pad with zeroes at the end if
|
|
needed.
|
|
|
|
* Translate those bits, most significant bit first, to characters using the
|
|
table above.
|
|
|
|
Decoding
|
|
''''''''
|
|
|
|
Software interpreting a Bech32-encoded address MUST first verify that the
|
|
human-readable part matches that of a specified address type, and similarly
|
|
for keys.
|
|
|
|
If this check passes, convert the rest of the data to bytes:
|
|
|
|
* Translate the values using the table above to 5 bits, most significant bit
|
|
first.
|
|
|
|
* Re-arrange those bits into groups of 8 bits. Any incomplete group at the
|
|
end MUST be 4 bits or fewer, MUST be all zeroes, and is discarded.
|
|
|
|
* The resulting groups are interpreted as the raw encoding of the appropriate
|
|
address or key type, with the most significant bit first in each byte.
|
|
|
|
Decoders SHOULD enforce known-length restrictions on address and key types.
|
|
|
|
For example, [#protocol]_ specifies that the length of the raw encoding of a
|
|
Sapling payment address is 43 bytes (88 + 256 bits). This results in a
|
|
Bech32-encoded Sapling payment address being 78 characters long.
|
|
|
|
|
|
Compatibility
|
|
=============
|
|
|
|
Only software supporting the Sapling network upgrade is able to use these
|
|
addresses.
|
|
|
|
There is no effect on support for transparent addresses and keys, or for Sprout
|
|
z-addresses and keys; these will continue to be encoded using Base58Check, and
|
|
MUST NOT be encoded using Bech32.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
Why use base32 at all?
|
|
----------------------
|
|
|
|
The lack of mixed case makes it more efficient to read out loud or to put into
|
|
QR codes. It does come with a 15% length increase. However, the length of a
|
|
Bech32-encoded Sapling payment address remains below 80 characters, which
|
|
reduces the likelihood of line splitting in terminals or email. Thus,
|
|
cutting-and-pasting of addresses is still reliable.
|
|
|
|
Why call it Bech32?
|
|
-------------------
|
|
|
|
"Bech" contains the characters BCH (the error detection algorithm used) and
|
|
sounds a bit like "base". The term Bech32 is established for Bitcoin and there
|
|
was no reason to use a different name for it in Zcash Sapling.
|
|
|
|
Why not support Bech32 encoding of Sprout or transparent addresses?
|
|
-------------------------------------------------------------------
|
|
|
|
This was not considered to be sufficiently well-motivated given the
|
|
compatibility issues that would arise from having two formats for these
|
|
address types, with pre-Sapling software not supporting the new format.
|
|
|
|
Why include a separator in addresses?
|
|
-------------------------------------
|
|
|
|
That way the human-readable part is unambiguously separated from the data part,
|
|
avoiding potential collisions with other human-readable parts that share a
|
|
prefix. It also allows us to avoid having character-set restrictions on the
|
|
human-readable part. The separator is ''1'' because using a non-alphanumeric
|
|
character would complicate copy-pasting of addresses (with no double-click
|
|
selection in several applications). Therefore an alphanumeric character outside
|
|
the normal character set was chosen.
|
|
|
|
Why not use an existing base32 character set?
|
|
---------------------------------------------
|
|
|
|
Existing character sets for base-32 encodings include
|
|
`RFC 3548 <https://www.rfc-editor.org/rfc/rfc3548.html>`_ and
|
|
`z-base-32 <https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt>`_.
|
|
The set used for Bech32 was chosen to minimize ambiguity according to
|
|
`this <https://hissa.nist.gov/~black/GTLD/>`_ visual similarity data, and the
|
|
ordering is chosen to minimize the number of pairs of similar characters
|
|
(according to the same data) that differ in more than 1 bit. As the checksum is
|
|
chosen to maximize detection capabilities for low numbers of bit errors, this
|
|
choice improves its performance under some error models.
|
|
|
|
Why are the high bits of the human-readable part processed first?
|
|
-----------------------------------------------------------------
|
|
|
|
This design choice had a rationale for Bitcoin Segregated Witness addresses
|
|
(see [#bip-0173]_) that does not apply to Zcash Sapling addresses. It is
|
|
retained for compatibility with Bech32 encoders/decoders written for Bitcoin.
|
|
|
|
|
|
Reference implementations
|
|
=========================
|
|
|
|
* The encoder/decoder used by ``zcashd``:
|
|
|
|
* `In C++ <https://github.com/zcash/zcash/blob/master/src/bech32.cpp>`_
|
|
|
|
* Encoders and decoders written for Bitcoin:
|
|
|
|
* `For C <https://github.com/sipa/bech32/tree/master/ref/c>`_
|
|
* `For C++ <https://github.com/sipa/bech32/tree/master/ref/c++>`_
|
|
* `For Javascript <https://github.com/sipa/bech32/tree/master/ref/javascript>`_
|
|
* `For Go <https://github.com/sipa/bech32/tree/master/ref/go>`_
|
|
* `For Python <https://github.com/sipa/bech32/tree/master/ref/python>`_
|
|
* `For Haskell <https://github.com/sipa/bech32/tree/master/ref/haskell>`_
|
|
* `For Ruby <https://github.com/sipa/bech32/tree/master/ref/ruby>`_
|
|
* `For Rust <https://github.com/sipa/bech32/tree/master/ref/rust>`_
|
|
|
|
* Fancy decoder written for Bitcoin that localizes errors:
|
|
|
|
* `Fancy decoder for Javascript <https://github.com/sipa/bech32/tree/master/ecc/javascript>`_
|
|
(`demo website <HTTP://bitcoin.sipa.be/bech32/demo/demo.html>`_)
|
|
|
|
Note that the encoders written for Bitcoin may make assumptions specific to
|
|
Segregated Witness address formats that do not apply to Zcash. Only the Python
|
|
one has been tested with Zcash at the time of writing.
|
|
|
|
|
|
Examples
|
|
========
|
|
|
|
TODO: add valid Sapling payment addresses and keys, and their corresponding raw encodings.
|
|
|
|
|
|
Test vectors
|
|
============
|
|
|
|
The following strings are valid Bech32:
|
|
|
|
* ``A12UEL5L``
|
|
* ``a12uel5l``
|
|
* ``an83characterlonghumanreadablepartthatcontainsthenumber1andtheexcludedcharactersbio1tt5tgs``
|
|
* ``abcdef1qpzry9x8gf2tvdw0s3jn54khce6mua7lmqqqxw``
|
|
* ``11qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqc8247j``
|
|
* ``split1checkupstagehandshakeupstreamerranterredcaperred2y9e3w``
|
|
* ``?1ezyfcl``
|
|
|
|
The following strings are not valid Bech32 (with reason for invalidity):
|
|
|
|
* 0x20 + ``1nwldj5``: HRP character out of range
|
|
* 0x7F + ``1axkwrx``: HRP character out of range
|
|
* 0x80 + ``1eym55h``: HRP character out of range
|
|
* ``pzry9x0s0muk``: No separator character
|
|
* ``1pzry9x0s0muk``: Empty HRP
|
|
* ``x1b4n0q5v``: Invalid data character
|
|
* ``li1dgmt3``: Too short checksum
|
|
* ``de1lg7wt`` + 0xFF: Invalid character in checksum
|
|
* ``A1G7SGD8``: checksum calculated with uppercase form of HRP
|
|
* ``10a06t8``: empty HRP
|
|
* ``1qzzfhee``: empty HRP
|
|
* ``bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t5``: Invalid checksum
|
|
* ``tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3q0sL5k7``: Mixed case
|
|
* ``bc1zw508d6qejxtdg4y5r3zarvaryvqyzf3du``: Zero padding of more than 4 bits
|
|
* ``tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3pjxtptv``: Non-zero padding in 8-to-5 conversion
|
|
|
|
|
|
Checksum design
|
|
===============
|
|
|
|
Design choices
|
|
--------------
|
|
|
|
BCH codes can be constructed over any prime-power alphabet and can be chosen to
|
|
have a good trade-off between size and error-detection capabilities. Unlike
|
|
`Reed-Solomon codes <https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction>`_,
|
|
they are not restricted in length to one less than the alphabet size. While they
|
|
also support efficient error correction, the implementation of just error detection
|
|
is very simple.
|
|
|
|
We pick 6 checksum characters as a trade-off between length of the addresses and
|
|
the error-detection capabilities, as 6 characters is the lowest number sufficient
|
|
for a random failure chance below 1 per billion. For the length of data we're
|
|
most interested in protecting (up to 77 bytes excluding the separator, for a
|
|
Sapling payment address), BCH codes can be constructed that guarantee detecting
|
|
up to 4 errors. Longer data is also supported with slightly weaker error detection.
|
|
|
|
Selected properties
|
|
'''''''''''''''''''
|
|
|
|
Many of these codes perform badly when dealing with more errors than they are
|
|
designed to detect, but not all. For that reason, we consider codes that are
|
|
designed to detect only 3 errors as well as 4 errors, and analyse how well they
|
|
perform in practice.
|
|
|
|
The specific code chosen here is the result of:
|
|
|
|
* Starting with an exhaustive list of 159605 BCH codes designed to detect 3 or 4
|
|
errors up to length 93, 151, 165, 341, 1023, and 1057.
|
|
|
|
* From those, requiring the detection of 4 errors up to length 71, resulting in
|
|
28825 remaining codes.
|
|
|
|
* From those, choosing the codes with the best worst-case window for 5-character
|
|
errors, resulting in 310 remaining codes.
|
|
|
|
* From those, picking the code with the lowest chance for not detecting small
|
|
numbers of ''bit'' errors.
|
|
|
|
As a naive search would require over 6.5 \* 10\ :sup:`19` checksum evaluations,
|
|
a collision-search approach was used for analysis. The code can be found
|
|
`here <https://github.com/sipa/ezbase32/>`_.
|
|
|
|
Properties
|
|
''''''''''
|
|
|
|
The following table summarizes the chances for detection failure (as multiples of
|
|
1 in 10\ :sup:`9`).
|
|
|
|
+-------------------------------------+-----------------------------------------------+
|
|
| Window length | Number of wrong characters |
|
|
+--------+----------------------------+-------+-------+-------+-------+-------+-------+
|
|
| Length | Description | ≤4 | 5 | 6 | 7 | 8 | ≥9 |
|
|
+--------+----------------------------+-------+-------+-------+-------+-------+-------+
|
|
| 8 | Longest detecting 6 errors | 0 | 1.127 | 0.909 | n/a |
|
|
+--------+----------------------------+---------------+-------+-------+-------+-------+
|
|
| 18 | Longest detecting 5 errors | 0 | 0.965 | 0.929 | 0.932 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-------+-------+-------+
|
|
| 19 | Worst case for 6 errors | 0 | 0.093 | 0.972 | 0.928 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-------+---------------+
|
|
| 39 | Length ≤ 39 characters | 0 | 0.756 | 0.935 | 0.932 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-------+---------------+
|
|
| 59 | Length ≤ 59 characters | 0 | 0.805 | 0.933 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-----------------------+
|
|
| 71 | Length ≤ 71 characters | 0 | 0.830 | 0.934 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-----------------------+
|
|
| 89 | Longest detecting 4 errors | 0 | 0.867 | 0.933 | 0.931 |
|
|
+--------+----------------------------+-------+-------+-------+-----------------------+
|
|
|
|
TODO: fill in table for a Sapling payment address, and check the following paragraph.
|
|
|
|
This means that when 5 changed characters occur randomly distributed in the 77
|
|
characters (excluding the separator) of a Sapling payment address, there is a chance
|
|
of at most 0.867 per billion that it will go undetected. When those 5 changes occur
|
|
randomly within a 19-character window, that chance goes down to 0.093 per billion.
|
|
As the number of errors goes up, the chance converges towards 1 in 2\ :sup:`30` =
|
|
0.931 per billion.
|
|
|
|
The chosen code performs reasonably well up to 1023 characters, but is best for
|
|
lengths up to 89 characters (excluding the separator). Since the search for suitable
|
|
codes was based on the requirements for Bitcoin P2WPKH and P2WSH addresses, it is
|
|
not quite optimal for the address lengths used by Sapling, but the advantages of
|
|
compatibility with existing Bitcoin libraries were considered to outweigh this
|
|
consideration.
|
|
|
|
|
|
Acknowledgements
|
|
================
|
|
|
|
This document is closely based on BIP 173 written by Pieter Wuille and Greg Maxwell,
|
|
which was inspired by the `address proposal <https://rusty.ozlabs.org/?p=578>`_ by
|
|
Rusty Russell and the `base32
|
|
<https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2014-February/004402.html>`_
|
|
proposal by Mark Friedenbach. BIP 173 also had input from Luke Dashjr, Johnson Lau,
|
|
Eric Lombrozo, Peter Todd, and various other reviewers.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [#RFC2119] `RFC 2119: Key words for use in RFCs to Indicate Requirement Levels <https://www.rfc-editor.org/rfc/rfc2119.html>`_
|
|
.. [#protocol] `Zcash Protocol Specification, Version 2021.2.16 or later <protocol/protocol.pdf>`_
|
|
.. [#zip-0032] `ZIP 32: Shielded Hierarchical Deterministic Wallets <zip-0032.rst>`_
|
|
.. [#zip-0200] `ZIP 200: Network Upgrade Mechanism <zip-0200.rst>`_
|
|
.. [#zip-0205] `ZIP 205: Deployment of the Sapling Network Upgrade <zip-0205.rst>`_
|
|
.. [#bip-0013] `BIP 13: Address Format for pay-to-script-hash <https://github.com/bitcoin/bips/blob/master/bip-0013.mediawiki>`_
|
|
.. [#bip-0173] `BIP 173: Base32 address format for native v0-16 witness outputs <https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki>`_
|