ZIP: 244
Title: Transaction Identifier Non-Malleability
Owners: Kris Nuttycombe <kris@electriccoin.co>
        Daira Hopwood <daira@electriccoin.co>
Status: Proposed
Category: Consensus
Created: 2021-01-06
License: MIT
Discussions-To: <https://github.com/zcash/zips/issues/411>

Terminology

The key words "MUST", "MUST NOT", and "SHOULD" in this document are to be interpreted as described in RFC 2119. 1

The terms "consensus branch", "epoch", and "network upgrade" in this document are to be interpreted as described in ZIP 200. 2

The term "field encoding" refers to the binary serialized form of a Zcash transaction field, as specified in section 7.1 of the Zcash protocol specification 8.

Abstract

This proposal defines a new transaction digest algorithm for the NU5 network upgrade onward, in order to introduce non-malleable transaction identifiers that commit to all transaction data except for attestations to transaction validity.

This proposal also defines a new transaction digest algorithm for signature validation, which shares all available structure produced during the construction of transaction identifiers, in order to minimize redundant data hashing in validation.

This proposal also defines a new name and semantics for the hashLightClientRoot field of the block header, to enable additional commitments to be represented in this hash and to provide a mechanism for future extensibility of the set of commitments represented.

Motivation

In all cases, but particularly in order to support the use of transactions in higher-level protocols, any modification of the transaction that has not been explicitly permitted (such as via anyone-can-spend inputs) should invalidate attestations to spend authority or to the included outputs. Following the activation of this proposed change, transaction identifiers will be stable irrespective of any possible malleation of "witness data" such as proofs and transaction signatures.

In addition, by specifying a transaction identifier and signature algorithm that is decoupled from the serialized format of the transaction as a whole, this change means that the wire format of transactions is no longer consensus-critical.

Requirements

Non-requirements

In order to support backwards-compatibility with parts of the ecosystem that have not yet upgraded to the non-malleable transaction format, it is not an initial requirement that all transactions be non-malleable.

It is not required that legacy (Sapling V4 and earlier) transaction formats support construction of non-malleable transaction identifiers, even though they may continue to be accepted by the network after the NU5 upgrade.

Specification

Digests

All digests are personalized BLAKE2b-256 hashes. In cases where no elements are available for hashing (for example, if there are no transparent transaction inputs or no Orchard actions), a personalized hash of the empty byte array will be used. The personalization string therefore provides domain separation for the hashes of even empty data fields.

The notation BLAKE2b-256(personalization_string, []) is used to refer to hashes constructed in this manner.
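
As a minimal sketch of this notation, the helper below wraps Python's standard hashlib.blake2b. The function name blake2b_256 and the strict 16-byte length check are assumptions made for illustration; they are not part of this specification.

```python
import hashlib


def blake2b_256(personalization: bytes, data: bytes = b"") -> bytes:
    """Personalized BLAKE2b-256; the default argument hashes the empty byte array."""
    # Every personalization string in this document is exactly 16 bytes long,
    # so this illustrative helper rejects anything else.
    if len(personalization) != 16:
        raise ValueError("personalization must be exactly 16 bytes")
    return hashlib.blake2b(data, digest_size=32, person=personalization).digest()


# BLAKE2b-256(personalization_string, []) for two different branches of the tree.
empty_transparent = blake2b_256(b"ZTxIdTranspaHash")
empty_orchard = blake2b_256(b"ZTxIdOrchardHash")
```

Because the personalization differs, the two empty-input hashes differ, which is the domain separation described above.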

TxId Digest

A new transaction digest algorithm is defined that constructs the identifier for a transaction from a tree of hashes. Each branch of the tree of hashes will correspond to a specific subset of transaction data. The overall structure of the hash is as follows; each name referenced here will be described in detail below:

txid_digest
├── header_digest
├── transparent_digest
│   ├── prevouts_digest
│   ├── sequence_digest
│   └── outputs_digest
├── sapling_digest
│   ├── sapling_spends_digest
│   │   ├── sapling_spends_compact_digest
│   │   └── sapling_spends_noncompact_digest
│   ├── sapling_outputs_digest
│   │   ├── sapling_outputs_compact_digest
│   │   ├── sapling_outputs_memos_digest
│   │   └── sapling_outputs_noncompact_digest
│   └── valueBalance
└── orchard_digest
    ├── orchard_actions_compact_digest
    ├── orchard_actions_memos_digest
    ├── orchard_actions_noncompact_digest
    ├── flagsOrchard
    ├── valueBalanceOrchard
    └── anchorOrchard

Each node written as snake_case in this tree is a BLAKE2b-256 hash of its children, initialized with a personalization string specific to that branch of the tree. Nodes that are not themselves digests are written in camelCase. In the specification below, nodes of the tree are presented in depth-first order.

txid_digest

A BLAKE2b-256 hash of the following values

T.1: header_digest       (32-byte hash output)
T.2: transparent_digest  (32-byte hash output)
T.3: sapling_digest      (32-byte hash output)
T.4: orchard_digest      (32-byte hash output)

The personalization field of this hash is set to:

"ZcashTxHash_" || CONSENSUS_BRANCH_ID

ZcashTxHash_ has 1 underscore character.

As in ZIP 143 5, CONSENSUS_BRANCH_ID is the 4-byte little-endian encoding of the consensus branch ID for the epoch of the block containing the transaction. Domain separation of the transaction id hash across parallel consensus branches provides replay protection: transactions targeted for one consensus branch will not have the same transaction identifier on other consensus branches.

This signature hash personalization prefix has been changed to reflect the new role of this hash (relative to ZcashSigHash as specified in ZIP 143) as a transaction identifier rather than a commitment that is exclusively used for signature purposes. The previous computation of the transaction identifier was a SHA256d hash of the serialized transaction contents, and was not personalized.
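
The following sketch shows one way the top-level hash could be assembled, assuming the four child digests are concatenated in the order T.1 to T.4. The function name, the placeholder child digests, and both branch ID values are made up for illustration.

```python
import hashlib


def txid_digest(header_digest: bytes, transparent_digest: bytes,
                sapling_digest: bytes, orchard_digest: bytes,
                consensus_branch_id: int) -> bytes:
    """Hash T.1..T.4 under the branch-specific personalization."""
    # "ZcashTxHash_" (12 bytes) || 4-byte little-endian branch ID = 16 bytes.
    personalization = b"ZcashTxHash_" + consensus_branch_id.to_bytes(4, "little")
    data = header_digest + transparent_digest + sapling_digest + orchard_digest
    return hashlib.blake2b(data, digest_size=32, person=personalization).digest()


# Placeholder 32-byte child digests and made-up branch IDs, for illustration only.
children = [bytes([i]) * 32 for i in range(4)]
txid_on_branch_a = txid_digest(*children, consensus_branch_id=0x11111111)
txid_on_branch_b = txid_digest(*children, consensus_branch_id=0x22222222)
```

The same child digests yield different identifiers on the two branches, which is the replay protection property described above.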

T.1: header_digest

A BLAKE2b-256 hash of the following values

T.1a: version             (4-byte little-endian version identifier including overwinter flag)
T.1b: version_group_id    (4-byte little-endian version group identifier)
T.1c: consensus_branch_id (4-byte little-endian consensus branch id)
T.1d: lock_time           (4-byte little-endian nLockTime value)
T.1e: expiry_height       (4-byte little-endian block height)
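
As a rough illustration, the five fields above could be packed and hashed as follows, using the ZTxIdHeadersHash personalization that this section specifies. The function name and all example values are assumptions, including the placement of the overwinter flag in the top bit of the version field.

```python
import hashlib
import struct


def header_digest(version: int, version_group_id: int, consensus_branch_id: int,
                  lock_time: int, expiry_height: int) -> bytes:
    # "<5I" packs five unsigned 32-bit integers, little-endian, in order T.1a..T.1e.
    data = struct.pack("<5I", version, version_group_id, consensus_branch_id,
                       lock_time, expiry_height)
    return hashlib.blake2b(data, digest_size=32, person=b"ZTxIdHeadersHash").digest()


# Illustrative values only; none of them are taken from this document.
example_header_digest = header_digest(
    version=(1 << 31) | 5,           # assumed: overwinter flag in the top bit
    version_group_id=0x0A0B0C0D,     # made-up placeholder
    consensus_branch_id=0x11111111,  # made-up placeholder
    lock_time=0,
    expiry_height=1_000_000,
)
```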

The personalization field of this hash is set to:

"ZTxIdHeadersHash"

T.2: transparent_digest

In the case that transparent inputs or outputs are present, the transparent digest consists of a BLAKE2b-256 hash of the following values

T.2a: prevouts_digest (32-byte hash)
T.2b: sequence_digest (32-byte hash)
T.2c: outputs_digest  (32-byte hash)

The personalization field of this hash is set to:

"ZTxIdTranspaHash"

In the case that the transaction has no transparent components, transparent_digest is

BLAKE2b-256("ZTxIdTranspaHash", [])

T.2a: prevouts_digest

A BLAKE2b-256 hash of the field encoding of all outpoint field values of transparent inputs to the transaction.

The personalization field of this hash is set to:

"ZTxIdPrevoutHash"

In the case that the transaction has transparent outputs but no transparent inputs, prevouts_digest is

BLAKE2b-256("ZTxIdPrevoutHash", [])

T.2b: sequence_digest

A BLAKE2b-256 hash of the 32-bit little-endian representation of all nSequence field values of transparent inputs to the transaction.

The personalization field of this hash is set to:

"ZTxIdSequencHash"

In the case that the transaction has transparent outputs but no transparent inputs, sequence_digest is

BLAKE2b-256("ZTxIdSequencHash", [])

T.2c: outputs_digest

A BLAKE2b-256 hash of the concatenated field encodings of all transparent output values of the transaction. The field encoding of such an output consists of the encoded output amount (8-byte little endian) followed by the scriptPubKey byte array (serialized as Bitcoin script).

The personalization field of this hash is set to:

"ZTxIdOutputsHash"

In the case that the transaction has transparent inputs but no transparent outputs, outputs_digest is

BLAKE2b-256("ZTxIdOutputsHash", [])
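
The sketch below combines T.2a to T.2c into transparent_digest. It assumes the Bitcoin-style encodings: an outpoint is a 32-byte transaction hash followed by a 4-byte little-endian index, and a scriptPubKey is prefixed with a compactSize length. Those encodings, the function names, and the sample output are assumptions for illustration; the protocol specification is authoritative for field encodings.

```python
import hashlib
import struct


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin-style variable-length integer (assumed script length prefix)."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def transparent_digest(inputs, outputs) -> bytes:
    """inputs: [(prev_txid, prev_index, n_sequence)]; outputs: [(amount, script_pubkey)]."""
    if not inputs and not outputs:
        return h(b"ZTxIdTranspaHash")
    prevouts = b"".join(txid + struct.pack("<I", index) for txid, index, _ in inputs)
    sequences = b"".join(struct.pack("<I", seq) for _, _, seq in inputs)
    outs = b"".join(struct.pack("<q", amount) + compact_size(len(script)) + script
                    for amount, script in outputs)
    # Hashing an empty concatenation gives BLAKE2b-256(personalization, []),
    # which matches the value specified when inputs or outputs are absent.
    return h(b"ZTxIdTranspaHash",
             h(b"ZTxIdPrevoutHash", prevouts)
             + h(b"ZTxIdSequencHash", sequences)
             + h(b"ZTxIdOutputsHash", outs))


# A made-up pay-to-pubkey-hash style script with an all-zero hash.
sample_script = bytes.fromhex("76a914") + bytes(20) + bytes.fromhex("88ac")
outputs_only = transparent_digest([], [(50_000, sample_script)])
```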

T.3: sapling_digest

In the case that Sapling spends or outputs are present, the digest of Sapling components is composed of two subtrees, together with the valueBalance field. The subtrees are organized to permit easy interoperability with the CompactBlock representation of Sapling data specified by the ZIP 307 Light Client Protocol 7.

This digest is a BLAKE2b-256 hash of the following values

T.3a: sapling_spends_digest  (32-byte hash)
T.3b: sapling_outputs_digest (32-byte hash)
T.3c: valueBalance           (64-bit signed little-endian)

The personalization field of this hash is set to:

"ZTxIdSaplingHash"

In the case that the transaction has no Sapling spends or outputs, sapling_digest is

BLAKE2b-256("ZTxIdSaplingHash", [])
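
A small sketch of this node, assuming the three values are concatenated in the order T.3a, T.3b, T.3c. The function and constant names and the placeholder child digests are illustrative only.

```python
import hashlib
import struct


def sapling_digest(sapling_spends_digest: bytes, sapling_outputs_digest: bytes,
                   value_balance: int) -> bytes:
    # "<q" is a 64-bit signed little-endian integer, matching T.3c.
    data = (sapling_spends_digest + sapling_outputs_digest
            + struct.pack("<q", value_balance))
    return hashlib.blake2b(data, digest_size=32, person=b"ZTxIdSaplingHash").digest()


# Value used when the transaction has no Sapling spends or outputs.
EMPTY_SAPLING_DIGEST = hashlib.blake2b(
    b"", digest_size=32, person=b"ZTxIdSaplingHash").digest()

# Placeholder child digests; a negative balance exercises the signed encoding.
negative_balance_digest = sapling_digest(bytes(32), bytes(32), value_balance=-1)
```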

T.3a: sapling_spends_digest

In the case that Sapling spends are present, this digest is a BLAKE2b-256 hash of the following values

T.3a.i:  sapling_spends_compact_digest    (32-byte hash)
T.3a.ii: sapling_spends_noncompact_digest (32-byte hash)

The personalization field of this hash is set to:

"ZTxIdSSpendsHash"

In the case that the transaction has Sapling outputs but no Sapling spends, sapling_spends_digest is

BLAKE2b-256("ZTxIdSSpendsHash", [])

T.3a.i: sapling_spends_compact_digest

A BLAKE2b-256 hash of the field encoding of all nullifier field values of Sapling shielded spends belonging to the transaction.

The personalization field of this hash is set to:

"ZTxIdSSpendCHash"

T.3a.ii: sapling_spends_noncompact_digest

A BLAKE2b-256 hash of the non-nullifier information for all Sapling shielded spends belonging to the transaction, excluding both zkproof data and spend authorization signature(s). For each spend, the following elements are included in the hash:

T.3a.ii.1: cv     (field encoding bytes)
T.3a.ii.2: anchor (field encoding bytes)
T.3a.ii.3: rk     (field encoding bytes)

In Transaction version 5, Sapling Spends have a shared anchor, which is hashed into the sapling_spends_noncompact_digest for each Spend.
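
To make the per-spend layout concrete, here is a hedged sketch of sapling_spends_digest using the personalization strings given in this section (ZTxIdSSpendsHash, ZTxIdSSpendCHash, ZTxIdSSpendNHash). Field values are treated as opaque byte strings; the tuple layout, function names, and sample values are assumptions for illustration.

```python
import hashlib


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


def sapling_spends_digest(spends, shared_anchor: bytes) -> bytes:
    """spends: list of (nullifier, cv, rk) byte strings, in transaction order."""
    if not spends:
        return h(b"ZTxIdSSpendsHash")
    compact = h(b"ZTxIdSSpendCHash", b"".join(nf for nf, _, _ in spends))
    # The v5 shared anchor is hashed once per spend, between cv and rk.
    noncompact = h(b"ZTxIdSSpendNHash",
                   b"".join(cv + shared_anchor + rk for _, cv, rk in spends))
    return h(b"ZTxIdSSpendsHash", compact + noncompact)


# Made-up field values for two spends sharing one anchor.
spends = [(b"\x01" * 32, b"\x02" * 32, b"\x03" * 32),
          (b"\x04" * 32, b"\x05" * 32, b"\x06" * 32)]
spends_digest = sapling_spends_digest(spends, shared_anchor=b"\xaa" * 32)
```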

The personalization field of this hash is set to:

"ZTxIdSSpendNHash"

T.3b: sapling_outputs_digest

In the case that Sapling outputs are present, this digest is a BLAKE2b-256 hash of the following values

T.3b.i:   sapling_outputs_compact_digest    (32-byte hash)
T.3b.ii:  sapling_outputs_memos_digest      (32-byte hash)
T.3b.iii: sapling_outputs_noncompact_digest (32-byte hash)

The personalization field of this hash is set to:

"ZTxIdSOutputHash"

In the case that the transaction has Sapling spends but no Sapling outputs, sapling_outputs_digest is

BLAKE2b-256("ZTxIdSOutputHash", [])

T.3b.i: sapling_outputs_compact_digest

A BLAKE2b-256 hash of the subset of Sapling output information included in the ZIP 307 7 CompactBlock format for all Sapling shielded outputs belonging to the transaction. For each output, the following elements are included in the hash:

T.3b.i.1: cmu                  (field encoding bytes)
T.3b.i.2: ephemeral_key        (field encoding bytes)
T.3b.i.3: enc_ciphertext[..52] (First 52 bytes of field encoding)

The personalization field of this hash is set to:

"ZTxIdSOutC__Hash" (2 underscore characters)

T.3b.ii: sapling_outputs_memos_digest

A BLAKE2b-256 hash of the subset of Sapling shielded memo field data for all Sapling shielded outputs belonging to the transaction. For each output, the following elements are included in the hash:

T.3b.ii.1: enc_ciphertext[52..564] (contents of the encrypted memo field)

The personalization field of this hash is set to:

"ZTxIdSOutM__Hash" (2 underscore characters)

T.3b.iii: sapling_outputs_noncompact_digest

A BLAKE2b-256 hash of the remaining subset of Sapling output information not included in the ZIP 307 7 CompactBlock format, excluding zkproof data, for all Sapling shielded outputs belonging to the transaction. For each output, the following elements are included in the hash:

T.3b.iii.1: cv                    (field encoding bytes)
T.3b.iii.2: enc_ciphertext[564..] (post-memo Poly1305 AEAD tag of field encoding)
T.3b.iii.3: out_ciphertext        (field encoding bytes)
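
The three output sub-digests split enc_ciphertext at byte offsets 52 and 564. The sketch below shows that split, using the personalization strings given in this section. The dictionary layout, the function name, and the assumed field lengths (a 580-byte enc_ciphertext and an 80-byte out_ciphertext) are illustrative assumptions.

```python
import hashlib


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


def sapling_outputs_digest(outputs) -> bytes:
    """outputs: dicts with keys cmu, ephemeral_key, enc_ciphertext, cv, out_ciphertext."""
    if not outputs:
        return h(b"ZTxIdSOutputHash")
    compact = b"".join(o["cmu"] + o["ephemeral_key"] + o["enc_ciphertext"][:52]
                       for o in outputs)
    memos = b"".join(o["enc_ciphertext"][52:564] for o in outputs)
    noncompact = b"".join(o["cv"] + o["enc_ciphertext"][564:] + o["out_ciphertext"]
                          for o in outputs)
    return h(b"ZTxIdSOutputHash",
             h(b"ZTxIdSOutC__Hash", compact)
             + h(b"ZTxIdSOutM__Hash", memos)
             + h(b"ZTxIdSOutN__Hash", noncompact))


# Made-up output; the ciphertext lengths are assumptions, not taken from this document.
example_output = {
    "cmu": b"\x01" * 32,
    "ephemeral_key": b"\x02" * 32,
    "enc_ciphertext": bytes(i % 256 for i in range(580)),
    "cv": b"\x03" * 32,
    "out_ciphertext": b"\x04" * 80,
}
outputs_digest = sapling_outputs_digest([example_output])
```

Splitting the ciphertext this way means every byte of enc_ciphertext is committed to exactly once across the three sub-digests.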

The personalization field of this hash is set to:

"ZTxIdSOutN__Hash" (2 underscore characters)

T.4: orchard_digest

In the case that Orchard actions are present in the transaction, this digest is a BLAKE2b-256 hash of the following values

T.4a: orchard_actions_compact_digest      (32-byte hash output)
T.4b: orchard_actions_memos_digest        (32-byte hash output)
T.4c: orchard_actions_noncompact_digest   (32-byte hash output)
T.4d: flagsOrchard                        (1 byte)
T.4e: valueBalanceOrchard                 (64-bit signed little-endian)
T.4f: anchorOrchard                       (32 bytes)

The personalization field of this hash is set to:

"ZTxIdOrchardHash"

In the case that the transaction has no Orchard actions, orchard_digest is

BLAKE2b-256("ZTxIdOrchardHash", [])

T.4a: orchard_actions_compact_digest

A BLAKE2b-256 hash of the subset of Orchard Action information intended to be included in an updated version of the ZIP 307 7 CompactBlock format for all Orchard Actions belonging to the transaction. For each Action, the following elements are included in the hash:

T.4a.i  : nullifier            (field encoding bytes)
T.4a.ii : cmx                  (field encoding bytes)
T.4a.iii: ephemeralKey         (field encoding bytes)
T.4a.iv : encCiphertext[..52]  (First 52 bytes of field encoding)

The personalization field of this hash is set to:

"ZTxIdOrcActCHash"

T.4b: orchard_actions_memos_digest

A BLAKE2b-256 hash of the subset of Orchard shielded memo field data for all Orchard Actions belonging to the transaction. For each Action, the following elements are included in the hash:

T.4b.i: encCiphertext[52..564] (contents of the encrypted memo field)

The personalization field of this hash is set to:

"ZTxIdOrcActMHash"

T.4c: orchard_actions_noncompact_digest

A BLAKE2b-256 hash of the remaining subset of Orchard Action information not intended for inclusion in an updated version of the ZIP 307 7 CompactBlock format, for all Orchard Actions belonging to the transaction. For each Action, the following elements are included in the hash:

T.4c.i  : cv                    (field encoding bytes)
T.4c.ii : rk                    (field encoding bytes)
T.4c.iii: encCiphertext[564..]  (post-memo suffix of field encoding)
T.4c.iv : outCiphertext         (field encoding bytes)
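
Putting T.4a to T.4f together, orchard_digest might be computed as in the sketch below, using the personalization strings given in this section. The dictionary keys, function name, sample values, and assumed ciphertext lengths are all illustrative assumptions.

```python
import hashlib
import struct


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


def orchard_digest(actions, flags: int, value_balance: int, anchor: bytes) -> bytes:
    """actions: dicts with keys nullifier, cmx, ephemeral_key, enc_ciphertext,
    cv, rk, out_ciphertext (all byte strings)."""
    if not actions:
        return h(b"ZTxIdOrchardHash")
    compact = b"".join(a["nullifier"] + a["cmx"] + a["ephemeral_key"]
                       + a["enc_ciphertext"][:52] for a in actions)
    memos = b"".join(a["enc_ciphertext"][52:564] for a in actions)
    noncompact = b"".join(a["cv"] + a["rk"] + a["enc_ciphertext"][564:]
                          + a["out_ciphertext"] for a in actions)
    data = (h(b"ZTxIdOrcActCHash", compact)
            + h(b"ZTxIdOrcActMHash", memos)
            + h(b"ZTxIdOrcActNHash", noncompact)
            + bytes([flags])                   # T.4d: 1 byte
            + struct.pack("<q", value_balance) # T.4e: 64-bit signed little-endian
            + anchor)                          # T.4f: 32 bytes
    return h(b"ZTxIdOrchardHash", data)


# Made-up action; field lengths here are assumptions for illustration.
example_action = {
    "nullifier": b"\x01" * 32, "cmx": b"\x02" * 32, "ephemeral_key": b"\x03" * 32,
    "enc_ciphertext": bytes(580), "cv": b"\x04" * 32, "rk": b"\x05" * 32,
    "out_ciphertext": bytes(80),
}
example_orchard_digest = orchard_digest([example_action], flags=0b11,
                                        value_balance=-1000, anchor=b"\x06" * 32)
```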

The personalization field of this hash is set to:

"ZTxIdOrcActNHash"

Signature Digest

A new per-input transaction digest algorithm is defined that constructs a hash that may be signed by a transaction creator to commit to the effects of the transaction. A signature digest is produced for each transparent input, each Sapling input, and each Orchard action. For transparent inputs, this follows closely the algorithms from ZIP 143 5 and ZIP 243 6. For shielded inputs, this algorithm produces exactly the same output as the transaction digest algorithm, so the txid may be signed directly.

The overall structure of the hash is as follows; each name referenced here will be described in detail below:

signature_digest
├── header_digest
├── transparent_sig_digest
├── sapling_digest
└── orchard_digest

signature_digest

A BLAKE2b-256 hash of the following values

S.1: header_digest          (32-byte hash output)
S.2: transparent_sig_digest (32-byte hash output)
S.3: sapling_digest         (32-byte hash output)
S.4: orchard_digest         (32-byte hash output)

The personalization field of this hash is set to:

"ZcashTxHash_" || CONSENSUS_BRANCH_ID

ZcashTxHash_ has 1 underscore character.

This value has the same personalization as the top hash of the transaction identifier digest tree, so that what is being signed in the case that there are no transparent inputs is just the transaction id.
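
The relationship between the two top-level hashes can be illustrated with placeholder child digests. This is only a sketch: the function names, the placeholder digests, and the branch ID are made up, and the children are assumed to be concatenated in the listed order.

```python
import hashlib


def _top_level(children, consensus_branch_id: int) -> bytes:
    person = b"ZcashTxHash_" + consensus_branch_id.to_bytes(4, "little")
    return hashlib.blake2b(b"".join(children), digest_size=32, person=person).digest()


def txid_digest(header, transparent_digest, sapling, orchard, branch_id):
    return _top_level([header, transparent_digest, sapling, orchard], branch_id)


def signature_digest(header, transparent_sig_digest, sapling, orchard, branch_id):
    return _top_level([header, transparent_sig_digest, sapling, orchard], branch_id)


header, transparent, sapling, orchard = (bytes([i]) * 32 for i in range(1, 5))
branch_id = 0x11111111  # made-up placeholder

txid = txid_digest(header, transparent, sapling, orchard, branch_id)
# When the second child equals transparent_digest, the signed value is the txid.
shielded_sighash = signature_digest(header, transparent, sapling, orchard, branch_id)
# A per-input transparent_sig_digest (placeholder bytes here) changes the result.
transparent_sighash = signature_digest(header, b"\xee" * 32, sapling, orchard, branch_id)
```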

S.1: header_digest

Identical to that specified for the transaction identifier.

S.2: transparent_sig_digest

If we are producing a hash for the signature over a Sapling Spend or an Orchard Action, the value of transparent_sig_digest is identical to the value specified in section T.2.

If we are producing a hash for signature over a transparent input, the value of transparent_sig_digest depends upon the value of a hash_type flag as in ZIP 143 5.

The construction of each component below depends upon the values of the hash_type flag bits. Each component will be described separately.

This digest is a BLAKE2b-256 hash of the following values

S.2a: prevouts_sig_digest (32-byte hash)
S.2b: sequence_sig_digest (32-byte hash)
S.2c: outputs_sig_digest  (32-byte hash)
S.2d: txin_sig_digest     (32-byte hash)

The personalization field of this hash is set to:

"ZTxIdTranspaHash"

S.2a: prevouts_sig_digest

This is a BLAKE2b-256 hash initialized with the personalization field value ZTxIdPrevoutHash.

If the SIGHASH_ANYONECANPAY flag is not set:

identical to the value of prevouts_digest as specified for the transaction identifier in section T.2a.

otherwise:

BLAKE2b-256("ZTxIdPrevoutHash", [])

S.2b: sequence_sig_digest

This is a BLAKE2b-256 hash initialized with the personalization field value ZTxIdSequencHash.

If the SIGHASH_ANYONECANPAY flag is not set, and the sighash type is neither SIGHASH_SINGLE nor SIGHASH_NONE:

identical to the value of sequence_digest as specified for the transaction identifier in section T.2b.

otherwise:

BLAKE2b-256("ZTxIdSequencHash", [])

S.2c: outputs_sig_digest

This is a BLAKE2b-256 hash initialized with the personalization field value ZTxIdOutputsHash.

If the sighash type is neither SIGHASH_SINGLE nor SIGHASH_NONE:

identical to the value of outputs_digest as specified for the transaction identifier in section T.2c.

If the sighash type is SIGHASH_SINGLE and the signature hash is being computed for the transparent input at a particular index, and a transparent output appears in the transaction at that index:

the hash is over the transaction serialized form of the transparent output at that index

otherwise:

BLAKE2b-256("ZTxIdOutputsHash", [])

S.2d: txin_sig_digest

This is a BLAKE2b-256 hash of the following properties of the transparent input being signed, initialized with the personalization field value Zcash___TxInHash (3 underscores):

S.2d.i:   prevout     (field encoding)
S.2d.ii:  script_code (field encoding)
S.2d.iii: value       (8-byte signed little-endian)
S.2d.iv:  nSequence   (4-byte unsigned little-endian)

Note: value is defined in the consensus rules to be a nonnegative value <= MAX_MONEY, but all existing implementations parse this value as signed and enforce the nonnegative constraint as a consensus check. It is defined as signed here for consistency with those existing implementations.
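
The rules in S.2a to S.2d can be sketched as a single function. The numeric flag values and the 0x1f mask follow the usual Bitcoin convention and are assumptions here, since this document does not list them. The function names are hypothetical, and prevout and script_code are passed in already field-encoded.

```python
import hashlib
import struct

# Assumed Bitcoin-convention flag values (not listed in this document).
SIGHASH_ALL, SIGHASH_NONE, SIGHASH_SINGLE = 0x01, 0x02, 0x03
SIGHASH_ANYONECANPAY = 0x80


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


def txin_sig_digest(prevout: bytes, script_code: bytes, value: int, n_sequence: int) -> bytes:
    data = prevout + script_code + struct.pack("<q", value) + struct.pack("<I", n_sequence)
    return h(b"Zcash___TxInHash", data)


def transparent_sig_digest(hash_type, prevouts_digest, sequence_digest,
                           outputs_digest, output_at_index, txin_digest) -> bytes:
    """output_at_index: serialized output at the signed input's index, or None."""
    base_type = hash_type & 0x1F
    anyone_can_pay = bool(hash_type & SIGHASH_ANYONECANPAY)
    all_outputs = base_type not in (SIGHASH_SINGLE, SIGHASH_NONE)

    prevouts_sig = h(b"ZTxIdPrevoutHash") if anyone_can_pay else prevouts_digest
    if not anyone_can_pay and all_outputs:
        sequence_sig = sequence_digest
    else:
        sequence_sig = h(b"ZTxIdSequencHash")
    if all_outputs:
        outputs_sig = outputs_digest
    elif base_type == SIGHASH_SINGLE and output_at_index is not None:
        outputs_sig = h(b"ZTxIdOutputsHash", output_at_index)
    else:
        outputs_sig = h(b"ZTxIdOutputsHash")
    return h(b"ZTxIdTranspaHash", prevouts_sig + sequence_sig + outputs_sig + txin_digest)


# Placeholder inputs for illustration.
txin = txin_sig_digest(bytes(36), b"\x00", value=50_000, n_sequence=0xFFFFFFFF)
args = (b"\x01" * 32, b"\x02" * 32, b"\x03" * 32)
sig_all = transparent_sig_digest(SIGHASH_ALL, *args, None, txin)
sig_all_acp = transparent_sig_digest(SIGHASH_ALL | SIGHASH_ANYONECANPAY, *args, None, txin)
```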

S.3: sapling_digest

Identical to that specified for the transaction identifier.

S.4: orchard_digest

Identical to that specified for the transaction identifier.

Authorizing Data Commitment

A new transaction digest algorithm is defined that constructs a digest which commits to the authorizing data of a transaction from a tree of BLAKE2b-256 hashes. For v5 transactions, the overall structure of the hash is as follows:

auth_digest
├── transparent_scripts_digest
├── sapling_auth_digest
└── orchard_auth_digest

Each node written as snake_case in this tree is a BLAKE2b-256 hash of authorizing data of the transaction.

For transaction versions before v5, a placeholder value consisting of 32 bytes of 0xFF is used in place of the authorizing data commitment. This is only used in the tree committed to by hashAuthDataRoot, as defined in Block Header Changes.

The pair (Transaction Identifier, Auth Commitment) constitutes a commitment to all the data of a serialized transaction that may be included in a block.

auth_digest

A BLAKE2b-256 hash of the following values

A.1: transparent_scripts_digest (32-byte hash output)
A.2: sapling_auth_digest        (32-byte hash output)
A.3: orchard_auth_digest        (32-byte hash output)

The personalization field of this hash is set to:

"ZTxAuthHash_" || CONSENSUS_BRANCH_ID

ZTxAuthHash_ has 1 underscore character.

A.1: transparent_scripts_digest

In the case that the transaction contains transparent inputs, this is a BLAKE2b-256 hash of the concatenated field encodings of the Bitcoin script values associated with each transparent input belonging to the transaction.

The personalization field of this hash is set to:

"ZTxAuthTransHash"

In the case that the transaction has no transparent inputs, transparent_scripts_digest is

BLAKE2b-256("ZTxAuthTransHash", [])

A.2: sapling_auth_digest

In the case that Sapling Spends or Sapling Outputs are present, this is a BLAKE2b-256 hash of the field encoding of the Sapling zkproof value of each Sapling Spend Description, followed by the field encoding of the spend_auth_sig value of each Sapling Spend Description belonging to the transaction, followed by the field encoding of the zkproof field of each Sapling Output Description belonging to the transaction, followed by the field encoding of the binding signature:

A.2a: spend_zkproofs           (field encoding bytes)
A.2b: spend_auth_sigs          (field encoding bytes)
A.2c: output_zkproofs          (field encoding bytes)
A.2d: binding_sig              (field encoding bytes)

The personalization field of this hash is set to:

"ZTxAuthSapliHash"

In the case that the transaction has no Sapling Spends or Sapling Outputs, sapling_auth_digest is

BLAKE2b-256("ZTxAuthSapliHash", [])

A.3: orchard_auth_digest

In the case that Orchard Actions are present, this is a BLAKE2b-256 hash of the field encoding of the proofsOrchard, vSpendAuthSigsOrchard, and bindingSigOrchard fields of the transaction:

A.3a: proofsOrchard            (field encoding bytes)
A.3b: vSpendAuthSigsOrchard    (field encoding bytes)
A.3c: bindingSigOrchard        (field encoding bytes)

The personalization field of this hash is set to:

"ZTxAuthOrchaHash"

In the case that the transaction has no Orchard Actions, orchard_auth_digest is

BLAKE2b-256("ZTxAuthOrchaHash", [])
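
A hedged sketch of the authorizing data commitment follows. The argument layout is an assumption made for illustration: each bundle is passed as a tuple of already-encoded byte strings, or None when that component is absent. The function and constant names and the branch ID are hypothetical.

```python
import hashlib


def h(person: bytes, data: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, person=person).digest()


# Placeholder used for transaction versions before v5.
PRE_V5_AUTH_PLACEHOLDER = b"\xff" * 32


def auth_digest(script_sigs, sapling_auth, orchard_auth, consensus_branch_id: int) -> bytes:
    """script_sigs: encoded scripts, one per transparent input (may be empty).
    sapling_auth: None or (spend_zkproofs, spend_auth_sigs, output_zkproofs, binding_sig).
    orchard_auth: None or (proofs, spend_auth_sigs, binding_sig)."""
    # Joining an empty sequence gives b"", so absent components hash the empty
    # byte array under their own personalization, as specified above.
    transparent = h(b"ZTxAuthTransHash", b"".join(script_sigs))
    sapling = h(b"ZTxAuthSapliHash", b"".join(sapling_auth or ()))
    orchard = h(b"ZTxAuthOrchaHash", b"".join(orchard_auth or ()))
    person = b"ZTxAuthHash_" + consensus_branch_id.to_bytes(4, "little")
    return h(person, transparent + sapling + orchard)


branch_id = 0x11111111  # made-up placeholder
empty_auth = auth_digest([], None, None, branch_id)
with_orchard = auth_digest([], None, (b"\x01" * 64, b"\x02" * 64, b"\x03" * 64), branch_id)
```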

Block Header Changes

The nonmalleable transaction identifier specified by this ZIP will be used in the place of the current malleable transaction identifier within the Merkle tree committed to by the hashMerkleRoot value. However, this change now means that hashMerkleRoot is not sufficient to fully commit to the transaction data, including witnesses, that appear within the block.

As a consequence, we now need to add a new commitment to the block header. This commitment will be the root of a Merkle tree having leaves that are transaction authorizing data commitments, produced according to the Authorizing Data Commitment part of this specification. The insertion order for this Merkle tree MUST be identical to the insertion order of transaction identifiers into the Merkle tree that is used to construct hashMerkleRoot, such that a path through this Merkle tree to a transaction identifies the same transaction as that path reaches in the tree rooted at hashMerkleRoot.

This new commitment is named hashAuthDataRoot and is the root of a binary Merkle tree of transaction authorizing data commitments having height \(\mathsf{ceil}(\log_2(\mathsf{tx\_count}))\), padded with leaves having the "null" hash value [0u8; 32]. Note that \(\log_2(\mathsf{tx\_count})\) is well-defined because \(\mathsf{tx\_count} > 0\), due to the coinbase transaction in each block. Non-leaf hashes in this tree are BLAKE2b-256 hashes personalized by the string "ZcashAuthDatHash".
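
One possible reading of this construction is sketched below. It assumes that each non-leaf node hashes the concatenation of its left and right children, which this document does not spell out; the function names and sample leaves are illustrative.

```python
import hashlib


def node(left: bytes, right: bytes) -> bytes:
    # Assumed: a non-leaf hash covers left || right.
    return hashlib.blake2b(left + right, digest_size=32,
                           person=b"ZcashAuthDatHash").digest()


def auth_data_root(leaves) -> bytes:
    """leaves: authorizing data commitments in block order (at least one)."""
    if not leaves:
        raise ValueError("a block always contains at least the coinbase transaction")
    # Width 2**ceil(log2(tx_count)): the smallest power of two >= tx_count.
    width = 1
    while width < len(leaves):
        width *= 2
    level = list(leaves) + [bytes(32)] * (width - len(leaves))
    while len(level) > 1:
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


a, b, c = b"\x0a" * 32, b"\x0b" * 32, b"\x0c" * 32
root_of_three = auth_data_root([a, b, c])  # padded to four leaves
```

Under these assumptions a block containing only the coinbase transaction has a tree of height 0, so the root is that transaction's commitment itself.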

Changing the block header format to allow space for an additional commitment is somewhat invasive. Instead, the name and meaning of the hashLightClientRoot field, described in ZIP 221 3, is changed.

hashLightClientRoot is renamed to hashBlockCommitments. The value of this hash is the BLAKE2b-256 hash personalized by the string "ZcashBlockCommit" of the following elements:

hashLightClientRoot (as described in ZIP 221)
hashAuthDataRoot    (as described above)
terminator          [0u8;32]
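
As a minimal sketch, assuming the three elements are concatenated in the order listed, the new header field could be computed as follows. The function name and the placeholder roots are illustrative.

```python
import hashlib


def hash_block_commitments(hash_light_client_root: bytes,
                           hash_auth_data_root: bytes,
                           terminator: bytes = bytes(32)) -> bytes:
    # The default terminator is [0u8; 32]; a later upgrade could substitute
    # the hash of a newly defined structure in its place.
    data = hash_light_client_root + hash_auth_data_root + terminator
    return hashlib.blake2b(data, digest_size=32, person=b"ZcashBlockCommit").digest()


# Placeholder 32-byte roots for illustration only.
commitments = hash_block_commitments(b"\x11" * 32, b"\x22" * 32)
extended = hash_block_commitments(b"\x11" * 32, b"\x22" * 32, terminator=b"\x33" * 32)
```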

This representation treats the hashBlockCommitments value as a linked list of hashes terminated by arbitrary data. In the case of protocol upgrades where additional commitments need to be included in the block header, it is possible to replace this terminator with the hash of a newly defined structure which ends in a similar terminator. Fully validating nodes MUST always use the entire structure defined by the latest activated protocol version that they support.

The linked structure of this hash is intended to provide extensibility for use by light clients which may be connected to a third-party server that supports a later protocol version. Such a third party SHOULD provide a value that can be used instead of the all-zeros terminator to permit the light client to perform validation of the parts of the structure it needs.

Unlike the hashLightClientRoot change, the change to hashBlockCommitments happens in the block that activates this ZIP.

The block header byte format and version are not altered by this ZIP.

Reference implementation

References

1 RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
2 ZIP 200: Network Upgrade Mechanism
3 ZIP 221: FlyClient - Consensus Layer Changes
4 ZIP 76: Transaction Signature Validation before Overwinter
5 ZIP 143: Transaction Signature Validation for Overwinter
6 ZIP 243: Transaction Signature Validation for Sapling
7 ZIP 307: Light Client Protocol for Payment Detection
8 Zcash Protocol Specification, Version 2020.1.24 [NU5 proposal]. Section 7.1: Transaction Encoding and Consensus