Variable-base scalar multiplication

In the Orchard circuit we need to check $\mathsf{pk_d} = [\mathsf{ivk}]\,\mathsf{g_d}$ where $\mathsf{ivk} \in [0, p)$, and the scalar field is $\mathbb{F}_q$ with $p < q$.

We have $p = 2^{254} + t_p$ and $q = 2^{254} + t_q$, for $t_p, t_q < 2^{128}$.
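For concreteness, here is a small Python check of these parameters, assuming the Pallas/Vesta curve orders used by Orchard (only $q$ is quoted elsewhere on this page, so the value of $p$ below is an assumption):

p = 0x40000000000000000000000000000000224698fc094cf91b992d30ed00000001  # assumed Pallas base field modulus
q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001  # matches the sage snippet below
t_p, t_q = p - 2**254, q - 2**254
assert 0 < t_p < 2**128 and 0 < t_q < 2**128
assert p < q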

Witness scalar

We're trying to compute $[\alpha] T$ for $\alpha \in [0, q)$. Set $k = \alpha + t_q$ and $n = 254$. Then we can compute

$$\begin{aligned}
[2^{254} + k]\,T &= [2^{254} + \alpha + t_q]\,T \\
&= [q + \alpha]\,T \\
&= [\alpha]\,T
\end{aligned}$$

provided that $\alpha + t_q \in [0, 2^{n+1})$, i.e. $\alpha < 2^{n+1} - t_q$, which covers the whole range we need because in fact $q < 2^{n+1} - t_q$.

Thus, given a scalar $\alpha$, we witness the boolean decomposition of $k = \alpha + t_q$:

$$k = \mathbf{k}_{254} \cdot 2^{254} + \mathbf{k}_{253} \cdot 2^{253} + \cdots + \mathbf{k}_0.$$

(We use big-endian bit order for convenient input into the variable-base scalar multiplication algorithm.)
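A quick numeric sanity check of this wraparound identity in plain Python, using the value of $q$ from the sage snippet further down:

q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
t_q = q - 2**254
for alpha in (0, 1, 12345, q - 1):
    k = alpha + t_q
    assert k < 2**255                 # k fits in n+1 = 255 bits
    assert (2**254 + k) % q == alpha  # [2^254 + k] T = [alpha] T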

Variable-base scalar multiplication

We use an optimized double-and-add algorithm, copied from "Faster variable-base scalar multiplication in zk-SNARK circuits" with some variable name changes:

Acc := [2] T
for i from n-1 down to 0 {
    P := k_{i+1} ? T : −T
    Acc := (Acc + P) + Acc
}
return (k_0 = 0) ? (Acc - T) : Acc
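The following sketch simulates this algorithm in plain Python, modelling the prime-order group additively as the integers modulo $q$ (so the point $[m] T$ is represented by the index $m$); it illustrates the scalar arithmetic only, not the circuit:

import random

q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
n = 254

def var_base_mul(k):
    bits = [(k >> i) & 1 for i in range(n + 1)]      # bits[i] = k_i
    acc = 2                                          # Acc := [2] T
    for i in range(n - 1, -1, -1):
        p = 1 if bits[i + 1] else -1                 # P := k_{i+1} ? T : -T
        acc = ((acc + p) + acc) % q                  # Acc := (Acc + P) + Acc
    return (acc - 1) % q if bits[0] == 0 else acc    # conditional subtraction

alpha = random.randrange(q)
k = alpha + (q - 2**254)                             # witness k = alpha + t_q
assert var_base_mul(k) == alpha                      # computes [alpha] T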

It remains to check that the x-coordinates of each pair of points to be added are distinct.

When adding points in a prime-order group, we can rely on Theorem 3 from Appendix C of the Halo paper, which says that if we have two such points with nonzero indices with respect to a given odd-prime order base, where the indices taken in the range $-\frac{q-1}{2} .. \frac{q-1}{2}$ are distinct disregarding sign, then they have different x-coordinates. This is helpful, because it is easier to reason about the indices of points occurring in the scalar multiplication algorithm than it is to reason about their x-coordinates directly.

So, the required check is equivalent to saying that the following "indexed version" of the above algorithm never asserts:

acc := 2
for i from n-1 down to 0 {
    p = k_{i+1} ? 1 : −1
    assert acc ≠ ± p
    assert (acc + p) ≠ acc    // X
    acc := (acc + p) + acc
    assert 0 < acc ≤ (q-1)/2
}
if k_0 = 0 {
    assert acc ≠ 1
    acc := acc - 1
}
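A toy-scale run of this indexed version (a sketch with assumed parameters $n = 4$, $q = 97$, chosen so that $2^{n+1} + 2^n - 1 \leq (q-1)/2$) confirms that no assertion fires and that the final index is $2^n + k$:

n, q = 4, 97                                  # toy parameters: 2^5 + 2^4 - 1 = 47 <= 48

def indexed_mul(bits):                        # bits[i] = k_i
    acc = 2
    for i in range(n - 1, -1, -1):
        p = 1 if bits[i + 1] else -1
        assert acc != p and acc != -p
        assert (acc + p) != acc               # X: trivially true, since p != 0
        acc = (acc + p) + acc
        assert 0 < acc <= (q - 1) // 2
    if bits[0] == 0:
        assert acc != 1
        acc -= 1
    return acc

for k in range(2**(n + 1)):
    bits = [(k >> i) & 1 for i in range(n + 1)]
    assert indexed_mul(bits) == 2**n + k      # index of the result
print("no assertion fires for any 5-bit k")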

The maximum value of acc is:

    <--- n 1s --->
  1011111...111111
= 1100000...000000 - 1

$= 2^{n+1} + 2^n - 1$

The assertion labelled X obviously cannot fail because $p \neq 0$. It is possible to see that acc is monotonically increasing except in the last conditional. It reaches its largest value, $2^{n+1} + 2^n - 1$, when $k$ is maximal, i.e. $k = 2^{n+1} - 1$.

So to entirely avoid exceptional cases, we would need $2^{n+1} + 2^n - 1 \leq \frac{q-1}{2}$. But we can use $n$ larger by $c$ if the last $c$ iterations use complete addition.

The first $i$ for which the algorithm using only incomplete addition fails is going to be $i = 2$, since the maximum possible value of acc at that iteration is $2^{253} + 2^{252} - 1 > \frac{q-1}{2}$. We need $n = 254$ to make the wraparound technique above work.

sage: q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
sage: 2^253 + 2^252 - 1 < (q-1)//2
False
sage: 2^252 + 2^251 - 1 < (q-1)//2
True

So the last three iterations of the loop ($i = 2..0$) need to use complete addition, as does the conditional subtraction at the end. Writing this out using ⸭ for incomplete addition (as we do in the spec), we have:

Acc := [2] T
for i from 253 down to 3 {
    P := k_{i+1} ? T : −T
    Acc := (Acc ⸭ P) ⸭ Acc
}
for i from 2 down to 0 {
    P := k_{i+1} ? T : −T
    Acc := (Acc + P) + Acc  // complete addition
}
return (k_0 = 0) ? (Acc + (-T)) : Acc  // complete addition
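An index-level simulation in plain Python, run on the worst-case $k$, illustrates why moving the last three iterations to complete addition suffices: the incomplete-addition bound $0 < \mathrm{acc} \leq \frac{q-1}{2}$ then holds throughout the incomplete phase:

q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
n = 254
bits = [1] * (n + 1)                 # worst case: acc grows as fast as possible

acc = 2
for i in range(n - 1, 2, -1):        # i from 253 down to 3: incomplete addition
    p = 1 if bits[i + 1] else -1
    acc = (acc + p) + acc
    assert 0 < acc <= (q - 1) // 2   # holds at every incomplete step
# The iterations i = 2..0 and the final conditional use complete addition,
# so they need no x-distinctness bound.
assert acc == 2**252 + 2**251 - 1    # matches the sage check above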

Constraint program for optimized double-and-add (incomplete addition)

Define a running sum $\mathbf{z}_j = \sum_{i=j}^{n} (\mathbf{k}_{i} \cdot 2^{i-j})$, where $n = 254$ and:

$$\begin{aligned}
\mathbf{z}_{n+1} &= 0, \\
\mathbf{z}_{i} &= 2\mathbf{z}_{i+1} + \mathbf{k}_{i}, \quad \text{for } i = n..0, \\
\mathbf{z}_{0} &= k.
\end{aligned}$$

The constraint program is:

$$\begin{aligned}
&x_{A,254} = x_{[2]T}, \quad y_{A,254} = y_{[2]T} \\
&\texttt{for } i \texttt{ from } 254 \texttt{ down to } 4: \\
&\hspace{1.5em} \mathbf{k}_{i} = \mathbf{z}_{i} - 2\mathbf{z}_{i+1} \\
&\hspace{1.5em} \mathbf{k}_{i} \cdot (1 - \mathbf{k}_{i}) = 0 \\
&\hspace{1.5em} \lambda_{1,i} \cdot (x_{A,i} - x_{T}) = y_{A,i} - (2\mathbf{k}_{i} - 1) \cdot y_{T} \\
&\hspace{1.5em} \lambda_{2,i}^2 = x_{A,i-1} + x_{R,i} + x_{A,i} \\
&\hspace{1.5em} \lambda_{2,i} \cdot (x_{A,i} - x_{A,i-1}) = y_{A,i} + y_{A,i-1}
\end{aligned}$$

where $x_{R,i} = \lambda_{1,i}^2 - x_{A,i} - x_{T}$. The helper $y_{A,i} = \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot (x_{A,i} - x_{R,i})}{2}$ is never witnessed. After substitution of $x_{R,i}$, $y_{A,i}$, and $y_{A,i-1}$, this becomes:

$$\begin{aligned}
&\texttt{for } i \texttt{ from } 254 \texttt{ down to } 4: \\
&\hspace{1.5em} \mathbf{k}_{i} = \mathbf{z}_{i} - 2\mathbf{z}_{i+1} \\
&\hspace{1.5em} \mathbf{k}_{i} \cdot (1 - \mathbf{k}_{i}) = 0 \\
&\hspace{1.5em} \lambda_{1,i} \cdot (x_{A,i} - x_{T}) = \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot \big(x_{A,i} - (\lambda_{1,i}^2 - x_{A,i} - x_{T})\big)}{2} - (2\mathbf{k}_{i} - 1) \cdot y_{T} \\
&\hspace{1.5em} \lambda_{2,i}^2 = x_{A,i-1} + (\lambda_{1,i}^2 - x_{A,i} - x_{T}) + x_{A,i} \\
&\hspace{1.5em} \lambda_{2,i} \cdot (x_{A,i} - x_{A,i-1}) = \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot \big(x_{A,i} - (\lambda_{1,i}^2 - x_{A,i} - x_{T})\big)}{2} + y_{A,i-1}
\end{aligned}$$

with $y_{A,i-1}$ substituted in the same way for $i > 4$.

Here, $y_{A,3}$ is assigned to a cell. This is unlike previous $y_{A,i}$'s, which were implicitly derived from $\lambda_{1,i}, \lambda_{2,i}, x_{A,i}, x_{R,i}$, but never actually assigned.
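A short Python check of the running sum definition (the bits here are randomly generated purely for illustration):

import random

n = 254
k = random.randrange(2**(n + 1))
bits = [(k >> i) & 1 for i in range(n + 1)]   # bits[i] = k_i

z = [0] * (n + 2)                             # z_{n+1} = 0
for i in range(n, -1, -1):
    z[i] = 2 * z[i + 1] + bits[i]             # z_i = 2 z_{i+1} + k_i

assert z[n] == bits[n]                        # most significant bit
assert z[0] == k                              # the running sum recovers k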

The bits $\mathbf{k}_{3..1}$ are used in three further steps, using complete addition:

$$\begin{aligned}
&\texttt{for } i \texttt{ from } 2 \texttt{ down to } 0: \\
&\hspace{1.5em} P := \mathbf{k}_{i+1} \;?\; T : -T \\
&\hspace{1.5em} Acc := (Acc + P) + Acc
\end{aligned}$$

If the least significant bit $\mathbf{k}_0 = 1$, we set $B = \mathcal{O}$; otherwise we set $B = -T$. Then we return $Acc + B$ using complete addition.

Let

$$B = \begin{cases}
\mathcal{O}, & \text{if } \mathbf{k}_0 = 1 \\
-T, & \text{otherwise.}
\end{cases}$$

Output $Acc + B$.

(Note that $B$ represents $[\mathbf{k}_0 - 1]\,T$.)

Circuit design

We need six advice columns to witness $(x_T, y_T, \lambda_1, \lambda_2, x_{A,i}, \mathbf{z}_i)$. However, since $(x_T, y_T)$ are the same, we can perform two incomplete additions in a single row, reusing the same $(x_T, y_T)$. We split the scalar bits used in incomplete addition into $hi$ and $lo$ halves and process them in parallel. This means that we effectively have two for loops (a bit-range sketch follows the list below):

  • the first, covering the $hi$ half for $i$ from $254$ down to $130$, with a special case at $i = 130$; and
  • the second, covering the $lo$ half for the remaining $i$ from $129$ down to $4$, with a special case at $i = 4$.
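As a sketch of the bit ranges involved (plain Python; the ranges follow the two loops above plus the complete-addition steps):

import random

k = random.randrange(2**255)
bits = [(k >> i) & 1 for i in range(255)]     # bits[i] = k_i

hi   = bits[130:255]    # k_254..k_130: 125 bits, first (hi) incomplete loop
lo   = bits[4:130]      # k_129..k_4:  126 bits, second (lo) incomplete loop
comp = bits[1:4]        # k_3..k_1: complete-addition steps
lsb  = bits[0]          # k_0: final conditional subtraction

assert len(hi) == 125 and len(lo) == 126
recombined = (sum(b << (130 + i) for i, b in enumerate(hi))
            + sum(b << (4 + i) for i, b in enumerate(lo))
            + sum(b << (1 + i) for i, b in enumerate(comp))
            + lsb)
assert recombined == k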

For each $hi$ and $lo$ half, we have three sets of gates. Note that $i$ is going from $254$ down to $4$; $i$ is NOT indexing the rows.

This gate is only used on the first row (before the for loop). We check that $\lambda_1$ and $\lambda_2$ are initialized to values consistent with the initial $y_A$, where $y_{A,254} = y_{[2]T}$.

This gate is used on all rows corresponding to the for loop except the last. It enforces, for each step $i$, the constraints of the substituted constraint program above:

$$\begin{aligned}
\mathbf{k}_{i} \cdot (1 - \mathbf{k}_{i}) &= 0 \\
\lambda_{1,i} \cdot (x_{A,i} - x_{T}) &= y_{A,i} - (2\mathbf{k}_{i} - 1) \cdot y_{T} \\
\lambda_{2,i}^2 &= x_{A,i-1} + x_{R,i} + x_{A,i} \\
\lambda_{2,i} \cdot (x_{A,i} - x_{A,i-1}) &= y_{A,i} + y_{A,i-1}
\end{aligned}$$

where

$$\mathbf{k}_{i} = \mathbf{z}_{i} - 2\mathbf{z}_{i+1}, \quad x_{R,i} = \lambda_{1,i}^2 - x_{A,i} - x_{T}, \quad y_{A,i} = \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot (x_{A,i} - x_{R,i})}{2}.$$

This gate is used on the final iteration of the for loop, handling the special case where the accumulator's $y$-coordinate is assigned rather than implicit: we check that the output $y_A$ has been witnessed correctly, where $y_{A,i} = \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot (x_{A,i} - x_{R,i})}{2}$.

Overflow check

$\mathbf{z}_i$ cannot overflow for any $i \geq 1$, because it is a weighted sum of bits only up to $2^{n-1} = 2^{253}$, which is smaller than $p$ (and also $q$).

However, $\mathbf{z}_0 = k$ can overflow $[0, p)$.

Since overflow can only occur in the final step that constrains $\mathbf{z}_0 = 2 \cdot \mathbf{z}_1 + \mathbf{k}_0$, we have $\mathbf{z}_0 = k \pmod{p}$. It is then sufficient to also check that $\mathbf{z}_0 = \alpha + t_q \pmod{p}$ (so that $k = \alpha + t_q \pmod{p}$) and that $k \in [t_q, p + t_q)$. These conditions together imply that $k = \alpha + t_q$ as an integer, and so $2^{254} + k \equiv \alpha \pmod{q}$ as required.

Note: the bits $\mathbf{k}_{254..0}$ do not represent a value reduced modulo $q$, but rather a representation of the unreduced $\alpha + t_q$.
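The following sketch shows why the range check on $k$ is needed in addition to the modular condition (plain Python; $p$ here is the assumed Pallas base field modulus, as it is not quoted on this page):

import random

p = 0x40000000000000000000000000000000224698fc094cf91b992d30ed00000001
q = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
t_q = q - 2**254

alpha = random.randrange(p)              # a circuit value in [0, p)
k = alpha + t_q                          # honest witness
assert t_q <= k < p + t_q                # passes the range check

cheat = k + p                            # same residue modulo p...
assert cheat % p == (alpha + t_q) % p
assert not (t_q <= cheat < p + t_q)      # ...but rejected by the range check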

Optimized check for $k \in [t_q, p + t_q)$

Since $t_p, t_q < 2^{128}$, we have

$$t_p + t_q < 2^{130} \quad \text{and} \quad p + t_q = 2^{254} + (t_p + t_q) < 2^{254} + 2^{130}.$$

We may assume that $k = \alpha + t_q \pmod{p}$.

Therefore,

$$k \in [t_q, p + t_q) \;\Leftrightarrow\; \big(\mathbf{k}_{254} = 1 \Rightarrow \mathbf{k}_{253..0} < t_p + t_q\big) \,\wedge\, \big(\mathbf{k}_{254} = 0 \Rightarrow k \geq t_q\big).$$

(If $\mathbf{k}_{254} = 1$ then $k \geq 2^{254} > t_q$, so the lower bound holds automatically; if $\mathbf{k}_{254} = 0$ then $k < 2^{254} < p + t_q$, so the upper bound holds automatically.)

Given $\mathbf{k}_{254} = 1$ and $\mathbf{k}_{253..130}$ all $0$ (so that $k = 2^{254} + \mathbf{k}_{129..0}$), we prove equivalence of $\mathbf{k}_{129..0} < t_p + t_q$ and $\alpha + 2^{130} \pmod{p} \in [0, 2^{130})$ as follows:

  • shift the range by $2^{130} - (t_p + t_q)$ to give $\mathbf{k}_{129..0} + 2^{130} - (t_p + t_q) \in [2^{130} - (t_p + t_q), 2^{130})$;
  • observe that $\mathbf{k}_{129..0} + 2^{130} - (t_p + t_q)$ is guaranteed to be in $(0, 2^{131})$ and therefore cannot overflow or underflow modulo $p$;
  • using the fact that $2^{254} = p - t_p$, observe that $\alpha + 2^{130} \equiv \mathbf{k}_{129..0} + 2^{130} - (t_p + t_q) \pmod{p}$, so this value is in $[0, 2^{130})$ exactly when $\mathbf{k}_{129..0} < t_p + t_q$.

(We can see in a different way that this is correct by observing that it checks whether $\alpha \in [p - 2^{130}, p)$, so the upper bound is aligned to $p$ as we would expect.)

Now, we can continue optimizing from the conditions above:

$$k \in [t_q, p + t_q) \;\Leftrightarrow\;
\begin{cases}
\mathbf{k}_{253..130} \text{ all } 0 \,\wedge\, \alpha + 2^{130} \pmod{p} \in [0, 2^{130}), & \text{if } \mathbf{k}_{254} = 1 \\
\mathbf{z}_{130} \neq 0 \,\vee\, \alpha \in [0, 2^{130}), & \text{if } \mathbf{k}_{254} = 0.
\end{cases}$$

Constraining $\mathbf{k}_{253..130}$ to be all-$0$ or not-all-$0$ can be implemented almost "for free", as follows.

Recall that $\mathbf{z}_{130} = \sum_{i=130}^{254} \mathbf{k}_i \cdot 2^{i-130}$, so we have:

$$\mathbf{z}_{130} = \mathbf{k}_{254} \cdot 2^{124} + \sum_{i=130}^{253} \mathbf{k}_i \cdot 2^{i-130}.$$

So $\mathbf{k}_{253..130}$ are all $0$ exactly when $\mathbf{z}_{130} = \mathbf{k}_{254} \cdot 2^{124}$.

Finally, we can merge the $130$-bit decompositions for the $\mathbf{k}_{254} = 1$ and $\mathbf{k}_{254} = 0$ cases by checking that $s = \alpha + \mathbf{k}_{254} \cdot 2^{130} \pmod{p}$ is in $[0, 2^{130})$.
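A toy-scale brute force (a sketch with scaled-down stand-ins: 255 bits → 11 bits, the $2^{130}$ boundary → $2^6$, and small $t_p, t_q$) checks that this optimized condition agrees with $k \in [t_q, p + t_q)$ for every possible $k$:

t_p, t_q = 13, 11              # toy constants with t_p + t_q < 2^6
p = 2**10 + t_p                # toy analogue of the base field modulus
LOW = 6                        # toy analogue of the 130-bit boundary

for k in range(2**11):         # k_10..k_0 plays the role of k_254..k_0
    alpha = (k - t_q) % p      # the alpha consistent with z_0 = alpha + t_q (mod p)
    k_msb = k >> 10            # analogue of k_254
    z_low = k >> LOW           # analogue of z_130
    s = (alpha + k_msb * 2**LOW) % p
    s_fits = s < 2**LOW        # analogue of s in [0, 2^130)
    if k_msb == 1:
        ok = (z_low == 2**(10 - LOW)) and s_fits
    else:
        ok = (z_low != 0) or s_fits
    assert ok == (t_q <= k < p + t_q)
print("optimized check matches the direct range check for all k")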

Overflow check constraints

Let $s = \alpha + \mathbf{k}_{254} \cdot 2^{130} \pmod{p}$. The constraints for the overflow check are:

$$\begin{aligned}
\mathbf{z}_0 &= \alpha + t_q \pmod{p} \\
\mathbf{k}_{254} = 1 &\implies \big(\mathbf{z}_{130} = 2^{124} \,\wedge\, s \in [0, 2^{130})\big) \\
\mathbf{k}_{254} = 0 \,\wedge\, \mathbf{z}_{130} = 0 &\implies s \in [0, 2^{130})
\end{aligned}$$

Define

$$\eta = \begin{cases}
\mathbf{z}_{130}^{-1}, & \text{if } \mathbf{z}_{130} \neq 0 \\
0, & \text{otherwise.}
\end{cases}$$

Witness $\eta$, and decompose $s$ as $\mathbf{s}_{129..0} = \sum_{i=0}^{129} \mathbf{s}_i \cdot 2^i$.

Then the needed gates are:

$$\begin{aligned}
q_\texttt{overflow} \cdot \big(s - (\alpha + \mathbf{k}_{254} \cdot 2^{130})\big) &= 0 \\
q_\texttt{overflow} \cdot \big(\mathbf{z}_0 - \alpha - t_q\big) &= 0 \\
q_\texttt{overflow} \cdot \mathbf{k}_{254} \cdot \big(\mathbf{z}_{130} - 2^{124}\big) &= 0 \\
q_\texttt{overflow} \cdot \mathbf{k}_{254} \cdot \big(s - \mathbf{s}_{129..0}\big) &= 0 \\
q_\texttt{overflow} \cdot (1 - \mathbf{k}_{254}) \cdot (1 - \mathbf{z}_{130} \cdot \eta) \cdot \big(s - \mathbf{s}_{129..0}\big) &= 0
\end{aligned}$$

where $\mathbf{s}_{129..0}$ can be computed by another running sum. Note that the factor of $\eta$ has no effect on the constraint, since the RHS is zero.

Running sum range check

We make use of a $10$-bit lookup range check in the circuit to subtract the low $130$ bits of $s$. The range check subtracts the first $130$ bits of $s$ and right-shifts the result to give

$$\frac{s - \mathbf{s}_{129..0}}{2^{130}}.$$
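A minimal sketch of such a running sum in Python (assuming a 10-bit lookup window, so 13 steps remove the low 130 bits):

K = 10                                 # assumed lookup-table width in bits

def running_sum(value, num_bits):
    z = value
    zs = [z]
    for _ in range(num_bits // K):
        chunk = z & (2**K - 1)         # low K bits, checked against the table
        z = (z - chunk) >> K           # subtract and right-shift by K bits
        zs.append(z)
    return zs

s = (0xdead << 130) | 0xbeef
zs = running_sum(s, 130)
assert zs[-1] == s >> 130              # final value is s with low 130 bits removed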