Reviewer note: Does not touch any of the business logic. Avoided
renaming files whereever possible to make it easier to spot differences.
Verbatim migration, in a future CL, we could replace some of the
flag validation code with cobra features and eliminate the global vars.
Moved the dlv tool definition out of the way for the top-level wrapper.
tools/bin/cobra is a helper utility that generates boilerplate
(we slightly deviate from their default scheme by having guardiand
in a separate package, rather than stuffing everything into cmd/)
ghstack-source-id: caec9a38a6
Pull Request resolved: https://github.com/certusone/wormhole/pull/67
- Metrics tracked in #11.
- Timeout and retransmits covered in #21.
- Dependency injection doesn't make sense at this scale.
- `-1` on `GenerateKeyPair` means "this doesn't make sense for Ed25519,
please crash if anyone ever tried to generate RSA keys".
ghstack-source-id: 8951628351
Pull Request resolved: https://github.com/certusone/wormhole/pull/66
Supervisor does not back off tasks that failed in a healthy state.
There are a couple places where we rely on supervisor for
application-level backoff, so we always want back-off. The distinction
is meant to enable runnables to implement their own specific back-off
logic, which we don't, so we can safely ignore it.
Fixes#37
ghstack-source-id: c756381b1b
Pull Request resolved: https://github.com/certusone/wormhole/pull/64
VAAs are deduplicated by the on-chain contracts. For Ethereum,
submission happens outside of the bridge anyway, and for Solana, the
first tx to be confirmed wins. Subsequent attempts to submit it
will fail in preflight, so the fee won't be spent multiple times.
This eliminates the need for leader selection and fixes#20.
ghstack-source-id: 60388d532c
Pull Request resolved: https://github.com/certusone/wormhole/pull/51
In particular, this fixes a race condition where the Solana devnet would
take longer to deploy than the ETH devnet to deploy and we'd end up
with an outdated guardian set on Solana.
We currently create a Goroutine for every pending resubmission, which
waits and blocks on the channel until solwatch is processing requests
again. This is effectively an unbounded queue. An alternative approach
would be a channel with sufficient capacity plus backoff.
Test Plan: Deployed without solana-devnet, waited for initial guardian
set change VAA to be requeued, then deployed solana-devnet.
The VAA was successfully submitted once the transient error resolved:
```
[...]
21:08:44.712Z ERROR wormhole-guardian-0.supervisor Runnable died {"dn": "root.solwatch", "error": "returned error when NODE_STATE_HEALTHY: failed to receive message from agent: EOF"}
21:08:44.712Z INFO wormhole-guardian-0.supervisor rescheduling supervised node {"dn": "root.solwatch", "backoff": 0.737286432}
21:08:45.451Z INFO wormhole-guardian-0.root.solwatch watching for on-chain events
21:08:50.031Z ERROR wormhole-guardian-0.root.solwatch failed to submit VAA {"error": "rpc error: code = Canceled desc = stream terminated by RST_STREAM with error code: CANCEL", "digest": "79[...]"}
21:08:50.031Z ERROR wormhole-guardian-0.root.solwatch requeuing VAA {"error": "rpc error: code = Canceled desc = stream terminated by RST_STREAM with error code: CANCEL", "digest": "79[...]"}
21:09:02.062Z INFO wormhole-guardian-0.root.solwatch submitted VAA {"tx_sig": "4EKmH[...]", "digest": "79[...]"}
```
ghstack-source-id: 1b1d05a4cb
Pull Request resolved: https://github.com/certusone/wormhole/pull/48