Solution: by default, disallow use of non-TLS RPC endpoints
For testing, there's an escape hatch: the command-line
argument `--allow-insecure-rpc-endpoints` (purposefully
long), which reduces the severity of using a non-TLS
RPC endpoint to a warning in the log.
It is deliberately not a configuration file option, to reduce
the risk of it slipping into a production configuration
file by mistake.
Closes #79
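A minimal sketch of the policy described above, assuming the endpoint is given as a URL string and that "TLS" means an `https://` scheme. The names `EndpointCheck` and `check_rpc_endpoint` are hypothetical and not taken from the bridge source.

```rust
/// Outcome of checking an RPC endpoint against the TLS policy.
#[derive(Debug, PartialEq)]
pub enum EndpointCheck {
    /// The endpoint uses TLS.
    Secure,
    /// The endpoint does not use TLS, but the escape hatch was given;
    /// the string is a warning the caller is expected to log.
    InsecureAllowed(String),
}

/// Rejects non-TLS endpoints unless `allow_insecure` is set
/// (assumed to mirror `--allow-insecure-rpc-endpoints`).
pub fn check_rpc_endpoint(url: &str, allow_insecure: bool) -> Result<EndpointCheck, String> {
    if url.to_ascii_lowercase().starts_with("https://") {
        return Ok(EndpointCheck::Secure);
    }
    if allow_insecure {
        Ok(EndpointCheck::InsecureAllowed(format!(
            "RPC endpoint {} does not use TLS",
            url
        )))
    } else {
        Err(format!(
            "RPC endpoint {} does not use TLS; refusing to start",
            url
        ))
    }
}
```

Keeping the flag out of the configuration type means the only way to reach the `InsecureAllowed` branch is through the command line.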
Solution: introduce `bridge -v|--version` flag to print the version
The version is built from `git describe`. As a backup,
if `git describe` fails (stripped .git, etc.), it falls back
to whatever is specified in Cargo.toml.
Fixes #87
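The fallback could look roughly like this. It is a sketch under the assumption that `git` is invoked as a subprocess; the function names are hypothetical, and the actual build may wire this up differently (for example in a build script).

```rust
use std::process::Command;

/// Returns the trimmed output of `git describe`, or `None` if git is
/// missing, the command fails (e.g. stripped .git), or prints nothing.
pub fn git_describe() -> Option<String> {
    let output = Command::new("git").arg("describe").output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8(output.stdout).ok()?;
    let text = text.trim();
    if text.is_empty() {
        None
    } else {
        Some(text.to_string())
    }
}

/// Prefers the `git describe` result and falls back to the Cargo version.
pub fn version_string(described: Option<String>, cargo_version: &str) -> String {
    described.unwrap_or_else(|| cargo_version.to_string())
}
```

When built by Cargo, the fallback value is available as `env!("CARGO_PKG_VERSION")`, so a call such as `version_string(git_describe(), env!("CARGO_PKG_VERSION"))` covers both cases.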
Currently, gas prices are set upon bridge startup via the
user's config TOML file; this value remains constant for the
life of the bridge.
Solution: create a mechanism that asynchronously queries
gas prices from an "Oracle" service on a timed interval. This
mechanism should be a stream of gas prices that the bridge
can poll.
If an error like this appears in the logs:
```
INFO:bridge::bridge::withdraw_confirm: waiting for new withdraws that
should get signed
WARN:bridge: Bridge crashed with Error(Transport("Incomplete"), State {
next_error: None, backtrace: None })
Error(Transport("Incomplete"), State { next_error: None, backtrace: None
})
```
it is hard to understand which side of the bridge failed. The message
must contain the type of operation (`deposit_relay`, `withdraw_confirm` or
`withdraw_relay`) and the side of the bridge (URL of the RPC channel).
Solution: record the error's top-level context and print it out if recorded
Addresses #75
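One way to carry that context is a wrapper error that records the operation and the RPC URL. This is only an illustrative sketch using the standard library; `BridgeFailure` and `with_context` are hypothetical names, and the bridge's own error types may be structured differently.

```rust
use std::fmt;

/// An error annotated with the operation and the RPC endpoint involved.
#[derive(Debug)]
pub struct BridgeFailure {
    pub operation: &'static str,
    pub rpc_url: String,
    pub cause: String,
}

impl fmt::Display for BridgeFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} failed on {}: {}", self.operation, self.rpc_url, self.cause)
    }
}

impl std::error::Error for BridgeFailure {}

/// Attaches the top-level context to a failed result; successes pass through.
pub fn with_context<T, E: fmt::Display>(
    operation: &'static str,
    rpc_url: &str,
    result: Result<T, E>,
) -> Result<T, BridgeFailure> {
    result.map_err(|e| BridgeFailure {
        operation,
        rpc_url: rpc_url.to_string(),
        cause: e.to_string(),
    })
}
```

With such a wrapper, a transport failure would be logged together with, for example, `withdraw_confirm` and the URL of the side that failed.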
The current behavior for logs displayed during the bridge initialization
is not consistent: the home URL is reported whereas the foreign URL is not.
Solution: report the foreign URL as well
Fixes #69
This is because it is limiting them to one at a time
per operation type. This was done so that there are no
gaps in nonces due to undelivered transactions.
Solution: allow concurrent sending of transactions
By default, 100 transactions are allowed.
Note, however, that now there's a chance that nonce
gaps may be formed under certain circumstances.
They are sometimes even deduced incorrectly.
There are more situations that can be distinguished -- for example,
nonce re-use. This particular error will be conflated with insufficient
funds because they share the error code in the JSON-RPC response.
Proposed solution: discriminate JSON-RPC responses with 32010 code
according to their message.
Closes #54
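The discrimination could be sketched as follows. The numeric value `-32010` and the message substrings matched here are assumptions about what the node returns, not text taken from this commit, and the type names are hypothetical.

```rust
/// Transaction failures that share one JSON-RPC error code.
#[derive(Debug, PartialEq)]
pub enum TransactionError {
    InsufficientFunds,
    NonceReused,
    /// Same code, but a message this sketch does not recognise.
    Other(String),
}

/// Assumed value of the shared transaction error code.
pub const TRANSACTION_ERROR_CODE: i64 = -32010;

/// Classifies a JSON-RPC error by its message when the code is the
/// shared transaction error code; returns `None` for any other code.
pub fn classify(code: i64, message: &str) -> Option<TransactionError> {
    if code != TRANSACTION_ERROR_CODE {
        return None;
    }
    let lower = message.to_lowercase();
    // The substrings below are assumed, illustrative markers.
    Some(if lower.contains("insufficient funds") {
        TransactionError::InsufficientFunds
    } else if lower.contains("nonce") {
        TransactionError::NonceReused
    } else {
        TransactionError::Other(message.to_string())
    })
}
```

Matching on message text is fragile, so an unrecognised message is kept verbatim in `Other` rather than being guessed at.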
Bridge's contracts are now developed in a separate repository
and have their own deployment procedure:
https://github.com/poanetwork/poa-parity-bridge-contracts
However, our integration tests are not yet updated to
use this deployment procedure.
Solution: disable deployment at compile time by default
and only use it in integration tests as a stopgap measure
until the new deployment procedure (or any other viable
alternative) is in use.
In cases when the node is backed by a cluster of nodes,
one node will not share the same information with the
other, hence it will not be able to report nonce reuse,
ultimately leading to lost transactions as they are
discarded later.
Solution: combine getTransactionCount with an internal counter
so that the validator controls its own nonces but, if
something external happens, can reset itself against
those externalities.
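The idea can be sketched as a small counter that consults the node only when it has no local value. `NonceTracker` is a hypothetical name, and the closure stands in for a call to getTransactionCount; the bridge's real implementation is asynchronous and may differ.

```rust
/// Tracks the next nonce locally, seeding itself from the node on demand.
#[derive(Debug, Default)]
pub struct NonceTracker {
    next: Option<u64>,
}

impl NonceTracker {
    pub fn new() -> Self {
        NonceTracker { next: None }
    }

    /// Hands out the next nonce. `fetch_count` (a stand-in for
    /// getTransactionCount) is called only when no local value is known.
    pub fn acquire<F: FnOnce() -> u64>(&mut self, fetch_count: F) -> u64 {
        let nonce = match self.next {
            Some(n) => n,
            None => fetch_count(),
        };
        self.next = Some(nonce + 1);
        nonce
    }

    /// Forgets the local value so the next `acquire` re-reads the node,
    /// e.g. after an external event invalidated the counter.
    pub fn reset(&mut self) {
        self.next = None;
    }
}
```

Because consecutive calls never ask the node again, two nodes in a cluster that disagree about the transaction count cannot cause the same nonce to be handed out twice.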
Unfortunately, the bridge will still reuse nonces very often,
specifically when trying to send more than one transaction at
a time, which is clearly faulty behaviour.
Solution: chain retrieving a nonce with subsequent sending
of the transaction.
However, chaining these is not enough as it'll still fail.
This is happening because the bridge module polls all its components
(deposit_relay, withdraw_confirm, withdraw_relay) sequentially,
and some of them may be waiting on their transactions to go through.
However, those transactions are also done as composed futures of nonce
retrieval and transaction sending. This means that, very often, these
futures will first all go through the nonce acquisition process,
get the same values, and then submit transactions with the same nonce.
This patch makes the NonceCheck future check whether the transaction
failed with this specific issue of nonce reuse and, in that case,
effectively restart from the beginning, repeating the nonce acquisition
process until it succeeds.
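The restart logic can be illustrated synchronously. This is a simplified sketch with closures in place of the composed futures; `SendError` and `send_with_fresh_nonce` are hypothetical names and not the NonceCheck future itself.

```rust
/// Ways in which sending a transaction can fail in this sketch.
#[derive(Debug, PartialEq)]
pub enum SendError {
    NonceReused,
    Other(String),
}

/// Acquires a nonce and sends; on nonce reuse, starts over from nonce
/// acquisition. Any other error is returned to the caller unchanged.
/// Returns the nonce that was finally accepted.
pub fn send_with_fresh_nonce<N, S>(mut get_nonce: N, mut send: S) -> Result<u64, SendError>
where
    N: FnMut() -> u64,
    S: FnMut(u64) -> Result<(), SendError>,
{
    loop {
        let nonce = get_nonce();
        match send(nonce) {
            Ok(()) => return Ok(nonce),
            // Another component took this nonce first: fetch a new one.
            Err(SendError::NonceReused) => continue,
            Err(other) => return Err(other),
        }
    }
}
```

Note that the loop has no upper bound on retries, matching the "until it succeeds" behaviour described above.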
On my computer, this takes approximately 0.3 seconds, which is clearly
a deal-breaker. In retrospect, this is an obvious problem because
of the use of a key derivation function.
Solution: unlock accounts permanently.
This cuts the time to sign one transaction down to 0.0001 seconds or so.
This means that the node has to sign the transaction itself.
It might be acceptable in a localized setup, but can't be used
with untrusted setups. For example, once HTTP RPC is supported,
we can't really use infrastructure like INFURA to send transactions.
Solution: switch to signing transactions in bridge
This absolutely requires separating the accounts used by validators
and administrative tasks as this will otherwise interfere with
management of nonces.
Using IPC means the bridge has to run alongside the node
on the same machine. This, at times, presents problems
in terms of efficiency or coupling of deployment.
Solution: switch to RPC
Currently there are two possible situations related to low balance on
the account which is used for bridge operations:
1. The account which is used to sign transactions to be addressed by
ForeignBridge contract has low balance. So, the bridge is not able to do
deposit_relay and withdraw_confirm.
2. The account which is used to sign transactions to be addressed by
HomeBridge contract has low balance. So, the bridge is not able to do
withdraw_relay.
In both cases the bridge hangs silently at the moment of sending
transactions and does not proceed with further actions, even if the
operation is intended to be performed in the opposite direction (e.g. the
bridge hangs when it tries to perform withdraw_relay, so deposit_relay
cannot be performed either).
Solution: make the bridge track its balance and handle insufficient funds
Bridge will crash with ERR_INSUFFICIENT_FUNDS (code 4) so that
supervisor can decide what should happen next. It will also log the
condition.
P.S. Make sure to run the tests with `--test-threads=1` to avoid
other tests conflicting with this one. A better solution to this
issue must be devised later, however.
It is impossible to tell whether the bridge
is being shut down intentionally or because of
an error. This is particularly important
for supervising the process, both in development
and production.
Solution: handle SIGINT and SIGTERM as a special case
and designate a separate status code (3) for intentional
shutdowns.
Also, include an example supervisor for development
mode (examples/suprevisor). Simply prepend it before
the invocation of bridge to supervise it.
Currently, the bridge will try to handle errors in some
way in order to attempt to restore its functionality.
However, this limits how errors can be handled,
as every time a change in handling is needed, a patch
for the bridge will be needed.
Over time, this will inevitably grow into a full-fledged
supervisor.
However, there are already supervisor programs out there
(from the all-encompassing systemd down to small
supervision utilities).
Solution: revert the handling of errors to the old behaviour
but (very importantly) make bridge return meaningful error
codes for all known error types so that the supervisor
can make a proper decision as to what has to be done
(restart, delayed restart, permanent shutdown, notification,
etc.)
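A supervisor's decision could then be keyed on the exit code. The sketch below uses only the two codes named in this log (3 for intentional shutdown, 4 for ERR_INSUFFICIENT_FUNDS); the `Action` type and the policy itself are hypothetical and up to whoever operates the supervisor.

```rust
/// Exit code for a shutdown requested via SIGINT or SIGTERM.
pub const EXIT_INTENTIONAL_SHUTDOWN: i32 = 3;
/// Exit code for ERR_INSUFFICIENT_FUNDS.
pub const EXIT_INSUFFICIENT_FUNDS: i32 = 4;

/// What a supervisor might do after the bridge exits (illustrative).
#[derive(Debug, PartialEq)]
pub enum Action {
    Stop,
    NotifyAndStop,
    Restart,
}

/// Example policy: stay down after a clean or intentional exit, alert
/// an operator when funds ran out, and restart on anything else.
pub fn decide(exit_code: i32) -> Action {
    match exit_code {
        0 | EXIT_INTENTIONAL_SHUTDOWN => Action::Stop,
        EXIT_INSUFFICIENT_FUNDS => Action::NotifyAndStop,
        _ => Action::Restart,
    }
}
```

Keeping this policy outside the bridge means it can change without patching the bridge, which is the point of returning meaningful codes.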
Steps to reproduce:
1. Run two Parity-based nodes responsible for Home and Foreign chains.
2. Run bridge: `RUST_LOG=info bridge --config ... --database ...`.
3. Kill the Parity process responsible for the Foreign chain.
Expected results:
The bridge gracefully handles the death of a Parity node: it warns about
the connection loss, shuts down all operations (deposit_relay,
withdraw_confirm and withdraw_relay) for a while, waits until the
connection reappears and resumes all operations after that.
Actual results:
After killing the Parity process, the following appears in the terminal
where the bridge is running:
WARN:<unknown>: Unexpected IO error: Error { repr: Os { code: 32,
message: "Broken pipe" } }
No messages appear from withdraw_confirm and withdraw_relay.
Then, after some time (a few seconds or a few minutes), the following
appears in the terminal and the bridge dies:
Request eth_blockNumber timed out
Solution: once a "Broken pipe" error is caught, attempt to
reconnect repeatedly with a pause of 1 second between attempts.
When other errors are caught, simply restart the bridge,
as there is no indication that the connection has been severed.
Fixes #22
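The recovery rule can be sketched as below, with a blocking loop and a closure standing in for the real connection attempt. All names are hypothetical, and the pause is a parameter so that the 1-second value from the description is supplied by the caller.

```rust
use std::thread;
use std::time::Duration;

/// Transport failures distinguished by this sketch.
#[derive(Debug, PartialEq)]
pub enum TransportError {
    BrokenPipe,
    Other(String),
}

/// What happened after an error was handled.
#[derive(Debug, PartialEq)]
pub enum Recovery {
    Reconnected { attempts: u32 },
    RestartBridge,
}

/// On a broken pipe, retries `connect` until it returns true, sleeping
/// `pause` between attempts. Any other error asks for a bridge restart.
pub fn recover<C>(error: &TransportError, pause: Duration, mut connect: C) -> Recovery
where
    C: FnMut() -> bool,
{
    match error {
        TransportError::BrokenPipe => {
            let mut attempts = 0;
            loop {
                attempts += 1;
                if connect() {
                    return Recovery::Reconnected { attempts };
                }
                thread::sleep(pause);
            }
        }
        TransportError::Other(_) => Recovery::RestartBridge,
    }
}
```

Calling it with `Duration::from_secs(1)` gives the 1-second pause between attempts described above.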