* use increment and decrement operators.
* remove unnecessary else branches.
* fix package comment with leading space.
* fix receiver names.
* fix error strings.
* remove omittable code.
* remove redundant return statement.
* Revert changes (code is generated.)
* use cfg as receiver name for all config-related types.
* use lsi as the receiver name for the LastSignedInfo type.
* Init `/health` rpc endpoint
* remove additional info from `/health` rpc endpoint
* Cleanup imports
* Added time threshold for health check
* Update rpc doc
* Remove unnecessary checks for blocktime creation lag
* Clean up of unnecessary config usage
Follow-up to feedback from #1286. This change simplifies the connection
handling in the SocketClient and makes communication via TCP more
robust. It introduces the tcpTimeoutListener to encapsulate accept and
I/O timeout handling as well as connection keep-alive. This type could
likely be extended to handle more fine-grained tuning of the TCP stack
(linger, nodelay, etc.) according to the properties we desire. The same
methods should be applied to the RemoteSigner, which will be overhauled
when the priv_val_server is fleshed out.
* require private key
* simplify connect logic
* break out conn upgrades to tcpTimeoutListener
* extend test coverage and simplify component setup
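To illustrate the idea behind the tcpTimeoutListener described above, here is a minimal sketch, not the actual implementation. The type names `timeoutListener` and `timeoutConn`, their fields, and the helper functions are assumptions made for this example; only the standard library `net` calls are real.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// timeoutListener is a hypothetical stand-in for tcpTimeoutListener.
type timeoutListener struct {
	*net.TCPListener
	acceptDeadline time.Duration // how long Accept may block
	connDeadline   time.Duration // deadline applied to every read and write
	keepAlive      time.Duration // TCP keep-alive period
}

// Accept bounds the wait for a connection, enables keep-alive and wraps
// the connection so that each read and write gets a fresh deadline.
func (ln timeoutListener) Accept() (net.Conn, error) {
	if err := ln.SetDeadline(time.Now().Add(ln.acceptDeadline)); err != nil {
		return nil, err
	}
	tc, err := ln.AcceptTCP()
	if err != nil {
		return nil, err
	}
	if err := tc.SetKeepAlive(true); err != nil {
		tc.Close()
		return nil, err
	}
	if err := tc.SetKeepAlivePeriod(ln.keepAlive); err != nil {
		tc.Close()
		return nil, err
	}
	return timeoutConn{Conn: tc, deadline: ln.connDeadline}, nil
}

// timeoutConn refreshes the deadline before each read and write.
type timeoutConn struct {
	net.Conn
	deadline time.Duration
}

func (c timeoutConn) Read(b []byte) (int, error) {
	if err := c.Conn.SetReadDeadline(time.Now().Add(c.deadline)); err != nil {
		return 0, err
	}
	return c.Conn.Read(b)
}

func (c timeoutConn) Write(b []byte) (int, error) {
	if err := c.Conn.SetWriteDeadline(time.Now().Add(c.deadline)); err != nil {
		return 0, err
	}
	return c.Conn.Write(b)
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// idleReadTimesOut reads from an in-memory pipe that nobody writes to and
// reports whether the wrapped read fails with a timeout.
func idleReadTimesOut() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	_, err := timeoutConn{Conn: server, deadline: 10 * time.Millisecond}.Read(make([]byte, 1))
	return isTimeout(err)
}

func main() {
	tcpLn, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer tcpLn.Close()

	ln := timeoutListener{tcpLn, 50 * time.Millisecond, time.Second, 30 * time.Second}
	_, err = ln.Accept() // nobody dials, so Accept should hit its deadline
	fmt.Println("accept timed out:", isTimeout(err))
	fmt.Println("idle read timed out:", idleReadTimesOut())
}
```

Keeping the deadlines in a wrapper means callers only see a plain `net.Listener` and `net.Conn`, so further socket tuning could be added in one place.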
Follow-up to #1255, aligning with the expectation that the external
signing process connects to the node. The SocketClient blocks on
start until one connection has been established. Support for multiple
simultaneously connected signers is a planned future extension.
* SocketClient accepts connection
* PrivValSocketServer renamed to RemoteSigner
* extend tests
To achieve faster feedback cycles for our feature PRs, this change
reduces the average build time from 35 to ~6 min by utilising the CI
provider's new 2.0 offering based on docker and nomad. We make use of
parallel build steps wherever possible so that the duration is
determined by the slowest test suite (p2p).
This is an intermediate step until we move our CI/CD completely
on-premise for more control and added security.
* expose AuthEnc in the P2P config
If AuthEnc is true, dialed peers must have a node ID in the address, and
it must match the persistent pubkey from the secret handshake.
Refs #1157
* fixes after my own review
* fix docs
* fix build failure
```
p2p/pex/pex_reactor_test.go:288:88: cannot use seed.NodeInfo().NetAddress() (type *p2p.NetAddress) as type string in array or slice literal
```
* p2p: introduce peerConn to simplify peer creation
* Introduce `peerConn` containing the known fields of `peer`
* `peer` only created in `sw.addPeer` once handshake is complete and NodeInfo is checked
* Eliminates some mutable variables and makes the code flow better
* Simplifies the `newXxxPeer` funcs
* Use ID instead of PubKey where possible.
* SetPubKeyFilter -> SetIDFilter
* nodeInfo.Validate takes ID
* remove peer.PubKey()
* persistent node ids
* fixes from review
* test: use ip_plus_id.sh more
* fix invalid memory panic during fast_sync test
```
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: panic: runtime error: invalid memory address or nil pointer dereference
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x98dd3e]
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: goroutine 3432 [running]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.newOutboundPeerConn(0xc423fd1380, 0xc420933e00, 0x1, 0x1239a60, 0xc420128c40, 0x2, 0x42caf6, 0xc42001f300, 0xc422831d98, 0xc4227951c0, ...)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/peer.go:123 +0x31e
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).addOutboundPeerWithConfig(0xc4200ad040, 0xc423fd1380, 0xc420933e00, 0xc423f48801, 0x28, 0x2)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:455 +0x12b
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).DialPeerWithAddress(0xc4200ad040, 0xc423fd1380, 0x1, 0x0, 0x0)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:371 +0xdc
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).reconnectToPeer(0xc4200ad040, 0x123e000, 0xc42007bb00)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:290 +0x25f
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: created by github.com/tendermint/tendermint/p2p.(*Switch).StopPeerForError
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:256 +0x1b7
```
Following ADR 008, the node will connect to an external
process to handle signing requests. Operation of the external process is
left to the user.
* introduce alias for PrivValidator interface on socket client
* integrate socket client in node
* structure tests
* remove unnecessary flag
As calls to the private validator can involve side effects like network
communication, it is desirable for all methods returning an error to not
break the control flow of the caller.
* adjust PrivValidator interface
Fixes https://github.com/tendermint/tendermint/issues/1189
On every TxEventBuffer.Flush() call, we were executing:
b.events = make([]EventDataTx, 0, b.capacity)
whose intention is to clear the events slice while
maintaining the underlying capacity.
Unfortunately, this is memory and garbage collection intensive,
with a cost linear in the number of events added. If an attacker had
access to our code somehow, invoking .Flush() in tight loops would be a
sure way to cause huge GC pressure. If they maliciously added about 1e9
events, every Flush() would take at least 3.2 seconds,
which is enough to effectively control our application.
The capacity-preserving slice clearing idiom now in use
takes constant time regardless of the number of elements, with zero
allocations, so we are killing many birds with one stone, i.e.
b.events = b.events[:0]
For benchmarking results, please see
https://gist.github.com/odeke-em/532c14ab67d71c9c0b95518a7a526058
for a reference on how things can get out of hand easily.
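The difference between the two Flush variants above can be reproduced with a small, self-contained sketch. `eventBuffer` is a simplified stand-in for TxEventBuffer (an assumption for this example, with `string` replacing `EventDataTx`), and the method and helper names are hypothetical; `testing.AllocsPerRun` is the standard library helper for counting heap allocations.

```go
package main

import (
	"fmt"
	"testing"
)

// eventBuffer is a simplified, hypothetical stand-in for TxEventBuffer.
type eventBuffer struct {
	events   []string
	capacity int
}

func newEventBuffer(capacity int) *eventBuffer {
	return &eventBuffer{events: make([]string, 0, capacity), capacity: capacity}
}

// flushRealloc models the old Flush: a fresh backing array on every call.
func (b *eventBuffer) flushRealloc() {
	b.events = make([]string, 0, b.capacity)
}

// flushReuse models the new Flush: length reset, backing array kept.
func (b *eventBuffer) flushReuse() {
	b.events = b.events[:0]
}

// allocsPerFlush returns the average number of heap allocations per call
// for both variants, measured with testing.AllocsPerRun.
func allocsPerFlush() (realloc, reuse float64) {
	b := newEventBuffer(1 << 10)
	realloc = testing.AllocsPerRun(100, b.flushRealloc)
	reuse = testing.AllocsPerRun(100, b.flushReuse)
	return realloc, reuse
}

func main() {
	b := newEventBuffer(4)
	b.events = append(b.events, "tx1", "tx2")
	b.flushReuse()
	fmt.Println(len(b.events), cap(b.events)) // length is reset, capacity is kept

	realloc, reuse := allocsPerFlush()
	fmt.Printf("allocs per flush: realloc=%v reuse=%v\n", realloc, reuse)
}
```

One trade-off worth noting: after `b.events[:0]` the old elements stay referenced by the backing array until they are overwritten, so they are not collected immediately.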
If we call it after, we might receive a "fresh" transaction from
`broadcast_tx_sync` before old transactions (which were not
committed).
Refs #1091
```
Commit is called with a lock on the mempool, meaning no calls to CheckTx
can start. However, since CheckTx is called async in the mempool
connection, some CheckTx might have already "sailed", when the lock is
released in the mempool and Commit proceeds.
Then, that spurious CheckTx has not yet "begun" in the ABCI app (stuck
in transport?). Instead, ABCI app manages to start to process the
Commit. Next, the spurious, "sailed" CheckTx happens in the wrong place.
```
* Vulnerability in light client proxy
When calling GetCertifiedCommit the light client proxy would call
Certify and even on error return the Commit as if it had been correctly
certified.
Now it returns the error correctly and returns an empty Commit on error.
* Improve names for clarity
The lite package now contains StaticCertifier, DynamicCertifier and
InquiringCertifier. This also changes the method receivers from
one-letter to two-letter names, which will make future refactoring easier
and follows the coding standards.
* Fix test failures
* Rename files
* remove dead code
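The GetCertifiedCommit fix described above boils down to never handing back data that failed certification. The following is a minimal sketch of that pattern only; `Commit`, `Certifier`, `getCertifiedCommit` and the two test certifiers are simplified, hypothetical stand-ins and not the lite package's real types or signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Commit is a reduced, hypothetical version of the real type.
type Commit struct{ Height int64 }

// Certifier is a hypothetical interface with a single Certify method.
type Certifier interface {
	Certify(c Commit) error
}

// rejectAll refuses every commit; acceptAll accepts every commit.
type rejectAll struct{}
type acceptAll struct{}

func (rejectAll) Certify(Commit) error { return errors.New("certification failed") }
func (acceptAll) Certify(Commit) error { return nil }

// getCertifiedCommit returns the fetched commit only if certification
// succeeds. On error it returns an empty Commit together with the error,
// so a caller that ignores the error cannot use uncertified data.
func getCertifiedCommit(fetched Commit, cert Certifier) (Commit, error) {
	if err := cert.Certify(fetched); err != nil {
		return Commit{}, err
	}
	return fetched, nil
}

func main() {
	c, err := getCertifiedCommit(Commit{Height: 5}, rejectAll{})
	fmt.Println(c.Height, err) // prints "0 certification failed"

	c, err = getCertifiedCommit(Commit{Height: 5}, acceptAll{})
	fmt.Println(c.Height, err) // prints "5 <nil>"
}
```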
types/vote_test.go now checks signature on a serialized and
then deserialized vote. Turns out go-wire time encoding doesn't
respect timezones, and the signatures don't check out.
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max
capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow subscribing to all transactions, or to a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track the message for which we panicked
also, set a smaller part size for all test cases
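The items above note that subscribe now returns an err so that a subscription can be refused when pubsub is at its max capacity, and that a context is used to control overflow. One way this could look is sketched below. This is an illustration under assumptions, not the pubsub package's API: `server`, `subscribeCmd`, `newServer` and `subscribeTwice` are hypothetical names, and the bounded command channel is one possible design.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// subscribeCmd is a hypothetical request queued for the server loop.
type subscribeCmd struct {
	clientID string
	out      chan<- string
}

// server is a hypothetical pubsub server whose pending subscriptions
// queue up in a bounded channel.
type server struct {
	cmds chan subscribeCmd
}

func newServer(capacity int) *server {
	return &server{cmds: make(chan subscribeCmd, capacity)}
}

// Subscribe enqueues the request, or gives up with the context's error
// if the queue stays full until the context is done.
func (s *server) Subscribe(ctx context.Context, clientID string, out chan<- string) error {
	select {
	case s.cmds <- subscribeCmd{clientID, out}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// subscribeTwice subscribes two clients to a capacity-1 server that
// nobody drains: the first request fits, the second one times out.
func subscribeTwice() (first, second error) {
	s := newServer(1)
	first = s.Subscribe(context.Background(), "client-a", make(chan string, 1))

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	second = s.Subscribe(ctx, "client-b", make(chan string, 1))
	return first, second
}

func main() {
	first, second := subscribeTwice()
	fmt.Println("first:", first)
	fmt.Println("second:", second)
}
```

With this shape the caller decides how long it is willing to wait, and the server never blocks a subscriber indefinitely.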
Updates https://github.com/tendermint/tendermint/issues/693
* Adjusted Heartbeat.Copy to return nil when
copying a nil value instead of panicking.
* Also documented that WriteSignBytes panics
if the Heartbeat is nil.
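A nil-safe Copy as described above can be sketched as follows. The `Heartbeat` struct here is a reduced, hypothetical version with only a few fields; it is meant to show the nil check on a pointer receiver, not to mirror the real type.

```go
package main

import "fmt"

// Heartbeat is a reduced, hypothetical version of the real type.
type Heartbeat struct {
	Height   int64
	Round    int
	Sequence int
}

// Copy returns a copy of the heartbeat, or nil if the receiver is nil.
// Calling a method on a nil pointer receiver is legal in Go as long as
// the method does not dereference the receiver.
func (h *Heartbeat) Copy() *Heartbeat {
	if h == nil {
		return nil
	}
	hc := *h
	return &hc
}

func main() {
	var missing *Heartbeat
	fmt.Println(missing.Copy() == nil) // prints "true"

	orig := &Heartbeat{Height: 10, Round: 1, Sequence: 3}
	cp := orig.Copy()
	cp.Round = 2
	fmt.Println(orig.Round, cp.Round) // prints "1 2"
}
```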