Commit Graph

37 Commits

Author SHA1 Message Date
Ethan Buchman 30f675aafa
Merge pull request #839 from tendermint/bugfix/pubsub-failures
Fix nondeterministic test failures related to pubsub
2017-11-14 18:13:47 +00:00
Ethan Buchman aba8a8f4fc consensus: crank timeout in timeoutWaitGroup 2017-11-12 06:41:15 +00:00
Anton Kaliaev 7fa12662c4
check whatever we can read from the channel
```
panic: interface conversion: interface {} is nil, not types.TMEventData

goroutine 7690 [running]:
github.com/tendermint/tendermint/consensus.waitForAndValidateBlock.func1(0xc427727620, 0x3)
        /go/src/github.com/tendermint/tendermint/consensus/reactor_test.go:292 +0x62b
created by github.com/tendermint/tendermint/consensus.timeoutWaitGroup
        /go/src/github.com/tendermint/tendermint/consensus/reactor_test.go:349 +0xa4
exit status 2
FAIL    github.com/tendermint/tendermint/consensus      38.614s

```
2017-11-10 18:16:31 -05:00
Anton Kaliaev f6539737de
new pubsub package
comment out failing consensus tests for now

rewrite rpc httpclient to use new pubsub package

import pubsub as tmpubsub, query as tmquery

make event IDs constants
EventKey -> EventTypeKey

rename EventsPubsub to PubSub

mempool does not use pubsub

rename eventsSub to pubsub

new subscribe API

fix channel size issues and consensus tests bugs

refactor rpc client

add missing discardFromChan method

add mutex

rename pubsub to eventBus

remove IsRunning from WSRPCConnection interface (not needed)

add a comment in broadcastNewRoundStepsAndVotes

rename registerEventCallbacks to broadcastNewRoundStepsAndVotes

See https://dave.cheney.net/2014/03/19/channel-axioms

stop eventBuses after reactor tests

remove unnecessary Unsubscribe

return subscribe helper function

move discardFromChan to where it is used

subscribe now returns an err

this gives us the ability to refuse to subscribe if pubsub is at its max
capacity.

use context to control overflow

cache queries

handle err when subscribing in replay_test

rename testClientID to testSubscriber

extract var

set channel buffer capacity to 1 in replay_file

fix byzantine_test

unsubscribe from single event, not all events

refactor httpclient to return events to appropriate channels

return failing testReplayCrashBeforeWriteVote test

fix TestValidatorSetChanges

refactor code a bit

fix testReplayCrashBeforeWriteVote

add comment

fix TestValidatorSetChanges

fixes from Bucky's review

update comment [ci skip]

test TxEventBuffer

update changelog

fix TestValidatorSetChanges (2nd attempt)

only do wg.Done when no errors

benchmark event bus

create pubsub server inside NewEventBus

only expose config params (later if needed)

set buffer capacity to 0 so we are not testing cache

new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}

This should allow subscribing to all transactions, or to a specific one,
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"

use TimeoutCommit instead of afterPublishEventNewBlockTimeout

TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.

waitForBlockWithUpdatedVals

rewrite WAL crash tests

Task:
test that we can recover from any WAL crash.

Solution:
the old tests relied on the event hub running in the same thread (we
were injecting the private validator's last signature).

when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.

remove sleep

no cs.Lock around wal.Save

test different cases (empty block, non-empty block, ...)

comments

add comments

test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks

fixes as per Bucky's last review

reset subscriptions on UnsubscribeAll

use a simple counter to track the message for which we panicked

also, set a smaller part size for all test cases
2017-10-30 00:32:22 -05:00
Ethan Buchman 591dd9e662 dont catchupReplay on wal if we fast synced 2017-10-27 10:46:19 -04:00
Ethan Buchman 75b97a5a65 PrivValidatorFS is like old PrivValidator, for now 2017-09-21 16:46:31 -04:00
Ethan Buchman 4382c8d28b fix tests 2017-09-21 15:52:25 -04:00
Ethan Buchman c5a657f540 consensus: test proposal heartbeat 2017-08-10 01:24:23 -04:00
Anton Kaliaev 1dfb95f719
[consensus] color code different consensus instances in consensus tests
(Refs #492)
2017-05-15 09:35:29 +02:00
Anton Kaliaev f8fdbe3dbc
changes as per Bucky's review 2017-05-13 16:22:51 +02:00
Anton Kaliaev f803544195
new logging 2017-05-13 10:24:58 +02:00
Ethan Buchman 92bafa7ecd consensus: fix tests 2017-05-04 22:46:13 -04:00
Ethan Buchman 07e59e63f9 TMEventDataInner 2017-04-28 17:57:06 -04:00
Ethan Buchman 56c60fba23 go-p2p -> tendermint/p2p 2017-04-21 18:19:41 -04:00
Ethan Buchman d1926bcad1 use tmlibs 2017-04-21 18:12:54 -04:00
Ethan Buchman c147b41013 TMSP -> ABCI 2017-01-12 15:53:32 -05:00
Ethan Buchman 3c589dac19 startConsensusNet and stopConsensusNet 2017-01-12 02:29:53 -05:00
Ethan Buchman ce0c638005 little fix 2017-01-11 18:37:36 -05:00
Anton Kalyaev 535fc6cd63 test we can make blocks with skip_timeout_commit=false 2017-01-11 18:00:27 -05:00
Anton Kalyaev 3308ac7d83 set skip_timeout_commit to true for tests
For the tests it's better to not use the timeout_commit, and to wait for all the
votes, because otherwise we can end up with timing dependencies in the testing
code which can lead to nondeterministic failures. That was part of the reason
for this change originally.
2017-01-11 18:00:26 -05:00
Anton Kalyaev a1fd312bb1 make progress asap on full precommit votes optional (Refs #348) 2017-01-11 18:00:26 -05:00
Ethan Buchman d68cdce2d5 consensus: check HasAll when TwoThirdsMajority 2017-01-11 17:53:46 -05:00
Anton Kalyaev cb2f2b94ee log stages to stdout 2017-01-11 10:35:04 -05:00
Anton Kalyaev 4722410e5e test validator set changes more extensively 2017-01-11 10:35:04 -05:00
Ethan Buchman e5fb681615 consensus: remove crankTimeoutPropose from tests 2016-12-22 22:03:42 -05:00
Ethan Buchman c9698e4848 fixes from review 2016-12-22 22:03:42 -05:00
Ethan Buchman 706dd1d6c5 test: dont start cs until all peers connected 2016-12-19 19:50:40 -05:00
Ethan Buchman faf23aa0d4 consensus: TimeoutTicker, skip TimeoutCommit on HasAll 2016-12-19 15:42:36 -05:00
Ethan Buchman de6bba4609 test: randConsensusNet takes more args 2016-12-17 14:45:20 -05:00
Ethan Buchman 8df32cd540 test: increase proposal timeout 2016-12-06 19:54:10 -05:00
Ethan Buchman 2f9063c1d6 consensus: test validator set change 2016-11-23 18:20:46 -05:00
Ethan Buchman a3d863f83b consensus: track index of privVal 2016-11-22 20:38:14 -05:00
Jae Kwon 3e3b034252 Make ConsensusReactor use ConsensusState's blockstore; debug functions 2016-11-15 18:48:34 -05:00
Ethan Buchman 9d0c7f6ec7 fix bft test. still halts 2016-11-15 18:47:19 -05:00
Ethan Buchman 5f55ed2a40 consensus: ensure dir for cswal on reactor tests 2016-11-15 18:45:36 -05:00
Ethan Buchman 57da2e4af5 make byzantine logic testable 2016-11-15 18:45:36 -05:00
Ethan Buchman f837252ff1 consensus: test reactor 2016-11-15 18:37:33 -05:00