To achieve faster feedback cycles for our feature PRs, this change
reduces the average build time from 35 min to ~6 min by utilising their new
2.0 offering based on docker and nomad. We make use of parallel build
steps wherever possible so that the duration is determined by the
slowest test suite (p2p).
This is an intermediate step until we move our CI/CD completely
on-premise for more control and added security.
In order to improve the operator experience we want the node to dial
seeds immediately if there are no peers to connect to. Until now the
routine responsible for ensuring that the node stays connected to peers
would wait a random amount of time, up to 30s (if not configured otherwise).
* expose AuthEnc in the P2P config
If AuthEnc is true, dialed peers must have a node ID in the address and
it must match the persistent pubkey from the secret handshake.
Refs #1157
* fixes after my own review
* fix docs
* fix build failure
```
p2p/pex/pex_reactor_test.go:288:88: cannot use seed.NodeInfo().NetAddress() (type *p2p.NetAddress) as type string in array or slice literal
```
* p2p: introduce peerConn to simplify peer creation
* Introduce `peerConn` containing the known fields of `peer`
* `peer` only created in `sw.addPeer` once handshake is complete and NodeInfo is checked
* Eliminates some mutable variables and makes the code flow better
* Simplifies the `newXxxPeer` funcs
* Use ID instead of PubKey where possible.
* SetPubKeyFilter -> SetIDFilter
* nodeInfo.Validate takes ID
* remove peer.PubKey()
* persistent node ids
* fixes from review
* test: use ip_plus_id.sh more
* fix invalid memory panic during fast_sync test
```
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: panic: runtime error: invalid memory address or nil pointer dereference
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x98dd3e]
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: goroutine 3432 [running]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.newOutboundPeerConn(0xc423fd1380, 0xc420933e00, 0x1, 0x1239a60, 0xc420128c40, 0x2, 0x42caf6, 0xc42001f300, 0xc422831d98, 0xc4227951c0, ...)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/peer.go:123 +0x31e
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).addOutboundPeerWithConfig(0xc4200ad040, 0xc423fd1380, 0xc420933e00, 0xc423f48801, 0x28, 0x2)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:455 +0x12b
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).DialPeerWithAddress(0xc4200ad040, 0xc423fd1380, 0x1, 0x0, 0x0)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:371 +0xdc
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).reconnectToPeer(0xc4200ad040, 0x123e000, 0xc42007bb00)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:290 +0x25f
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: created by github.com/tendermint/tendermint/p2p.(*Switch).StopPeerForError
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:256 +0x1b7
```
Don't bother with this "only ping when we haven't heard from them" logic. Let's
just always ping every peer from the sendRoutine every 10s no matter
what. If they don't pong within pongTimeout, disconnect :)
https://play.golang.org/p/gN21yO9IRs3
```
func waitWithCancel(f func() *clist.CElement, ctx context.Context) *clist.CElement {
	el := make(chan *clist.CElement, 1)
	select {
	case el <- f():
```
will just run f() synchronously before select picks any case, so this doesn't change much in terms of behavior.
* linter: address gosimple lints
* linter: make deterministic & a rebase fix
* lint/rpc: fix a gosimple lint
* run linter in CI
* fix rebase mistake
* fix makefile
* ugh
* revert Makefile
* add metalinter to CI
* try this
* linter: last little fix
* need glide
* better
* okayy circle, have it your way
* lints: gosimple
* pr comments
Fixes https://github.com/tendermint/tendermint/issues/875
Ensure that every DialSeeds call uses a new PRNG seeded from
tendermint/tmlibs/common.RandInt which internally uses
crypto/rand to seed its source.
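As a standalone sketch of the idea using only the standard library (the real change goes through tmlibs/common.RandInt, which is not reproduced here): each call builds a fresh math/rand generator whose seed is read from crypto/rand. The helper name `newSeededRand` and the seed addresses are invented for the example.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// newSeededRand returns a new math/rand generator whose seed is drawn
// from crypto/rand, so separate calls do not share a predictable sequence.
func newSeededRand() (*mrand.Rand, error) {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		return nil, err
	}
	seed := int64(binary.LittleEndian.Uint64(b[:]))
	return mrand.New(mrand.NewSource(seed)), nil
}

func main() {
	// Placeholder addresses, for illustration only.
	seeds := []string{"seed-a:46656", "seed-b:46656", "seed-c:46656"}

	rng, err := newSeededRand()
	if err != nil {
		panic(err)
	}
	// Visit the seeds in a random order chosen by the fresh generator.
	for _, i := range rng.Perm(len(seeds)) {
		fmt.Println("would dial", seeds[i])
	}
}
```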
Updates https://github.com/tendermint/tendermint/issues/850
My security alarms falsely blared when I skimmed the code and noticed
keys being compared with `==` without the proper context,
so I mistakenly filed an issue. The purpose of that
comparison was to check whether the local ephemeral public key
was the lesser of the two when sorted lexicographically.
Anyway, let's use the proper bytes.Equal check to save future labor.
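For illustration, a small sketch of that check on byte slices. The helper names `sortKeys` and `localIsLeast` are hypothetical and the keys are shortened dummies; the real code works on the ephemeral public keys exchanged during the handshake.

```go
package main

import (
	"bytes"
	"fmt"
)

// sortKeys returns the two keys in lexicographic order (lo, hi).
func sortKeys(a, b []byte) (lo, hi []byte) {
	if bytes.Compare(a, b) < 0 {
		return a, b
	}
	return b, a
}

// localIsLeast reports whether the local key is the lesser of the two.
// bytes.Equal compares contents and makes the intent obvious to a reader.
func localIsLeast(local, remote []byte) bool {
	lo, _ := sortKeys(local, remote)
	return bytes.Equal(local, lo)
}

func main() {
	local := []byte{0x01, 0x02}
	remote := []byte{0x01, 0x03}
	fmt.Println("local is least:", localIsLeast(local, remote))
	fmt.Println("remote is least:", localIsLeast(remote, local))
}
```

Note that bytes.Equal is not a constant-time comparison; for comparing secrets, crypto/subtle.ConstantTimeCompare is the usual choice. Public keys are not secret, so bytes.Equal is adequate here.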
Fixes https://github.com/tendermint/tendermint/issues/851
In Go 1.9 and below, net.Pipe did not implement the SetDeadline
method, so after commit
e2dd8ca946
this problem was exposed since we now check for errors.
To counter this problem, implement a simple composition for
net.Conn that always returns nil on SetDeadline instead of
returning an error.
Added build tags so that anyone using go1.10 when it is released
will automatically use net.Pipe's net.Conns.
Noticed while auditing the code that we aren't respecting
net.Conn SetDeadline errors, which are returned after
a connection has been killed while it is still
being used.
For example, the following program, which lacks SetDeadline error checks,
```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("tcp", "tendermint.com:443")
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		<-time.After(400 * time.Millisecond)
		conn.Close()
	}()
	for i := 0; i < 5; i++ {
		conn.SetDeadline(time.Now().Add(time.Duration(10 * time.Second)))
		log.Printf("Successfully set deadline #%d", i)
		<-time.After(150 * time.Millisecond)
	}
}
```
erroneously gives
```shell
2017/11/14 17:46:28 Successfully set deadline #0
2017/11/14 17:46:29 Successfully set deadline #1
2017/11/14 17:46:29 Successfully set deadline #2
2017/11/14 17:46:29 Successfully set deadline #3
2017/11/14 17:46:29 Successfully set deadline #4
```
However, if we properly fix it to respect that error with
```diff
--- wild.go 2017-11-14 17:44:38.000000000 -0700
+++ main.go 2017-11-14 17:45:40.000000000 -0700
@@ -16,7 +16,9 @@
 		conn.Close()
 	}()
 	for i := 0; i < 5; i++ {
-		conn.SetDeadline(time.Now().Add(time.Duration(10 * time.Second)))
+		if err := conn.SetDeadline(time.Now().Add(time.Duration(10 * time.Second))); err != nil {
+			log.Fatalf("set deadline #%d, err: %v", i, err)
+		}
 		log.Printf("Successfully set deadline #%d", i)
 		<-time.After(150 * time.Millisecond)
 	}
```
it properly catches any problems and gives
```shell
$ go run main.go
2017/11/14 17:43:44 Successfully set deadline #0
2017/11/14 17:43:45 Successfully set deadline #1
2017/11/14 17:43:45 Successfully set deadline #2
2017/11/14 17:43:45 set deadline #3, err: set tcp 10.182.253.51:57395: use of closed network connection
exit status 1
```
Just noticed while auditing the code in p2p/addrbook.go:
there is a wg.Add(1) with no subsequent defer.
@jaekwon and I had a discussion offline and agreed to add a
comment explaining why the code is that way and why
we shouldn't move the wg.Add(1) into .saveRoutine(): if
go a.saveRoutine() hasn't started running before someone invokes
a.Wait(), then a.Wait() would race with a.saveRoutine().
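The reasoning can be sketched with a toy type. The `book` struct and its fields are made up for illustration and are not the real AddrBook; only the ordering of wg.Add(1) relative to the `go` statement reflects the discussion above.

```go
package main

import (
	"fmt"
	"sync"
)

// book is a toy stand-in for a type that owns a background save routine.
type book struct {
	wg    sync.WaitGroup
	saved bool
}

// Start increments the WaitGroup before launching the goroutine. If the
// Add(1) lived inside saveRoutine instead, a caller could reach Wait()
// before the goroutine was scheduled, see a zero counter, and return
// without waiting for the save.
func (b *book) Start() {
	b.wg.Add(1)
	go b.saveRoutine()
}

func (b *book) saveRoutine() {
	defer b.wg.Done()
	b.saved = true // pretend to persist the book
}

// Wait blocks until saveRoutine has finished.
func (b *book) Wait() { b.wg.Wait() }

func main() {
	b := &book{}
	b.Start()
	b.Wait()
	fmt.Println("saved:", b.saved)
}
```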
* Fully test PeerSet and check its concurrency guarantees
* Improve the doc for PeerSet.Has and remove the unnecessary
defer on a path that only sets a variable, making it faster.
* Parallelize PeerSet tests with t.Parallel()
* Document functions in peer_set.go more.