added initial trust metric design doc and code

caffix 2017-10-24 20:28:20 -04:00
parent 30f675aafa
commit e160a6198c
4 changed files with 511 additions and 0 deletions


@@ -0,0 +1,117 @@
# Trust Metric Design
## Overview
The proposed trust metric will allow Tendermint to maintain local trust rankings for peers it has directly interacted with, which can then be used to implement soft security controls. The calculations were obtained from the [TrustGuard](https://dl.acm.org/citation.cfm?id=1060808) project.
## Background
The Tendermint Core project developers would like to improve Tendermint security and reliability by keeping track of the level of trustworthiness peers have demonstrated within the peer-to-peer network. This way, undesirable outcomes from peers will not immediately result in them being dropped from the network (potentially causing drastic changes to take place). Instead, peer behavior can be monitored with appropriate metrics and a peer can be removed from the network once Tendermint Core is certain it is a threat. For example, when the PEXReactor requests peer network addresses from an already known peer and the returned addresses turn out to be unreachable, this untrustworthy behavior should be tracked. Returning a few bad network addresses probably shouldn't cause a peer to be dropped, while excessive amounts of this behavior do qualify the peer for removal.
Trust metrics can be circumvented by malicious nodes through the use of strategic oscillation techniques, which adapt the malicious node's behavior pattern in order to maximize its goals. For instance, if the malicious node learns that the time interval of the Tendermint trust metric is *X* hours, then it could wait *X* hours in-between malicious activities. We could try to combat this issue by increasing the interval length, yet this would make the system less adaptive to recent events.
Instead, having shorter intervals, but keeping a history of interval values, will give our metric the flexibility needed to keep the network stable while also making it resilient against a strategic malicious node in the Tendermint peer-to-peer network. The metric can also access trust data over a rather long period of time without greatly increasing its history size, by aggregating older history values over a larger number of intervals while maintaining high precision for the recent intervals. This approach is referred to as fading memories, and closely resembles the way human beings remember their experiences. The trade-off to using history data is that the interval values must be preserved in-between executions of the node.
## Scope
The proposed trust metric will be implemented as a Go programming language object that will allow a developer to inform the object of all good and bad events relevant to the trust object instantiation, and at any time, the metric can be queried for the current trust ranking. Methods will be provided for storing trust metric history data that is required across instantiations.
## Detailed Design
This section will cover the process being considered for calculating the trust ranking and the interface for the trust metric.
### Proposed Process
The proposed trust metric will count good and bad events relevant to the object, and calculate the percentage of events that were good within each interval of a predefined duration. This procedure will continue for the life of the trust metric. When the trust metric is queried for the current **trust value**, a resilient equation will be utilized to perform the calculation.
The equation being proposed resembles a Proportional-Integral-Derivative (PID) controller used in control systems. The proportional component allows us to be sensitive to the value of the most recent interval, the integral component allows us to incorporate trust values stored in the history data, and the derivative component allows us to give weight to sudden changes in the behavior of a peer. We compute the trust value of a peer in interval *i* based on its current trust ranking, its trust rating history prior to interval *i* (over the past *maxH* intervals), and its trust ranking fluctuation. We will break the equation up into its three components.
```math
(1) Proportional Value = a * R[i]
```
where *R*[*i*] denotes the raw trust value at time interval *i* (with *i* == 0 being the current time) and *a* is the weight applied to the contribution of the current reports. The next component of our equation uses a weighted sum over the last *maxH* intervals to calculate the history value for time *i*:
`H[i] = ` ![formula1](https://github.com/tendermint/tendermint/blob/develop/docs/architecture/img/formula1.png "Weighted Sum Formula")
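Expressed in the plain notation used by the other equations (a reconstruction based on the `calcHistoryValue` implementation included in this commit; the rendered image above remains the authoritative form), the weighted sum has the shape:
```math
H[i] = Sum(k = 1..maxH) of (Wk / TotalWeight) * R[i+k],  where TotalWeight = Sum(k = 1..maxH) of Wk
```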
The weights can be chosen either optimistically or pessimistically. With the history value available, we can now finish calculating the integral value:
```math
(2) Integral Value = b * H[i]
```
Where *H*[*i*] denotes the history value at time interval *i* and *b* is the weight applied to the contribution of past performance for the object being measured. The derivative component will be calculated as follows:
```math
D[i] = R[i] - H[i]
(3) Derivative Value = c(D[i]) * D[i]
```
Where *c*(*D*[*i*]) selects the weight based on the value of *D*[*i*] relative to zero. With the three components brought together, our trust value equation is calculated as follows:
```math
TrustValue[i] = a * R[i] + b * H[i] + c(D[i]) * D[i]
```
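As a purely illustrative calculation (every number below is an assumption, and taking c(D[i]) = 1.0 for negative D[i] mirrors the accompanying implementation), suppose a = 0.4, b = 0.6, R[i] = 0.5 and H[i] = 0.9:
```math
D[i] = 0.5 - 0.9 = -0.4
TrustValue[i] = 0.4 * 0.5 + 0.6 * 0.9 + 1.0 * (-0.4)
              = 0.2 + 0.54 - 0.4
              = 0.34
```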
As a performance optimization that will keep the amount of raw interval data being saved to a reasonable size of *m*, while allowing us to represent 2^*m* - 1 history intervals, we can employ the fading memories technique that will trade space and time complexity for the precision of the history data values by summarizing larger quantities of less recent values. While our equation above attempts to access up to *maxH* (which can be 2^*m* - 1), we will map those requests down to *m* values using equation 4 below:
```math
(4) j = floor(log2(i)), where i > 0
```
Where *j* is one of the *(0, 1, 2, … , m - 1)* indices used to access history interval data. For example, raw intervals 4 through 7 all map down to the single index *j* = 2. Now we can access the raw intervals using the following calculations:
```math
R[0] = raw data for current time interval
```
`R[j] = ` ![formula2](https://github.com/tendermint/tendermint/blob/develop/docs/architecture/img/formula2.png "Fading Memories Formula")
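For reference, the per-interval update that produces these aggregated values, as implemented in `updateFadedMemory` in this commit (the rendered image above remains the authoritative definition), is:
```math
R[j] <- ( (2^j - 1) * R[j] + R[j-1] ) / 2^j,   for j = 1 .. m-1
```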
### Interface Detailed Design
This section will cover the Go programming language API designed for the previously proposed process. Below is the interface for a TrustMetric:
```go
package trust

import "time"

type TrustMetric struct {
}

type TrustMetricConfig struct {
	ProportionalWeight float64
	IntegralWeight     float64
	HistoryMaxSize     int
	IntervalLen        time.Duration
}

func (tm *TrustMetric) Stop()

func (tm *TrustMetric) IncBad()

func (tm *TrustMetric) AddBad(num int)

func (tm *TrustMetric) IncGood()

func (tm *TrustMetric) AddGood(num int)

// get the dependable trust value
func (tm *TrustMetric) TrustValue() float64

func NewMetric() *TrustMetric

func NewMetricWithConfig(tmc *TrustMetricConfig) *TrustMetric

func GetPeerTrustMetric(key string) *TrustMetric

func PeerDisconnected(key string)
```
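To make the intended call pattern concrete, here is a minimal usage sketch. It is illustrative only: the reactor-side helper, the reachability results, and the 0.3 disconnect threshold are assumptions rather than part of the proposal (the import path assumes the package lives at `p2p/trust`, as in this commit).
```go
package pexexample

import "github.com/tendermint/tendermint/p2p/trust"

// recordAddressExchange is a hypothetical helper that rewards or penalizes a
// peer based on whether the addresses it returned were reachable, and reports
// whether the peer is still considered trustworthy.
func recordAddressExchange(peerKey string, reachable []bool) bool {
	tm := trust.GetPeerTrustMetric(peerKey)
	for _, ok := range reachable {
		if ok {
			tm.IncGood() // the peer returned a usable address
		} else {
			tm.IncBad() // the peer returned an unreachable address
		}
	}
	// Assumed policy: a ranking below 0.3 is grounds for disconnection.
	if tm.TrustValue() < 0.3 {
		trust.PeerDisconnected(peerKey) // persist history before dropping the peer
		return false
	}
	return true
}
```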
## References
M. Srivatsa, L. Xiong, and L. Liu, “TrustGuard: Countering Vulnerabilities in Reputation Management for Decentralized Overlay Networks,” in *Proceedings of the 14th International Conference on World Wide Web (WWW '05)*, pp. 422-431, May 2005.

docs/architecture/img/formula1.png (new binary file, 9.6 KiB, not shown)

docs/architecture/img/formula2.png (new binary file, 5.8 KiB, not shown)

p2p/trust/trustmetric.go (new file, 394 lines)

@@ -0,0 +1,394 @@
package trust
import (
"encoding/json"
"io/ioutil"
"math"
"os"
"path/filepath"
"time"
)
var (
store *trustMetricStore
)
type peerMetricRequest struct {
Key string
Resp chan *TrustMetric
}
type trustMetricStore struct {
PeerMetrics map[string]*TrustMetric
Requests chan *peerMetricRequest
Disconn chan string
}
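// init creates the global metric store and starts the goroutine that
// serializes access to it.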
func init() {
store = &trustMetricStore{
PeerMetrics: make(map[string]*TrustMetric),
Requests: make(chan *peerMetricRequest, 10),
Disconn: make(chan string, 10),
}
go store.processRequests()
}
type peerHistory struct {
NumIntervals int `json:"intervals"`
History []float64 `json:"history"`
}
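// loadSaveFromFile loads the history for key from the trust_history.json file
// under $TMHOME when isLoad is true, or merges the provided data into that
// file when isLoad is false. The file has the shape
// {"<peer key>": {"intervals": 5, "history": [0.8, 0.6]}}. It returns nil
// when saving, or when no history can be found for the key.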
func loadSaveFromFile(key string, isLoad bool, data *peerHistory) *peerHistory {
tmhome, ok := os.LookupEnv("TMHOME")
if !ok {
return nil
}
filename := filepath.Join(tmhome, "trust_history.json")
peers := make(map[string]peerHistory, 0)
// read in previously written history data
content, err := ioutil.ReadFile(filename)
if err == nil {
err = json.Unmarshal(content, &peers)
}
var result *peerHistory
if isLoad {
if p, ok := peers[key]; ok {
result = &p
}
} else {
peers[key] = *data
b, err := json.Marshal(peers)
if err == nil {
err = ioutil.WriteFile(filename, b, 0644)
}
}
return result
}
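// createLoadPeerMetric creates a new trust metric for the peer and seeds it
// with any history previously saved for that key.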
func createLoadPeerMetric(key string) *TrustMetric {
tm := NewMetric()
if tm == nil {
return tm
}
// attempt to load the peer's trust history data
if ph := loadSaveFromFile(key, true, nil); ph != nil {
tm.historySize = len(ph.History)
if tm.historySize > 0 {
tm.numIntervals = ph.NumIntervals
tm.history = ph.History
tm.historyValue = tm.calcHistoryValue()
}
}
return tm
}
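// processRequests serializes all access to the PeerMetrics map: it creates
// metrics on demand and saves/removes them when peers disconnect.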
func (tms *trustMetricStore) processRequests() {
for {
select {
case req := <-tms.Requests:
tm, ok := tms.PeerMetrics[req.Key]
if !ok {
tm = createLoadPeerMetric(req.Key)
if tm != nil {
tms.PeerMetrics[req.Key] = tm
}
}
req.Resp <- tm
case key := <-tms.Disconn:
if tm, ok := tms.PeerMetrics[key]; ok {
ph := peerHistory{
NumIntervals: tm.numIntervals,
History: tm.history,
}
tm.Stop()
delete(tms.PeerMetrics, key)
loadSaveFromFile(key, false, &ph)
}
}
}
}
// request a TrustMetric by Peer Key
func GetPeerTrustMetric(key string) *TrustMetric {
resp := make(chan *TrustMetric, 1)
store.Requests <- &peerMetricRequest{Key: key, Resp: resp}
return <-resp
}
// the trust metric store should know when a Peer disconnects
func PeerDisconnected(key string) {
store.Disconn <- key
}
// keep track of Peer reliability
type TrustMetric struct {
proportionalWeight float64
integralWeight float64
numIntervals int
maxIntervals int
intervalLen time.Duration
history []float64
historySize int
historyMaxSize int
historyValue float64
bad, good float64
stop chan int
update chan *updateBadGood
trustValue chan *reqTrustValue
}
type TrustMetricConfig struct {
// be careful changing these weights
ProportionalWeight float64
IntegralWeight float64
// don't allow 2^HistoryMaxSize to be greater than int max value
HistoryMaxSize int
// each interval should be short for adaptability
// less than 30 seconds is too sensitive,
// and greater than 5 minutes will make the metric numb
IntervalLen time.Duration
}
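// defaultConfig provides the weights, history size, and interval length used
// when the caller does not supply values of its own.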
func defaultConfig() *TrustMetricConfig {
return &TrustMetricConfig{
ProportionalWeight: 0.4,
IntegralWeight: 0.6,
HistoryMaxSize: 16,
IntervalLen: 1 * time.Minute,
}
}
type updateBadGood struct {
IsBad bool
Add int
}
type reqTrustValue struct {
Resp chan float64
}
// calculates the derivative component
func (tm *TrustMetric) derivativeValue() float64 {
return tm.proportionalValue() - tm.historyValue
}
// weightedDerivative applies the derivative weight; only negative changes
// (drops in the peer's behavior) contribute, improvements are ignored
func (tm *TrustMetric) weightedDerivative() float64 {
var weight float64
d := tm.derivativeValue()
if d < 0 {
weight = 1.0
}
return weight * d
}
func (tm *TrustMetric) fadedMemoryValue(interval int) float64 {
if interval == 0 {
// base case
return tm.history[0]
}
index := int(math.Floor(math.Log(float64(interval)) / math.Log(2)))
// map the interval value down to an actual history index
return tm.history[index]
}
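// updateFadedMemory performs the fading memories operation: each history
// entry is folded together with its newer neighbor, so that older entries
// summarize progressively larger numbers of intervals.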
func (tm *TrustMetric) updateFadedMemory() {
if tm.historySize < 2 {
return
}
// keep the most recent history element (index 0) unchanged
faded := tm.history[:1]
for i := 1; i < tm.historySize; i++ {
x := math.Pow(2, float64(i))
ftv := ((tm.history[i] * (x - 1)) + tm.history[i-1]) / x
faded = append(faded, ftv)
}
tm.history = faded
}
// calculates the integral (history) component of the trust value
func (tm *TrustMetric) calcHistoryValue() float64 {
var wk []float64
// create the weights
hlen := tm.numIntervals
for i := 0; i < hlen; i++ {
x := math.Pow(.8, float64(i+1)) // optimistic wk
wk = append(wk, x)
}
var wsum float64
// calculate the sum of the weights
for _, v := range wk {
wsum += v
}
var hv float64
// calculate the history value
for i := 0; i < hlen; i++ {
weight := wk[i] / wsum
hv += tm.fadedMemoryValue(i) * weight
}
return hv
}
// calculates the current score for good experiences
func (tm *TrustMetric) proportionalValue() float64 {
value := 1.0
// bad events are worth more
total := tm.good + math.Pow(tm.bad, 2)
if tm.bad > 0 || tm.good > 0 {
value = tm.good / total
}
return value
}
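// calcTrustValue combines the proportional, integral, and derivative
// components into the final trust value, clamped at a minimum of zero.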
func (tm *TrustMetric) calcTrustValue() float64 {
weightedP := tm.proportionalWeight * tm.proportionalValue()
weightedI := tm.integralWeight * tm.historyValue
weightedD := tm.weightedDerivative()
tv := weightedP + weightedI + weightedD
if tv < 0 {
tv = 0
}
return tv
}
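// processRequests is the metric's event loop: it applies good/bad updates,
// answers trust value requests, rolls the history forward on every interval
// tick, and exits when Stop is called.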
func (tm *TrustMetric) processRequests() {
t := time.NewTicker(tm.intervalLen)
defer t.Stop()
loop:
for {
select {
case bg := <-tm.update:
if bg.IsBad {
tm.bad += float64(bg.Add)
} else {
tm.good += float64(bg.Add)
}
case rtv := <-tm.trustValue:
// send the calculated trust value back
rtv.Resp <- tm.calcTrustValue()
case <-t.C:
newHist := tm.calcTrustValue()
tm.history = append([]float64{newHist}, tm.history...)
if tm.historySize < tm.historyMaxSize {
tm.historySize++
} else {
tm.history = tm.history[:tm.historyMaxSize]
}
if tm.numIntervals < tm.maxIntervals {
tm.numIntervals++
}
tm.updateFadedMemory()
tm.historyValue = tm.calcHistoryValue()
tm.good = 0
tm.bad = 0
case <-tm.stop:
break loop
}
}
}
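// Stop terminates the metric's event loop.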
func (tm *TrustMetric) Stop() {
tm.stop <- 1
}
// indicate that an undesirable event took place
func (tm *TrustMetric) IncBad() {
tm.update <- &updateBadGood{IsBad: true, Add: 1}
}
// multiple undesirable events need to be acknowledged
func (tm *TrustMetric) AddBad(num int) {
tm.update <- &updateBadGood{IsBad: true, Add: num}
}
// positive events need to be recorded as well
func (tm *TrustMetric) IncGood() {
tm.update <- &updateBadGood{IsBad: false, Add: 1}
}
// multiple positive can be indicated in a single call
func (tm *TrustMetric) AddGood(num int) {
tm.update <- &updateBadGood{IsBad: false, Add: num}
}
// get the dependable trust value; a score that takes a long history into account
func (tm *TrustMetric) TrustValue() float64 {
resp := make(chan float64, 1)
tm.trustValue <- &reqTrustValue{Resp: resp}
return <-resp
}
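// NewMetric returns a trust metric that uses the default configuration.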
func NewMetric() *TrustMetric {
return NewMetricWithConfig(defaultConfig())
}
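// NewMetricWithConfig returns a trust metric based on the given configuration;
// zero-valued fields fall back to their defaults. The metric's event loop is
// started before returning.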
func NewMetricWithConfig(tmc *TrustMetricConfig) *TrustMetric {
tm := new(TrustMetric)
dc := defaultConfig()
if tmc.ProportionalWeight != 0 {
tm.proportionalWeight = tmc.ProportionalWeight
} else {
tm.proportionalWeight = dc.ProportionalWeight
}
if tmc.IntegralWeight != 0 {
tm.integralWeight = tmc.IntegralWeight
} else {
tm.integralWeight = dc.IntegralWeight
}
if tmc.HistoryMaxSize != 0 {
tm.historyMaxSize = tmc.HistoryMaxSize
} else {
tm.historyMaxSize = dc.HistoryMaxSize
}
if tmc.IntervalLen != time.Duration(0) {
tm.intervalLen = tmc.IntervalLen
} else {
tm.intervalLen = dc.IntervalLen
}
// with the default config this gives the metric a tracking window of roughly
// 45 days (2^16 one-minute intervals)
tm.maxIntervals = int(math.Pow(2, float64(tm.historyMaxSize)))
tm.historyValue = 1.0
tm.update = make(chan *updateBadGood, 10)
tm.trustValue = make(chan *reqTrustValue, 10)
tm.stop = make(chan int, 1)
go tm.processRequests()
return tm
}