Problem: JP's report is unaddressed

Solution: formalize our current thoughts and actions
2017-12-12 18:30:47 +07:00 · 2017-12-12 18:30:47 +07:00 · 82d9476283
commit 82d9476283
8 changed files with 400 additions and 0 deletions
--- a/.gitmodules
+++ b/.gitmodules
@ -0,0 +1,3 @@
+[submodule "fastcmp"]
+	path = fastcmp
+	url = https://github.com/saschagrunert/fastcmp
--- a/README.md
+++ b/README.md
@ -0,0 +1,137 @@
+# Aura Consensus Protocol Audit Findings Commentary
+
+This document is prepared in response to the draft audit report provided
+by Jean-Philippe Aumasson (08/12/17), SHA256 hash of [aura-audit-report.md](aura-audit-report.md)
+is 8040fe95d91b2d77f1f0df2af5ba23c85ee487b18dde650dd46cfc82686d0b3d
+
+## Protocol review
+
+### Risks of synchronized time
+
+The report indicates certain risks of [un]synchronized time and related attacks.
+We believe these concerns are valid and are mostly the consequences of the
+basics of the protocol (it is based on physical time)
+
+### Resilience to malicious nodes
+
+No commentary.
+
+### Denial-of-service attacks
+
+No commentary.
+
+### Finality conditions
+
+We've identified the same case to cause confusion as well (specification
+stating `>= 1/2` of the validators, while code requires `> 1/2`).
+
+With regards to `would_be_finalized` calculation, we currently believe
+that this code is correct and is expressing the same `> 1/2` requirement,
+but is "forecasting" finalization for the next signer.
+
+Recommended course of action:
+
+- [ ] **P1** Consult with Parity maintainers with regards to the original intent
+      and actual implementation. Which constraint ("greater" or "stricly greater")
+      should in fact be specified and implemented
+- [ ] **P2** Further analyze the correctness of `would_be_finalized` definition
+- [ ] **P2.1** Provide meaningful in-code commentary on how `build_ancestry_subchain`
+               operates
+
+### Finality delay
+
+No commentary at this moment, but we are further researching this concern.
+
+## Code review
+
+### Unsafe code
+
+The report mentions use of unsafe code in `bigint::hash` where it is used
+to efficiently compare two byte arrays. As mentioned, this doesn't seem
+to be a security risk. However, it is still an instance of unsafe code
+which would always raise at least some concern.
+
+At the core of this issue there's a question of potential performance gains.
+In order to address it to best of our ability, we've conducted a few comparative
+benchmarks. Firstly, we ran simple benchmarks (slice equality vs memcp) both on
+rust stable (rustc 1.22.1 (05e2e1c41 2017-11-22)) and rust nighty (rustc 1.22.1 (05e2e1c41 2017-11-22))
+and did not see any gains (the results were very similar in both cases):
+
+```
+test cmp      ... bench:           4 ns/iter (+/- 1)
+test cmp_loop ... bench:       4,251 ns/iter (+/- 215)
+test eq       ... bench:           4 ns/iter (+/- 1)
+test eq_loop  ... bench:       4,269 ns/iter (+/- 622)
+```
+
+The source code for this benchmark is available at [byte-cmp](byte-cmp)
+
+To continue this effort, we further sampled some libraries available
+on [crates.io](https://crates.io).
+
+[fastcmp](https://github.com/saschagrunert/fastcmp) was seemingly able
+to show different results:
+
+```
+test fast_compare_equal    ... bench:          10 ns/iter (+/- 0) = 25600 MB/s
+test fast_compare_unequal  ... bench:          10 ns/iter (+/- 0) = 25600 MB/s
+test slice_compare_equal   ... bench:          21 ns/iter (+/- 1) = 12190 MB/s
+test slice_compare_unequal ... bench:          21 ns/iter (+/- 0) = 12190 MB/s
+```
+
+At the core of its implementation method also lies `memcmp`. The reason for the
+significant discrepancy in results is being further investigated. At this moment,
+we can only state that the optimization method used in parity *may* produce
+much more efficient code so its worth keeping around. However, we'd recommend
+the following course of action:
+
+- [ ] **C1** Prepare a patch for bigint::hash that explains the necessity of using unsafe
+code (and memcmp) in prose to make this code easier to understand for others.
+
+### Step number cast from 64- to 32-bit
+
+The report indicates a cast of a 64-bit number to 32-bit one in AuRa code. The concern
+is valid, however, this behaviour will only be triggered 100-500 years later. That said,
+the following course of action was taken:
+
+- [x] **C2** Avoid casting down to 32-bit by dropping the need to use `Duration` type
+      for representing step duration (which is typically single or double digits seconds)
+      https://github.com/oraclesorg/parity/commit/fa2ddce949d8a227b7135b300003f0a5bceddc0f
+- [ ] **C2.1** Prepare a PR and get it merged into the main repository (or get resolved in
+      any other way)
+
+### Potential integer overflow
+
+The report highlights that step increment function might overflow the counter, especially
+when parity is compiled for a 32-bit system. This event is also quite far in the future,
+however, the following course of action was taken:
+
+- [x] **C3** Shutdown parity when such event occurs as there's not much that can be done
+      at that point https://github.com/oraclesorg/parity/commit/fa2ddce949d8a227b7135b300003f0a5bceddc0f
+- [ ] **C3.1** Prepare a PR and get it merged into the main repository (or get resolved in
+      any other way)
+
+
+### Potential division by zero
+
+The report shows that Step calibration function might panic on division by zero. This is a
+valid concern. If somebody will configure Parity with a step duration of 0 seconds, it'll
+quickly panic, however, the error won't necessarily be very informative. This case highlights
+the lack of refinement types in Rust. Following course of action was taken:
+
+- [x] **C4** Guard step duration parameter to be positive and crash Parity otherwise
+     https://github.com/oraclesorg/parity/commit/fa2ddce949d8a227b7135b300003f0a5bceddc0f
+- [ ] **C4.1** Prepare a PR and get it merged into the main repository (or get resolved in
+      any other way)
+
+### Other possible improvements
+
+The report suggests few other improvements. First one is to improve the measurement of timing.
+There is no indication at this point that this is a concern. Course of action:
+
+- [ ] **C5** Try out https://github.com/jedisct1/rust-coarsetime to see how much more
+      performant it is
+- [ ] **C6** Research the importance of measurement timing
+
+The other suggested improvement is erroneous as it suggests that the code doesn't check
+of RNG creation failure, while in fact it does.
--- a/aura-audit-report.md
+++ b/aura-audit-report.md
@ -0,0 +1,172 @@
+% Audit of the Aura consensus protocol
+% Jean-Philippe Aumasson
+% 08/12/17
+
+Authority Round (a.k.a. Aura) is a proof-of-authority consensus engine in the [Parity](https://github.com/paritytech/parity/) Ethereum client, implemented in [/ethcore/src/engines/authority_round](https://github.com/paritytech/parity/tree/master/ethcore/src/engines/authority_round) in files [mod.rs](https://github.com/paritytech/parity/blob/master/ethcore/src/engines/authority_round/mod.rs) and [finality.rs](https://github.com/paritytech/parity/blob/master/ethcore/src/engines/authority_round/finality.rs) (about 850 lines or Rust, including approximately 250 lines of tests).
+
+The goal of this audit is to assess the security strength of the Aura and its implementation, and to find any security shortcoming, in particular related to finality conditions.
+The version of the source code reviewed is that at the commit `b1517654e1212588238c989d00dd92128ea040fe`.
+
+The first part *Protocol review* is about the protocol logic, as understood from the documentation and as implemented. 
+The second part *Code review* is about logic and software bugs in the implementation that are not directly related to the protocol's logic.
+
+
+# Protocol review
+
+## Risks of synchronized time
+
+Aura critically relies on time synchronization of the validators for successful execution of the protocol.
+This increases two risks:
+
+1. **Inconsistency** of the blockchain's state, if clocks of validators are accidentally desynchronized, for example due to clock drift. In this case, different validators would compute different a different step index, and therefore a different lead validator.
+
+2. **Denial-of-service** and possiblity other attacks: Many machines would get time from an NTP server through ntpd, adjusting their local clock to compensate any skew. However,
+    - The NTP server(s) have then to be trusted---which also reduces the decentralization of the protocol
+    - The NTP traffic is unauthenticated by default, thus a network attacker could transmit invalid time to the validators.
+
+The risk of inconsistency can be minimized by choosing a long enough duration, and by synchronizing with a trusted time server from a local network (for example getting its time from GPS).
+
+The risk of denial-of-service can be minimized by authenticating any NTP requests, using the pre-shared key authentication mode or the protocol proposed in a [recent Internet Draft](https://tools.ietf.org/html/draft-ietf-ntp-using-nts-for-ntp-10).
+
+Even when NTP is not used, an attacker may have other means of influencing the local clock skew, for example through [CPU heating](http://sec.cs.ucl.ac.uk/users/smurdoch/papers/ccs06hotornot.pdf).
+
+
+## Resilience to malicious nodes
+
+Aura claims to tolerate up to 50% malicious nodes, as per its documentation.
+The goals of a collusion of malicious nodes would be to seal blocks that should not be sealed, or merely to influence the selection of a validator.
+
+Based on my understanding of the protocol, I believe these claims are correct. The bound 50% is tight, and any collusion of more than 50% of the validators could disrupt the sound execution of the protocol.
+
+## Denial-of-service attacks
+
+Even if connections between validators are encrypted and authenticated, an attacker that blocks from/to one or more validators can prevent the protocol from validating incoming blocks.
+
+Solutions to this problem are non-trivial, and seem to require a trade-off between resiliency to network disruptions and security against collusions of malicious nodes.
+
+## Finality conditions
+
+At the time of writing the [documentation](https://github.com/paritytech/parity/wiki/Aura) defines finality condition as `|SIG_SET(C[K..])| >= n/2`, where `n` is the number of validators, which suggests finalization as soon as 50% of validators have signed a given step.
+However a strict majority seems to be the desired condition, so this condition should instead be `|SIG_SET(C[K..])| > n/2`.
+
+The condition implemented in finality.rs seems even stricter in `build_ancestry_subchain()`:
+
+```
+    let would_be_finalized = (current_signed + 1) * 2 > self.signers.len();
+```
+
+However this condition seems inconsistent with that in `push_hash()`, which simply requires a strict majority:
+
+```
+		while self.sign_count.len() * 2 > self.signers.len() {
+```
+
+If this understanding is correct, the finality condition should be consistently implemented and specified.
+
+
+## Finality delay
+
+A minimum of `n_v/2 + 1` validations being required, with `n_v` the number of validators.
+At least `2(n_v/2 + 1) = n_v + 2` message round trips are therefore necessary before a block is finalized by all validators.
+In the worst case, after exactly `n_v` validations, the delay will instead be of `2n_v + 2`.
+If multiple blocks are proposed for validation, the voting will add at least `2n_v` round trips.
+
+The average delay may be estimated empirically, based on the number of validations of a block or series thereof.
+
+
+# Code review
+
+This section reports on potential security risks in the implementation of Aura.
+We mainly reviewed the mod.rs and finality.rs files, and did not comprehensively review their dependencies.
+
+## Unsafe code
+
+The files implementing Aura, mod.rs and finality.rs, do not unsafe code blocks such as raw pointer dereferences, nor recursions that would create a risk of stack overflow.
+
+However, there is an unsafe blocks in the `bigint::hash` module that is executed in the Aura code, namely the implementation of the PartialEq trait:
+
+```
+		impl PartialEq for $from {
+			fn eq(&self, other: &Self) -> bool {
+				unsafe { memcmp(self.0.as_ptr() as *const c_void, other.0.as_ptr() as *const c_void, $size) == 0 }
+			}
+		}
+```
+This is for example executed in Aura's `is_epoch_end()` function in the line 
+
+```
+	.map(|h| if h == chain_head.hash() {
+```
+This does not create a security, however, since the buffer size (32 bytes) is hardcoded and can't be controlled by an attacher.
+
+
+## Step number cast from 64- to 32-bit 
+
+In `duration_remaining()` a Step's `inner` attribute obtained from `self.load()` is cast from `usize` to `u32`, thus truncating any 64-bit value when running on a 64-bit system (where `usize` is 64-bit). This number is then multiplied to a `Duration` object to estimate the remaining time for this step:
+
+```
+	fn load(&self) -> usize { self.inner.load(AtomicOrdering::SeqCst) }
+	fn duration_remaining(&self) -> Duration {
+		let now = unix_now();
+		let step_end = self.duration * (self.load() as u32 + 1);
+```
+
+Rust does not allow a `Duration` to be multiplied by an `u64`, however. The safest fix seems therefore to check that `inner` is smaller than `u32::MAX`.
+
+## Potential integer overflow
+
+The `AtomicUsize::fetch_add()` function will wrap around on overflow, which may occur on 32-bit systems (where `usize` is 32-bit) and lead to unsafe behavior of the protocol if not detected: c
+```
+	fn increment(&self) {
+		self.inner.fetch_add(1, AtomicOrdering::SeqCst);
+	}
+```
+
+A fix is to check that `inner` is smaller than `u32::MAX`.
+
+
+## Potential division by zero
+
+If `self.duration` is less than a second, then a division by zero will happen in:
+
+```
+		if self.calibrate {
+			let new_step = unix_now().as_secs() / self.duration.as_secs();
+			self.inner.store(new_step as usize, AtomicOrdering::SeqCst);
+		}
+```
+
+A fix is to ensure that `self.duration` is greater than one second.
+
+## Other possible improvements
+
+### Faster timings
+
+If speed of timing measurements is critical, Aura may consider using [rust-coarsetime](https://github.com/jedisct1/rust-coarsetime), "a partial replacement for the Time and Duration structures from the standard library".
+The timings obtained are slightly less accurate, though.
+
+### Safer PRNG
+
+Aura doesn't rely on a pseudorandom generator (`random()` is only used for testing), yet we observed that in hash.rs the `randomize()` function will not check that an instance of OsRng was successfully created:
+
+```
+pub fn randomize(&mut self) {
+	let mut rng = OsRng::new().unwrap();
+	*self= $from::rand(&mut rng);
+}
+```
+
+It would be safer to properly check the OsRng instantiation, by doing the following (as [copied from SO](https://stackoverflow.com/questions/27362523/generating-secure-random-numbers-in-rust))
+
+```
+    let mut rng = match OsRng::new() {
+        Ok(g) => g,
+        Err(e) => panic!("Failed to obtain OS RNG: {}", e)
+    };
+```
+
+
+
+
+
+
--- a/byte-cmp/.gitignore
+++ b/byte-cmp/.gitignore
@ -0,0 +1,4 @@
+/target/
+**/*.rs.bk
+Cargo.lock
+target
--- a/byte-cmp/Cargo.toml
+++ b/byte-cmp/Cargo.toml
@ -0,0 +1,14 @@
+[package]
+name = "byte-cmp"
+version = "0.1.0"
+authors = ["Yurii Rashkovskii <yrashk@gmail.com>"]
+
+[dependencies]
+rand = "*"
+bencher ="*"
+libc = "*"
+
+[[bench]]
+name = "cmp"
+harness = false
+
--- a/byte-cmp/benches/cmp.rs
+++ b/byte-cmp/benches/cmp.rs
@ -0,0 +1,69 @@
+extern crate rand;
+#[macro_use]
+extern crate bencher;
+extern crate libc;
+
+use libc::{c_void, memcmp};
+
+use bencher::Bencher;
+use rand::Rng;
+
+type T = [u8; 32];
+
+fn eq_loop(b: &mut Bencher) {
+    let mut rng = rand::thread_rng(); 
+    let mut t1: T = Default::default();
+    let mut t2: T = Default::default();
+    rng.fill_bytes(&mut t1);
+    rng.fill_bytes(&mut t2);
+    b.iter(|| 
+        for _ in 0..1_000 {
+            assert!(!(t1 == t2))
+        }
+    );
+}
+
+fn cmp_loop(b: &mut Bencher) {
+    let mut rng = rand::thread_rng(); 
+    let mut t1: T = Default::default();
+    let mut t2: T = Default::default();
+    rng.fill_bytes(&mut t1);
+    rng.fill_bytes(&mut t2);
+    let t1ptr = t1.as_ptr() as *const c_void;
+    let t2ptr = t2.as_ptr() as *const c_void;
+    b.iter(|| 
+           for _ in 0..1_000 {
+               assert!(!(unsafe { memcmp(t1ptr, t2ptr, 32) == 0 }));
+           }
+           );
+}
+
+fn eq(b: &mut Bencher) {
+    let mut rng = rand::thread_rng(); 
+    let mut t1: T = Default::default();
+    let mut t2: T = Default::default();
+    rng.fill_bytes(&mut t1);
+    rng.fill_bytes(&mut t2);
+    b.iter(|| 
+            assert!(!(t1 == t2))
+    );
+}
+
+fn cmp(b: &mut Bencher) {
+    let mut rng = rand::thread_rng(); 
+    let mut t1: T = Default::default();
+    let mut t2: T = Default::default();
+    rng.fill_bytes(&mut t1);
+    rng.fill_bytes(&mut t2);
+    let t1ptr = t1.as_ptr() as *const c_void;
+    let t2ptr = t2.as_ptr() as *const c_void;
+    b.iter(|| 
+               assert!(!(unsafe { memcmp(t1ptr, t2ptr, 32) == 0 }))
+           );
+}
+
+
+benchmark_group!(loops, eq_loop, cmp_loop);
+benchmark_group!(comparisons, eq, cmp);
+
+benchmark_main!(comparisons, loops);
--- a/byte-cmp/src/lib.rs
+++ b/byte-cmp/src/lib.rs
--- a/1
+++ b/1
@ -0,0 +1 @@
+Subproject commit 82d9d5c960b9d57065ad3cc73ded4105909a45f9
				`@ -0,0 +1 @@`
				`Subproject commit 82d9d5c960b9d57065ad3cc73ded4105909a45f9`