`tower_watch::WatchService` provides a dynamically-bound `Service` that
updates in response to a `Watch`. A `WatchService` is constructed with a
`Watch<T>` and a `Bind<T>` -- `Bind` is a newly introduced trait that
supports instantiating new service instances from a borrowed value, i.e.
one borrowed from a watch.
This can be used to reconfigure Services from a shared or otherwise
externally-controlled configuration source (for instance, a file
system).
Previously, there was no notification when capacity was made available by
requests completing. This patch fixes that bug.
This also switches the tests to use `MockTask` from tokio-test.
After speaking with @roanta and @adleong, I realized that our
DEFAULT_RTT_ESTIMATE is too optimistic: it gives new endpoints
an _ideal_ RTT. Instead, our intent is to assign a slightly
pessimistic cost to new endpoints so they don't take on more load
than they are due before the EWMA can establish a baseline.
The balancer provides implementations of two load balancing strategies: RoundRobin and
P2C+LeastLoaded. The round-robin strategy is extremely simplistic and not sufficient for
most production systems. P2C+LL is a substantial improvement, but relies exclusively on
instantaneous information.
This change introduces P2C+PeakEWMA strategy. P2C+PE improves over P2C+LL by maintaining
an exponentially-weighted moving average of response latencies for each endpoint so that
the recent history directly factors into load balancing decisions. This technique was
pioneered by Finagle for use at Twitter. [Finagle's P2C+PE implementation][finagle] was
referenced heavily while developing this.
The provided demo can be used to illustrate the differences between load balancing
strategies. For example:
```
REQUESTS=50000
CONCURRENCY=50
ENDPOINT_CAPACITY=50
MAX_ENDPOINT_LATENCIES=[1ms, 10ms, 10ms, 10ms, 10ms, 100ms, 100ms, 100ms, 100ms, 1000ms, ]
P2C+PeakEWMA
wall 15s
p50 5ms
p90 56ms
p95 78ms
p99 96ms
p999 105ms
P2C+LeastLoaded
wall 18s
p50 5ms
p90 57ms
p95 80ms
p99 98ms
p999 857ms
RoundRobin
wall 72s
p50 9ms
p90 98ms
p95 496ms
p99 906ms
p999 988ms
```
[finagle]: 9cc08d1521/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer/PeakEwma.scala
tower-balance provides a PendingRequests load metric that counts the number of responses
that have not yet been received. However, especially in the case of HTTP, responses have
bodies that remain active far past the initial receipt of the response. We want load
metrics to be able to take such streams into account.
This change introduces a new utility trait, `load::Measure`, which is used by implementors
of `Load` (like `PendingRequests`) to handle the protocol-specific details of attaching an
_instrument_ to a response message. _Instruments_ are implemented as RAII-guarded types
that ensure that load calculations are updated as a response completes. An instrument is
dropped when the load metric no longer needs information from a response.
This is all being changed in service of a `PeakEwma` balancer implementation, though it
should benefit the existing load metric as well.
Previously, `power_of_two_choices` and `round_robin` constructors were
exposed from the crate scope.
These have been replaced by `Balance::p2c`, `Balance::p2c_from_rng`, and
`Balance::round_robin`.
When debugging load balancer behavior, it's convenient to observe the
individual node selection decisions. To that end, this change requires
that `Load::Metric` implement `fmt::Debug` when used by
`PowerOfTwoChoices`.
In preparation for additional load balancing strategies, the demo is
being updated to allow for richer testing in several important ways:
- Adopt the new `tokio` multithreaded runtime.
- Use `tower-buffer` to drive each simulated endpoint on an independent
task. This fixes a bug where requests appeared active longer than
intended (while waiting for the SendRequests task to process responses).
- A top-level concurrency limit has been added (by wrapping the balancer in
`tower-in-flight-limit`) so that `REQUESTS` futures are not created
immediately. Creating them all at once also caused incorrect load measurements.
- Endpoints are also constrained with `tower-in-flight-limit`. By
default, the limit is that of the load balancer (so endpoints are
effectively unlimited).
- The `demo.rs` script has been reorganized to account for the new
runtime, so that all examples run as a single chain of tasks.
- New output format:
```
REQUESTS=50000
CONCURRENCY=50
ENDPOINT_CAPACITY=50
MAX_ENDPOINT_LATENCIES=[1ms, 10ms, 10ms, 10ms, 10ms, 100ms, 100ms, 100ms, 100ms, 1000ms, ]
P2C+LeastLoaded
wall 18s
p50 5ms
p90 56ms
p95 80ms
p99 98ms
p999 900ms
RoundRobin
wall 72s
p50 9ms
p90 98ms
p95 488ms
p99 898ms
p999 989ms
```
`PowerOfTwoChoices` requires a random number generator. In order for
this randomization source to be configurable (e.g. for tests),
`PowerOfTwoChoices` is generic over its implementation of `rand::Rng`;
however, this leads to needless boilerplate when building P2C balancers.
Because load balancers do not need a cryptographically strong RNG, we
can use `rand::SmallRng` (which is `Send + Sync`). `PowerOfTwoChoices`
exposes constructors that take a `SmallRng`.
In order to do this, the `tower-balance` crate now requires `rand = "0.5"`.
When an endpoint's state changes in some way, it may need to be rebound to a
new service, and reinserted into the load balancer. This PR changes
`tower-balance` so that, rather than ignoring duplicate `Insert`s, the new
endpoint replaces the old endpoint. The new endpoint is always placed on the
not-ready list; if the replaced endpoint was on the ready list, it is removed
prior to inserting the new endpoint into the not-ready list.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
I've implemented `std::error::Error` for the error types in the `tower-balance`, `tower-buffer`, `tower-in-flight-limit`, and `tower-reconnect` middleware crates.
This is required upstream for runconduit/conduit#442, and also just generally seems like the right thing to do as a library.
versions don't have to build both those versions and the older ones
that h2 is currently using.
Don't enable the regex support in env_logger. Applications that want
the regex support can enable it themselves; this will happen
automatically when they add their env_logger dependency.
Disable the env_logger dependency in quickcheck.
The result is fewer dependencies overall. For example, regex and its
dependencies are no longer required at all, as can be seen in the
changes to Cargo.lock. env_logger 0.5 does add some dependencies of
its own; however, applications are likely to use env_logger 0.5
anyway, so this is still a net gain.
Submitted on behalf of Buoyant, Inc.
Signed-off-by: Brian Smith <brian@briansmith.org>
Provides a middleware that sets a maximum number of requests that can be
in-flight for the service. A request is defined to be in-flight from the
time `call` is invoked to the time the returned response future
resolves.
This maximum is enforced across all clones of the service instance.
Test `Balance::poll_ready` by creating a random number of services, each
of which must be polled a random number of times before becoming ready.
As the balancer is polled, the test ensures that it does not become
ready prematurely and that services are promoted from not_ready to
ready.
`Balance::num_ready()` and `Balance::is_ready()` have been added to
expose the number of ready services, as well as `Balance::num_not_ready()`
to expose the number of pending services.
The new _demo_ example sends a million simulated requests through each
load balancer configuration and records the observed latency
distributions.
Furthermore, this fixes a critical bug in `Balancer`, where we did not
properly iterate through not-ready nodes.
* Use (0..n-1).rev() to iterate from right-to-left
Previously, tower-balance used a fixed round-robin strategy for load
distribution.
This change makes `Balance` generic over its load metric and selection
strategy. The following new traits have been introduced to satisfy this:
- `tower_balance::Load` provides a concrete load metric (i.e. for a service);
- `tower_balance::Choose` provides a strategy for selecting a node;
There are two load balancing configurations supported out-of-the-box:
- `tower_balance::round_robin` provides a load-ignorant round-robin balancer.
- `tower_balance::power_of_two_choices` uses the Power of Two Choices to
distribute requests to the least-loaded node. This should be used in conjunction
with `tower_balance::load::WithPendingRequests` to decorate a `Discover` instance
so that all services it produces implement `Load`.
A dual MIT / Apache 2.0 license does not make any sense. Since the
intent of the original license was to be dual under MIT or Apache 2.0,
restricting to only MIT is OK.