12 Commits

Author SHA1 Message Date
Jon Gjengset ba1fdd755b ready-cache: Prepare for 0.3.1 release
This also fixes up the various documentation URLs, which were still
pointing to 0.1.x.
2020-02-24 13:14:23 -05:00
Jon Gjengset 414e3b0809
ready-cache: Avoid panic on strange race (#420)
It's been observed that occasionally tower-ready-cache would panic
trying to find an already canceled service in `cancel_pending_txs`
(#415). The source of the race is not entirely clear, but extensive
debugging demonstrated that occasionally a call to `evict` would send on
the `CancelTx` for a service, yet that service would be yielded back
from `pending` in `poll_pending` in a non-`Canceled` state. This
is equivalent to saying that this code may panic:

```rust
async {
  let (tx, rx) = oneshot::channel();
  tx.send(42).unwrap();
  yield_once().await;
  rx.try_recv().unwrap(); // <- may occasionally panic
}
```

I have not been able to demonstrate a self-contained example failing in
this way, but it's the only explanation I have found for the observed
bug. Pinning the entire runtime to one core still produced the bug,
which indicates that it is not a memory ordering issue. Replacing
oneshot with `mpsc::channel(1)` still produced the bug, which indicates
that the bug is not with the implementation of `oneshot`. Logs also
indicate that the `CancelTx` we send on in `evict()` truly is the same
one associated with the `CancelRx` polled in `Pending::poll`, so we're
not getting our wires crossed somewhere. It truly is bizarre.

This patch resolves the issue by considering a failure to find a
ready/errored service's `CancelTx` as another signal that a service has
been removed. Specifically, if `poll_pending` finds a service that
returns `Ok` or `Err`, but does _not_ find its `CancelTx`, then it
assumes that it must be because the service _was_ canceled, but did not
observe that cancellation signal.
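
The rule described above can be sketched roughly as follows. This is a
simplified, standard-library-only illustration, not the actual patch:
`Outcome`, `Action`, `PendingCancelTxs`, and `handle` are hypothetical
names, and the `()` values stand in for the real `CancelTx` handles.

```rust
use std::collections::HashMap;

/// How a pending service's readiness future resolved (simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Ready,
    Failed,
    Canceled,
}

/// What the cache does with a service yielded from the pending set.
#[derive(Debug, PartialEq)]
enum Action {
    MoveToReady,
    ReportError,
    Drop,
}

/// Hypothetical stand-in for the map from service key to its cancel handle.
struct PendingCancelTxs {
    txs: HashMap<&'static str, ()>,
}

impl PendingCancelTxs {
    fn handle(&mut self, key: &'static str, outcome: Outcome) -> Action {
        match outcome {
            // The service observed its cancellation signal: just drop it.
            Outcome::Canceled => Action::Drop,
            Outcome::Ready | Outcome::Failed => {
                // Assumption of this sketch: a missing handle means the
                // service was evicted but never saw the cancellation, so
                // it is dropped instead of triggering a panic.
                if self.txs.remove(key).is_none() {
                    return Action::Drop;
                }
                if outcome == Outcome::Ready {
                    Action::MoveToReady
                } else {
                    Action::ReportError
                }
            }
        }
    }
}
```

In this sketch, a service that resolves as `Ready` while its handle is
still present is promoted, while the same result for a key whose handle
is already gone is treated like a cancellation.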

As an explanation, this isn't entirely satisfactory, since we do not
fully understand the underlying problem. It _may_ be that a canceled
service could remain in the pending state for a very long time if it
does not become ready _and_ does not see the cancellation signal (so it
returns `Poll::Pending` and is not removed). That, in turn, might cause
an issue if the driver of the `ReadyCache` then chooses to re-use a key
they believe they have evicted. However, any such case _must_ first hit
the panic that exists in the code today, so this is still an improvement
over the status quo.

Fixes #415.
2020-02-24 13:03:43 -05:00
Jon Gjengset be156e733d ready-cache: restore assert for dropped cancel tx
When ready-cache was upgraded from futures 0.1 to `std::future` in
e2f1a49cf3, this `expect` was removed, and
the code instead silently ignores the error. That's probably not what we
want, so this patch restores that assertion.
2020-02-20 17:08:07 -05:00
Jon Gjengset ae34c9b4a1 Add more tower-ready-cache tests 2020-02-20 16:33:54 -05:00
Lucio Franco d63665515c
ready-cache: Add readme (#402) 2019-12-19 17:56:43 -05:00
Lucio Franco 86eef82d2f
Remove default features for futures dep (#399)
* Remove default features for futures dep

* Add missing alloc feature
2019-12-19 14:20:41 -05:00
Juan Alvarez 1843416dfe remove service, make and layer path deps (#382) 2019-12-06 11:59:56 -05:00
Lucio Franco 423ecee7e9
Remove unused deps (#381) 2019-12-05 23:42:01 -05:00
Lucio Franco e2f1a49cf3
Update the rest of the crates and upgrade ready cache to `std::f… (#379)
* Update hedge, filter, load, load-shed, and more

* Update ready cache

* Prepare release for ready-cache

* fix merge

* Update balance

* Prepare balance release
2019-12-05 14:21:47 -05:00
David Barsky a4c02f5d9c Revert "get building"
186a0fb4a3
2019-11-28 15:21:27 -05:00
David Barsky 186a0fb4a3 get building 2019-11-28 15:15:41 -05:00
Oliver Gould 7e55b7fa0b
Introduce tower-ready-cache (#303)
In #293, `balance` was refactored to manage dispatching requests over a
set of equivalent inner services that may or may not be ready.

This change extracts the core logic of managing a cache of ready
services into a dedicated crate, leaving the balance crate to deal with
node selection.
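
As a rough illustration of what "managing a cache of ready services"
means, here is a minimal synchronous sketch. It assumes readiness can be
modeled as a plain boolean; `Endpoint`, `SimpleReadyCache`, and their
methods are hypothetical names for this example and are not the crate's
API, which drives readiness through futures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an inner service; `ready` models readiness.
struct Endpoint {
    ready: bool,
}

/// Simplified cache that keeps not-yet-ready and ready services apart.
#[derive(Default)]
struct SimpleReadyCache {
    pending: HashMap<String, Endpoint>,
    ready: HashMap<String, Endpoint>,
}

impl SimpleReadyCache {
    /// Add a service under `key`; it starts out in the pending set and
    /// replaces any service previously stored under the same key.
    fn push(&mut self, key: &str, svc: Endpoint) {
        self.ready.remove(key);
        self.pending.insert(key.to_string(), svc);
    }

    /// Move every pending service that reports ready into the ready set.
    fn poll_pending(&mut self) {
        let promoted: Vec<String> = self
            .pending
            .iter()
            .filter(|(_, svc)| svc.ready)
            .map(|(key, _)| key.clone())
            .collect();
        for key in promoted {
            if let Some(svc) = self.pending.remove(&key) {
                self.ready.insert(key, svc);
            }
        }
    }

    /// Remove a service from whichever set holds it; returns whether
    /// anything was removed.
    fn evict(&mut self, key: &str) -> bool {
        self.pending.remove(key).is_some() | self.ready.remove(key).is_some()
    }

    /// Keys of the ready services, sorted for stable output.
    fn ready_keys(&self) -> Vec<&str> {
        let mut keys: Vec<&str> = self.ready.keys().map(String::as_str).collect();
        keys.sort();
        keys
    }
}
```

A caller such as a balancer would then choose only among `ready_keys()`,
which mirrors the split described above: the cache tracks readiness and
the caller handles node selection.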
2019-11-12 09:44:16 -08:00