tpu-client/tpu_connection_cache is refactored out of its module and moved to connection-cache/connection_cache, and the logic in client/connection_cache is consolidated into connection-cache/connection_cache as well. client/connection_cache is now only a thin wrapper that forwards calls to connection-cache/connection_cache and handles construction of the quic/udp connection caches for clients that use both.
TpuConnection is refactored into ClientConnection to make it generic, and its functions are renamed to suit other workflows, e.g. tpu_addr -> server_addr, send_transaction -> send_data, etc.
The enum dispatch is removed so that the bulk of the quic and udp code can be agnostic of each other. A client can now load only quic or only udp into its runtime.
The generic type parameter in tpu-client/tpu_connection_cache is removed so that both quic and udp connection caches can be created and the resulting object used to send transactions, with multiple branching when sending data. Generic type parameters and associated types are also dropped from other types to make the trait "object safe" for this purpose.
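As a rough illustration of what "object safe" buys here, the sketch below uses the renamed methods server_addr and send_data on a trait with no generic methods and no associated types, so callers can hold `dyn ClientConnection`. The signatures are simplified assumptions, and FakeQuicConnection / FakeUdpConnection are hypothetical stand-ins, not the real connection types.

```rust
use std::net::SocketAddr;

/// Transport-agnostic connection trait. It has no generic methods and no
/// associated types, so it can be used as a trait object.
/// The signatures are simplified for this sketch.
pub trait ClientConnection {
    fn server_addr(&self) -> SocketAddr;
    fn send_data(&self, buffer: &[u8]) -> Result<usize, String>;
}

/// Hypothetical stand-in for a QUIC-backed connection.
pub struct FakeQuicConnection {
    pub addr: SocketAddr,
}

/// Hypothetical stand-in for a UDP-backed connection.
pub struct FakeUdpConnection {
    pub addr: SocketAddr,
}

impl ClientConnection for FakeQuicConnection {
    fn server_addr(&self) -> SocketAddr {
        self.addr
    }
    fn send_data(&self, buffer: &[u8]) -> Result<usize, String> {
        // A real implementation would write to a QUIC stream here.
        Ok(buffer.len())
    }
}

impl ClientConnection for FakeUdpConnection {
    fn server_addr(&self) -> SocketAddr {
        self.addr
    }
    fn send_data(&self, buffer: &[u8]) -> Result<usize, String> {
        // A real implementation would call `UdpSocket::send_to` here.
        Ok(buffer.len())
    }
}

/// Sends `data` over every connection and returns the total bytes accepted.
/// The caller never inspects which transport backs each trait object.
pub fn send_to_all(conns: &[Box<dyn ClientConnection>], data: &[u8]) -> usize {
    conns.iter().filter_map(|c| c.send_data(data).ok()).sum()
}
```

With this shape, the quic and udp implementations can live in separate crates and a client can link only the one it needs.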
I have annotated the code explaining the reasoning and the refactoring source -> destination.
There are no functional changes.
bench-tps has been run for rpc-client, thin-client and tpu-client, and the performance numbers largely match the ones from before the refactoring.
ConnectionCache is used to manage connections for sending messages over quic. In the current implementation the connection's endpoint is lazily initialized, with one endpoint per connection pool. For repair we need all connections to use the same Endpoint so that the server can send the response back to that same Endpoint.
* Support bi-directional quic communication, use the same endpoint for the quic server and client
This is needed for supporting using quic for repair
* Added comments on the bi-directional communication tests
* Removed some debug logs
* clippy issue
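The shared-endpoint idea can be sketched as follows: the cache is constructed around one endpoint, and every connection it hands out holds a reference to that same endpoint, so replies always arrive at one local address. All types and the new_with_endpoint constructor here are hypothetical stand-ins for illustration, not the real quic types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a QUIC endpoint bound to one local port.
#[derive(Debug)]
pub struct Endpoint {
    pub local_port: u16,
}

/// A connection keeps a handle to the endpoint it was created from.
pub struct Connection {
    pub endpoint: Arc<Endpoint>,
    pub peer: String,
}

/// A cache that hands the same endpoint to every connection it creates.
pub struct ConnectionCache {
    endpoint: Arc<Endpoint>,
}

impl ConnectionCache {
    /// Builds the cache around an endpoint that may also be used by a server.
    pub fn new_with_endpoint(endpoint: Arc<Endpoint>) -> Self {
        Self { endpoint }
    }

    /// Creates a connection to `peer` that reuses the shared endpoint.
    pub fn connect(&self, peer: &str) -> Connection {
        Connection {
            endpoint: Arc::clone(&self.endpoint),
            peer: peer.to_string(),
        }
    }
}
```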
* Move ConnectionCache back to solana-client, and duplicate ThinClient, TpuClient there
* Dedupe thin_client modules
* Dedupe tpu_client modules
* Move TpuClient to TpuConnectionCache
* Move ThinClient to TpuConnectionCache
* Move TpuConnection and quic/udp trait implementations back to solana-client
* Remove enum_dispatch from solana-tpu-client
* Move udp-client to its own crate
* Move quic-client to its own crate
* Restrict the usable port range of the validator such that adding QUIC_PORT_OFFSET never gets us an invalid port. Also validate this for incoming ContactInfos
* Require the proper port range in ContactInfo::valid_client_facing_addr
* Use asserts instead of panics, and enforce USABLE_PORT_RANGE for all the ports in ContactInfo
* Fix typo
* Make the quic client return errors on the quinn endpoint connect() call,
not just the result of awaiting the connect() call, as the connect()
call can itself fail realistically (e.g. due to unexpected/invalid IPs, etc)
* Update USABLE_PORT_RANGE to a better range and use port_range_validator to validate dynamic-port-range rather than a panic
* Fall back on UDP when the remote peer's tpu port is too large to have QUIC_PORT_OFFSET added to it
* Get rid of tpu port sanitization in ContactInfo
* Turn USABLE_PORT_RANGE into a Range and make connection_cache fall back on UDP when the tpu port is out of range
* Fix build
* Dummy commit
* Revert dummy commit
* dummy commit
* revert dummy commit
* Fix typo
* Fix range validation
* Fix formatting
* Fix USABLE_PORT_RANGE
* Remove USABLE_PORT_RANGE
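The UDP fallback described in this series can be sketched with a checked addition: if adding QUIC_PORT_OFFSET to the peer's tpu port would overflow u16, the UDP port is used instead. The offset value, the Transport enum, and select_transport are assumptions made for this sketch and are not taken from the change itself.

```rust
/// Assumed value for illustration only; check the crate for the real constant.
pub const QUIC_PORT_OFFSET: u16 = 6;

/// Hypothetical result type naming the transport and the port to dial.
#[derive(Debug, PartialEq, Eq)]
pub enum Transport {
    Quic(u16),
    Udp(u16),
}

/// Picks QUIC when requested and when the offset port fits in a `u16`;
/// otherwise falls back to UDP on the original tpu port.
pub fn select_transport(tpu_port: u16, use_quic: bool) -> Transport {
    if !use_quic {
        return Transport::Udp(tpu_port);
    }
    match tpu_port.checked_add(QUIC_PORT_OFFSET) {
        Some(quic_port) => Transport::Quic(quic_port),
        // The offset port would be invalid, so fall back instead of panicking.
        None => Transport::Udp(tpu_port),
    }
}
```

Handling the overflow at the call site is what makes a validator-wide USABLE_PORT_RANGE restriction unnecessary.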
* Avoid creating a QuicLazyInitializedEndpoint when forcing the use of UDP
* Implement test for connection cache overflow
* Use client certs in QUIC to get peer's stake
* fixes to cert processing
* integrate the code
* clippy
* more cleanup
* sort cargo deps
* test fixes
* info -> debug
* Remove UseQuic type
Move to storing the UdpSocket on ConnectionCache and accepting a bool
* Remove use_quic from ConnectionCache constructor
Replace with separate with_udp constructor to force callers to choose
* Connection pool in connection cache and handle connection errors
1. The connection cache now has a pool of connections per address, configurable, default 4
2. The connections per address share a lazy initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvements to help debug connection issues
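A minimal sketch of the pooling scheme above, under stated assumptions: each address gets a pool of up to 4 connections, all connections in a pool share one lazily initialized endpoint, and a full pool is served round-robin. LazyEndpoint, ConnectionPool, and the round-robin choice are illustrative and not the actual implementation.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Default pool size named in the commit message.
pub const DEFAULT_POOL_SIZE: usize = 4;

/// Hypothetical endpoint that is only created on first use.
#[derive(Default)]
pub struct LazyEndpoint {
    cell: OnceLock<String>,
    init_count: AtomicUsize,
}

impl LazyEndpoint {
    /// Returns the endpoint, creating it exactly once.
    pub fn get(&self) -> &str {
        self.cell.get_or_init(|| {
            self.init_count.fetch_add(1, Ordering::SeqCst);
            String::from("endpoint")
        })
    }

    /// How many times the endpoint was actually constructed.
    pub fn init_count(&self) -> usize {
        self.init_count.load(Ordering::SeqCst)
    }
}

pub struct Connection {
    pub endpoint: Arc<LazyEndpoint>,
    pub addr: SocketAddr,
}

struct ConnectionPool {
    endpoint: Arc<LazyEndpoint>,
    connections: Vec<Arc<Connection>>,
    next: usize,
}

pub struct ConnectionCache {
    pool_size: usize,
    pools: HashMap<SocketAddr, ConnectionPool>,
}

impl ConnectionCache {
    pub fn new(pool_size: usize) -> Self {
        Self { pool_size: pool_size.max(1), pools: HashMap::new() }
    }

    /// Fills the pool for `addr` first, then cycles through it round-robin.
    pub fn get_connection(&mut self, addr: SocketAddr) -> Arc<Connection> {
        let pool_size = self.pool_size;
        let pool = self.pools.entry(addr).or_insert_with(|| ConnectionPool {
            endpoint: Arc::new(LazyEndpoint::default()),
            connections: Vec::new(),
            next: 0,
        });
        if pool.connections.len() < pool_size {
            let conn = Arc::new(Connection { endpoint: Arc::clone(&pool.endpoint), addr });
            pool.connections.push(Arc::clone(&conn));
            return conn;
        }
        let conn = Arc::clone(&pool.connections[pool.next % pool_size]);
        pool.next = pool.next.wrapping_add(1);
        conn
    }
}
```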
* client: Remove static connection cache, plumb it instead
* Add TpuClient::new_with_connection_cache to not break downstream
* Refactor get_connection and RwLock into ConnectionCache
* Fix merge conflicts from new async TpuClient
* Remove `ConnectionCache::set_use_quic`
* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
1. Move the logic for creating the endpoint, creating new connections and retrying 0rtt connections into a wrapper construct, QuicNewConnection, so that it lives in one place.
2. get_or_add_connection: the logic is much simplified so that it only manages the connection cache; the QUIC connection is constructed lazily. get_or_add_connection should no longer be a global hotspot.
3. Per-connection stats updates are moved out of get_or_add_connection so that a bad connection does not impact good connections.
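The three points above can be sketched as follows, assuming a map guarded by an RwLock and a per-entry lock for the lazily built connection. The cache lookup only clones a handle, and the expensive connect step happens later on the handle itself, outside the map lock. LazyConnection and its internals are hypothetical stand-ins.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical handle whose underlying connection is built on first send.
#[derive(Default)]
pub struct LazyConnection {
    // `None` until the first send; the `u64` stands in for real connection state.
    inner: Mutex<Option<u64>>,
}

impl LazyConnection {
    /// Connects on first use, then "sends" and returns the byte count.
    /// Only this entry's lock is held, not the cache-wide lock.
    pub fn send(&self, data: &[u8]) -> usize {
        let mut guard = self.inner.lock().unwrap();
        let _conn_id = *guard.get_or_insert_with(|| 1);
        data.len()
    }

    pub fn is_connected(&self) -> bool {
        self.inner.lock().unwrap().is_some()
    }
}

#[derive(Default)]
pub struct ConnectionCache {
    map: RwLock<HashMap<SocketAddr, Arc<LazyConnection>>>,
}

impl ConnectionCache {
    /// Fast path takes the read lock; the write lock is taken only on a miss.
    /// No connection is established while either lock is held.
    pub fn get_or_add_connection(&self, addr: &SocketAddr) -> Arc<LazyConnection> {
        if let Some(conn) = self.map.read().unwrap().get(addr) {
            return Arc::clone(conn);
        }
        let mut map = self.map.write().unwrap();
        // `entry` covers the case where another thread inserted in between.
        map.entry(*addr).or_default().clone()
    }
}
```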
* Use RwLock instead of Mutex in QUIC connection cache
* replace LruCache with HashMap
* fix tests
* fix tests
* refactor
* add cache eviction for a random connection on reaching upperbound
* cleanup
* Increase connection timeouts
* Bump quic connection cache to 1024
* Use constant for quic connection timeout and add warm cache service
* Fixes to QUIC warmup service
* fix check failure
* fixes after rebase
* fix timeout test
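Replacing the LruCache with a HashMap plus random eviction at the upper bound might look roughly like this. The sketch uses only the standard library, so the random pick is derived from RandomState rather than a dedicated RNG crate; BoundedCache and random_u64 are hypothetical names, and the 1024 limit is the value mentioned in the commits above.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};
use std::net::SocketAddr;

/// Upper bound mentioned in the commit list.
pub const MAX_CONNECTIONS: usize = 1024;

/// Map that evicts one randomly chosen entry when it is full.
pub struct BoundedCache<V> {
    capacity: usize,
    map: HashMap<SocketAddr, V>,
}

/// Std-only source of a random-looking value (each `RandomState` has fresh keys).
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

impl<V> BoundedCache<V> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity: capacity.max(1), map: HashMap::new() }
    }

    /// Inserts `value`, evicting a random existing entry if the cache is full.
    pub fn insert(&mut self, addr: SocketAddr, value: V) {
        if !self.map.contains_key(&addr) && self.map.len() >= self.capacity {
            let victim_index = (random_u64() % self.map.len() as u64) as usize;
            let victim = self.map.keys().nth(victim_index).copied();
            if let Some(victim) = victim {
                self.map.remove(&victim);
            }
        }
        self.map.insert(addr, value);
    }

    pub fn contains(&self, addr: &SocketAddr) -> bool {
        self.map.contains_key(addr)
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Random eviction needs no recency bookkeeping on reads, which fits a cache whose lookups run under a shared read lock.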
Co-authored-by: Pankaj Garg <pankaj@solana.com>
Add an interface send_wire_transaction_batch_async to TpuConnection to allow for sending batches without waiting for completion
Co-authored-by: Anatoly Yakovenko <anatoly@solana.com>