role ops are usually only sparsely filled, there are currently 20 function pointers but several roles only fill in two. No single role has more than 14 of the ops. On a 32/64 bit build this part of the ops struct takes a fixed 80 / 160 bytes then.

First reduce the type of the callback reason part from uint16_t to uint8_t, this saves 12 bytes unconditionally.

Change to a separate function pointer array with a nybble index array, it costs 10 bytes for the index and a pointer to the separate array, for 32-bit the cost is

2 + (4 x ops_used)

and for 64-bit

6 + (8 x ops_used)

for 2 x ops_used it means

32-bit: 10 vs 80 / 64-bit: 22 vs 160

For a typical system with h1 (9), h2 (14), listen (2), netlink (2), pipe (1), raw_skt (3), ws (12), == 43 ops_used out of 140, it means the .rodata for this reduced from

32-bit: 560 -> 174 (386 byte saving)
64-bit: 1120 -> 350 (770 byte saving)

This doesn't account for the changed function ops calling code, two ways were tried, a preprocessor macro and explicit functions

For an x86_64 gcc 10 build with most options, release mode, .text + .rodata

before patch: 553282
accessor macro: 552714 (568 byte saving)
accessor functions: 553674 (392 bytes worse than without patch)

therefore we went with the macros
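The nybble index scheme described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions: the slot names, the `role_ops` layout, the `NYB`, `ROPS_IDX`, `ROPS_HAS` and `ROPS_FUNC` macros and the `role_call()` helper are all hypothetical and are not the names used in lws. It assumes a nybble value of 0 means "op not provided" and any other value is a 1-based index into the role's own function table, which fits because no role fills more than 14 ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical op slots; the commit describes 20 in the real struct. */
enum { OP_CHECK_UPGRADES, OP_HANDLE_POLLIN, OP_CLOSE, OP_COUNT = 20 };

typedef int (*op_fn)(int arg);

struct role_ops {
	const op_fn *table;        /* only the ops this role implements */
	uint8_t idx[OP_COUNT / 2]; /* one nybble per slot: 0 = absent,
				    * else 1-based index into table */
};

/* Pack two slot indexes into one byte: even slot low, odd slot high. */
#define NYB(even, odd) ((uint8_t)((even) | ((odd) << 4)))

/* Accessor macros (assumed names), the variant the commit settled on. */
#define ROPS_IDX(r, s) \
	(((s) & 1) ? ((r)->idx[(s) / 2] >> 4) : ((r)->idx[(s) / 2] & 0xf))
#define ROPS_HAS(r, s)  (ROPS_IDX(r, s) != 0)
#define ROPS_FUNC(r, s) ((r)->table[ROPS_IDX(r, s) - 1])

static int listen_pollin(int arg) { return arg + 1; }
static int listen_close(int arg)  { return arg * 2; }

/* A role that fills in only two of the 20 slots. */
static const op_fn listen_table[] = { listen_pollin, listen_close };

static const struct role_ops role_listen = {
	.table = listen_table,
	.idx = {
		NYB(0, 1),	/* slot 0 absent, slot 1 -> table[0] */
		NYB(2, 0),	/* slot 2 -> table[1], slot 3 absent */
	},			/* remaining bytes are zero: absent */
};

/* Call an op if the role provides it, otherwise return dflt. */
int role_call(const struct role_ops *r, int slot, int arg, int dflt)
{
	assert(slot >= 0 && slot < OP_COUNT);

	return ROPS_HAS(r, slot) ? ROPS_FUNC(r, slot)(arg) : dflt;
}
```

With this layout a role that provides two ops carries the 10-byte index, one pointer to its table and a two-entry table, instead of 20 mostly NULL function pointers.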
Implementation background
Client connection Queueing
By default lws treats each client connection as completely separate, and each is made from scratch with its own network connection independently.
If the user code sets the LCCSCF_PIPELINE bit on info.ssl_connection when creating the client connection though, lws attempts to optimize multiple client connections to the same place by sharing any existing connection and its tls tunnel where possible.
There are two basic approaches. For h1, additional connections of the same type and endpoint queue on a leader and happen sequentially.
For muxed protocols like h2, they may also queue if the initial connection is not up yet, but subsequently they will all join the existing connection simultaneously, "broadside".
h1 queueing
The initial wsi to start the network connection becomes the "leader" that subsequent connection attempts will queue against. Each vhost has a dll2_owner vhost->dll_cli_active_conns_owner that "leaders" who are actually making network connections themselves can register on as "active client connections".
Other client wsi that, while being created, find there is already a leader on the vhost's active client connection list can join their dll2 wsi->dll2_cli_txn_queue to the leader's wsi->dll2_cli_txn_queue_owner to "queue" on the leader.
The user code does not know which wsi was first and which are queued; it waits for events in the same way in both cases.
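The leader lookup and queue join described above can be modelled with a small, self-contained sketch. It is not the lws implementation: the structs are simplified stand-ins that use plain pointers instead of lws_dll2 lists, matching on host and port is an assumption about what "same endpoint" means here, and `find_leader()`, `client_connect()` and `demo_queue_join()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; the real structs use lws_dll2 lists. */
struct wsi {
	const char *host;
	int port;
	struct wsi *next_active;  /* link on the vhost's active list */
	struct wsi *queue_head;   /* leader only: wsi queued on us */
	struct wsi *queue_tail;
	struct wsi *next_queued;  /* follower only: next in the queue */
	struct wsi *leader;       /* follower only: who we queue on */
};

struct vhost {
	struct wsi *active_conns; /* leaders making real connections */
};

/* Find an active leader for the same endpoint, or NULL. */
static struct wsi *find_leader(struct vhost *vh, const struct wsi *w)
{
	struct wsi *l;

	for (l = vh->active_conns; l; l = l->next_active)
		if (l->port == w->port && !strcmp(l->host, w->host))
			return l;

	return NULL;
}

/* Returns 1 if w became a leader, 0 if it queued on an existing one. */
int client_connect(struct vhost *vh, struct wsi *w)
{
	struct wsi *l = find_leader(vh, w);

	if (!l) {
		/* nobody to share with: w makes the connection itself */
		w->next_active = vh->active_conns;
		vh->active_conns = w;
		return 1;
	}

	/* join the tail of the leader's transaction queue */
	w->leader = l;
	if (l->queue_tail)
		l->queue_tail->next_queued = w;
	else
		l->queue_head = w;
	l->queue_tail = w;

	return 0;
}

/* Usage: three connects to one endpoint, one to another. */
int demo_queue_join(void)
{
	struct vhost vh = { NULL };
	struct wsi a = { .host = "example.com", .port = 443 },
		   b = a, c = a,
		   d = { .host = "example.org", .port = 443 };
	int leaders = 0;

	leaders += client_connect(&vh, &a);
	leaders += client_connect(&vh, &b);
	leaders += client_connect(&vh, &c);
	leaders += client_connect(&vh, &d);

	assert(b.leader == &a && c.leader == &a && a.queue_head == &b);
	assert(b.next_queued == &c && !d.leader);

	return leaders; /* a and d lead; b and c queue on a */
}
```

In this model the caller treats all four wsi identically; only the return value of `client_connect()` reveals which ones opened a network connection.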
When the "leader" wsi connects, it performs its client transaction as normal, and at the end arrives at lws_http_transaction_completed_client(). Here, it calls through to the lws_mux _lws_generic_transaction_completed_active_conn() helper. This helper sees if anything else is queued, and if so, migrates assets like the SSL *, the socket fd, and any remaining queue from the original leader to the head of the list, which replaces the old leader as the "active client connection" any subsequent connects would queue on.
It has to be done this way so that user code, which may know each client wsi by its wsi or have marked it with an opaque_user_data pointer, gets its specific request handled by the wsi it expects.
As a side effect of this, and so that POSTs can be handled cleanly, lws does not attempt to send the headers for the next queued child before the previous child has finished.
The process of moving the SSL *, the fd and so on between the queued wsi continues until the whole queue has been handled.
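The handoff at transaction completion can be sketched like this. It is a simplified model, not the body of _lws_generic_transaction_completed_active_conn(): the struct is a stand-in, the `void *ssl` member stands for the SSL *, and `transaction_completed()` and `demo_handoff()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real wsi and its assets. */
struct wsi {
	int fd;                  /* -1 while not owning the socket */
	void *ssl;               /* stand-in for the SSL * */
	struct wsi *queue_head;  /* leader only: next queued wsi */
	struct wsi *next_queued; /* follower only */
	int transactions;        /* how many transactions this wsi ran */
};

/*
 * Called when the leader's transaction has completed.  Hands the
 * connection to the head of the queue, which becomes the new leader.
 * Returns the new leader, or NULL if nothing was queued.
 */
struct wsi *transaction_completed(struct wsi *leader)
{
	struct wsi *next = leader->queue_head;

	if (!next)
		return NULL;

	/* migrate the network assets */
	next->fd = leader->fd;
	next->ssl = leader->ssl;
	leader->fd = -1;
	leader->ssl = NULL;

	/* migrate the rest of the queue */
	next->queue_head = next->next_queued;
	next->next_queued = NULL;
	leader->queue_head = NULL;

	return next;
}

/* Usage: a leads with b and c queued; run the queue to completion. */
int demo_handoff(void)
{
	int dummy_ssl;
	struct wsi c = { .fd = -1 },
		   b = { .fd = -1, .next_queued = &c },
		   a = { .fd = 7, .ssl = &dummy_ssl, .queue_head = &b };
	struct wsi *leader = &a;
	int total = 0;

	while (leader) {
		/* only the current leader owns the socket and tls session */
		assert(leader->fd == 7 && leader->ssl == &dummy_ssl);
		leader->transactions++; /* one transaction at a time */
		total++;
		leader = transaction_completed(leader);
	}

	assert(a.fd == -1 && b.fd == -1 && c.fd == 7);
	assert(a.transactions == 1 && b.transactions == 1 &&
	       c.transactions == 1);

	return total;
}
```

Each wsi runs exactly its own transaction, which is the property the text gives as the reason for moving the assets rather than reusing the original leader.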
muxed protocol queueing and stream binding
h2 connections act the same as h1 before the initial connection has been made, but once it is made all the queued connections join the network connection as child mux streams immediately, "broadside", binding the stream to the existing network connection.
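The difference from h1 can be sketched in the same style. This is a model under stated assumptions rather than lws code: `struct conn` stands in for the network connection, `struct stream` for a child mux stream, and `open_stream()`, `connection_established()` and `demo_broadside()` are hypothetical names. The only protocol fact relied on is that client-initiated h2 stream ids are odd and increasing.

```c
#include <assert.h>
#include <stddef.h>

struct conn;

struct stream {
	struct conn *nwsi;      /* network connection we are bound to */
	int sid;                /* stream id, 0 while unbound */
	struct stream *next_queued;
};

struct conn {
	int established;
	int next_sid;           /* client-initiated h2 stream ids are odd */
	int bound;              /* streams bound so far */
	struct stream *queue_head, *queue_tail;
};

static void bind_stream(struct conn *c, struct stream *s)
{
	s->nwsi = c;
	s->sid = c->next_sid;
	c->next_sid += 2;
	c->bound++;
}

/* A new client stream either binds at once or waits for the connection. */
void open_stream(struct conn *c, struct stream *s)
{
	if (c->established) {
		bind_stream(c, s);
		return;
	}

	if (c->queue_tail)
		c->queue_tail->next_queued = s;
	else
		c->queue_head = s;
	c->queue_tail = s;
}

/* Connection is up: everything queued joins it at once. */
void connection_established(struct conn *c)
{
	struct stream *s;

	c->established = 1;
	while ((s = c->queue_head)) {
		c->queue_head = s->next_queued;
		s->next_queued = NULL;
		bind_stream(c, s);
	}
	c->queue_tail = NULL;
}

/* Usage: two streams queue before the connection is up, one joins after. */
int demo_broadside(void)
{
	struct conn c = { .next_sid = 1 };
	struct stream s1 = { NULL }, s2 = { NULL }, s3 = { NULL };

	open_stream(&c, &s1);
	open_stream(&c, &s2);
	assert(!s1.sid && !s2.sid);   /* still queued */

	connection_established(&c);
	assert(s1.sid == 1 && s2.sid == 3);

	open_stream(&c, &s3);         /* binds immediately */
	assert(s3.sid == 5 && s3.nwsi == &c);

	return c.bound;
}
```

Unlike the h1 model, nothing is handed from one wsi to the next here; the queue is drained once and every stream then shares the same connection concurrently.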