mirror of https://github.com/warmcat/libwebsockets.git synced 2025-03-30 00:00:16 +01:00

README.md

Implementation background

Client connection Queueing

By default lws treats each client connection as completely separate, and each is made from scratch with its own independent network connection.

However, if the user code sets the LCCSCF_PIPELINE bit on info.ssl_connection when creating the client connection, lws attempts to optimize multiple client connections to the same endpoint by sharing any existing connection and its tls tunnel where possible.

There are two basic approaches. For h1, additional connections of the same type and endpoint queue on a leader and happen sequentially.

For muxed protocols like h2, connections may also queue if the initial connection is not up yet, but once it is up they all join the existing connection simultaneously, "broadside".

h1 queueing

The initial wsi to start the network connection becomes the "leader" that subsequent connection attempts will queue against. Each vhost has a dll2_owner wsi->dll_cli_active_conns_owner that "leaders" who are actually making network connections themselves can register on as "active client connections".

Other client wsi being created that find there is already a leader on the vhost's active client connection list can join their dll2 wsi->dll2_cli_txn_queue to the leader's wsi->dll2_cli_txn_queue_owner to "queue" on the leader.

The user code does not know which wsi was first and which are queued; it just waits for its callbacks in the same way in either case.
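The leader lookup and queueing described above can be sketched with a small standalone model. This is only an illustration, not the lws implementation: the `q_` types and functions are hypothetical stand-ins, plain singly-linked lists replace lws_dll2, and "same type and endpoint" is reduced to a string compare as a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real lws structure layouts */
struct q_wsi {
	const char *endpoint;      /* assumption: "host:port" identifies a match */
	struct q_wsi *next_active; /* link on the vhost's list of leaders */
	struct q_wsi *next_queued; /* link on a leader's transaction queue */
	struct q_wsi *queue_head;  /* leader only: first wsi queued on us */
	struct q_wsi *leader;      /* queued only: the leader we wait on */
};

struct q_vhost {
	struct q_wsi *active_head; /* models the "active client connections" owner */
};

/* Return an existing leader for this endpoint, or NULL if there is none */
static struct q_wsi *
q_find_leader(struct q_vhost *vh, const char *endpoint)
{
	struct q_wsi *w;

	for (w = vh->active_head; w; w = w->next_active)
		if (!strcmp(w->endpoint, endpoint))
			return w;

	return NULL;
}

/* Returns 1 if wsi became a leader, 0 if it was queued on an existing one */
static int
q_client_connect(struct q_vhost *vh, struct q_wsi *wsi, const char *endpoint)
{
	struct q_wsi *leader, **pp;

	assert(vh && wsi && endpoint);
	memset(wsi, 0, sizeof(*wsi));
	wsi->endpoint = endpoint;

	leader = q_find_leader(vh, endpoint);
	if (!leader) {
		/* nobody is connecting there yet: register as a leader */
		wsi->next_active = vh->active_head;
		vh->active_head = wsi;
		return 1;
	}

	/* append at the tail so queued transactions run in arrival order */
	for (pp = &leader->queue_head; *pp; pp = &(*pp)->next_queued)
		;
	*pp = wsi;
	wsi->leader = leader;

	return 0;
}
```

In this model the caller treats both return values identically, mirroring the point that user code does not need to know whether its wsi is a leader or queued.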

When the "leader" wsi connects, it performs its client transaction as normal, and at the end arrives at lws_http_transaction_completed_client(). Here, it calls through to the lws_mux _lws_generic_transaction_completed_active_conn() helper. This helper sees if anything else is queued, and if so, migrates assets like the SSL *, the socket fd, and any remaining queue from the original leader to the head of the list, which replaces the old leader as the "active client connection" any subsequent connects would queue on.

It has to be done this way so that user code, which may know each client wsi by its wsi pointer or have marked it with an opaque_user_data pointer, has its specific request handled by the wsi it expects.

As a side effect of this, and in order to handle POSTs cleanly, lws does not attempt to send the headers for the next queued child before the previous child has finished.

The process of moving the SSL context and fd etc between the queued wsi continues until the queue is all handled.
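The handoff described above can be modelled in a few lines. This is a hedged sketch and not the body of _lws_generic_transaction_completed_active_conn(): the `ho_` names are hypothetical, a `void *` stands in for the SSL *, and a singly-linked list stands in for the dll2 queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified wsi holding only what the handoff touches */
struct ho_wsi {
	void *ssl;                  /* stands in for the SSL * */
	int fd;                     /* socket fd, -1 when this wsi owns none */
	struct ho_wsi *next_queued; /* next wsi waiting behind this one */
	struct ho_wsi *queue_head;  /* leader only: first queued wsi */
};

/*
 * Called when the leader's transaction has completed.
 * Returns the new leader, or NULL if nothing was queued.
 */
static struct ho_wsi *
ho_transaction_completed(struct ho_wsi *old_leader)
{
	struct ho_wsi *new_leader;

	assert(old_leader);
	new_leader = old_leader->queue_head;
	if (!new_leader)
		return NULL; /* nothing queued, old leader keeps its assets */

	/* migrate the network assets to the head of the queue */
	new_leader->ssl = old_leader->ssl;
	new_leader->fd = old_leader->fd;
	old_leader->ssl = NULL;
	old_leader->fd = -1;

	/* whatever is still queued now waits on the new leader */
	new_leader->queue_head = new_leader->next_queued;
	new_leader->next_queued = NULL;
	old_leader->queue_head = NULL;

	return new_leader;
}
```

Calling the helper repeatedly, once per completed transaction, walks the connection down the queue until it returns NULL, which corresponds to the queue being fully handled.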

muxed protocol queueing and stream binding

h2 connections act the same as h1 before the initial connection has been made, but once it is made all the queued connections join the network connection as child mux streams immediately, "broadside", binding the stream to the existing network connection.
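For contrast with the sequential h1 case, the "broadside" join can be sketched as follows. Again this is a standalone model under stated assumptions rather than lws code: the `mx_` names are hypothetical, and stream binding is reduced to setting a parent pointer and bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified wsi for a muxed (h2-like) connection */
struct mx_wsi {
	int connected;              /* network wsi only: connection is up */
	int child_count;            /* network wsi only: streams bound to us */
	struct mx_wsi *next_queued; /* link while waiting for the connect */
	struct mx_wsi *queue_head;  /* network wsi only: waiting wsi */
	struct mx_wsi *parent;      /* stream only: network wsi we are bound to */
};

/* Bind w as a child mux stream of the network connection nwsi */
static void
mx_bind_stream(struct mx_wsi *nwsi, struct mx_wsi *w)
{
	w->parent = nwsi;
	nwsi->child_count++;
}

/* Returns 1 if w was bound as a stream immediately, 0 if it had to queue */
static int
mx_join(struct mx_wsi *nwsi, struct mx_wsi *w)
{
	struct mx_wsi **pp;

	assert(nwsi && w);
	if (nwsi->connected) {
		mx_bind_stream(nwsi, w);
		return 1;
	}

	/* connection not up yet: queue exactly as h1 would */
	for (pp = &nwsi->queue_head; *pp; pp = &(*pp)->next_queued)
		;
	*pp = w;

	return 0;
}

/* Connection came up: bind every queued wsi at once; returns how many */
static int
mx_connected(struct mx_wsi *nwsi)
{
	struct mx_wsi *w = nwsi->queue_head, *next;
	int n = 0;

	nwsi->connected = 1;
	while (w) {
		next = w->next_queued;
		w->next_queued = NULL;
		mx_bind_stream(nwsi, w);
		n++;
		w = next;
	}
	nwsi->queue_head = NULL;

	return n;
}
```

The difference from the h1 model is that the queue is drained in one pass when the connection comes up, and later arrivals never queue at all.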