When facing a captive portal, we may appear to get a TCP-level connection okay,
but then find that communication is silently dropped, leaving us to time out
in LRS_WAITING_SERVER_REPLY.
If so, we need to handle it as a connection failure so that at least
Captive Portal detection is satisfied.
There are a few build options that try to keep and report various
statistics:

 - DETAILED_LATENCY
 - SERVER_STATUS
 - WITH_STATS

Remove all of those and establish a generic replacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
OpenMetrics export is supported, eg, for prometheus scraping.
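As an illustration only (the real lws_metrics types and the exact lws_system
ops member are not reproduced here, so the names below are assumptions), the
shape of such a user-set reporting hook rendering OpenMetrics text might be:

  #include <stdio.h>
  #include <stdint.h>

  /* hypothetical flattened metric record, not the real lws_metrics type */
  struct my_metric {
          const char *name;     /* eg, "conn_latency_us" */
          uint64_t    sum;      /* accumulated value */
          uint64_t    count;    /* number of samples */
  };

  /* user-side hook the library would call via the ops function pointer */
  static int
  my_metric_report(const struct my_metric *m)
  {
          /* OpenMetrics / prometheus text exposition format */
          printf("%s_sum %llu\n%s_count %llu\n", m->name,
                 (unsigned long long)m->sum, m->name,
                 (unsigned long long)m->count);

          return 0;
  }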
This is a huge patch that should be a global NOP.
For unix-type platforms it enables -Wconversion to issue warnings (-> errors)
for all implicit conversions that seem less than ideal but are normally
concealed by the toolchain.
This covers things like passing an int to a size_t argument. Once it was
enabled, I went through all the flagged args on my default build (which builds
most things) and tried to make each previously-implicit cast explicit.
With that approach the code neither changes nor bloats, since it compiles
to whatever it was doing before, just with the casts made explicit... in a
few cases I changed some length args from int to size_t, but largely left
the underlying causes alone.
From now on, new code that relies on less-than-ideal implicit casting
will produce warnings and nudge me to improve it.
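For example (an illustrative call site, not a specific one from the tree), an
int length passed to memcpy()'s size_t parameter is exactly the kind of
implicit conversion now flagged, and the fix is just to surface the cast:

  #include <string.h>

  void
  copy_len(char *dst, const char *src, int len)
  {
          /* implicit int -> size_t conversion: warns (-> error) under -Wconversion
           *     memcpy(dst, src, len);
           * same generated code, but the conversion is explicit and audited
           */
          memcpy(dst, src, (size_t)len);
  }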
This adds some new objects and helpers for keeping and logging
info on grouped allocations; a group is, eg, SS handles or client
wsis.
Allocated objects get a context-unique "tag" string intended to replace
%p / wsi pointers etc. Pointers quickly become confusing once
allocations are freed and reused, whereas the tag string won't repeat
until you have produced 2^64 objects in a context.
In addition the tag string documents the object group, with prefixes
like "wsi-" or "vh-", and contains object-specific additional
information like the vhost name, address / port, or the role of the wsi.
At creation time the lws code can use a format string and args
to add whatever group-specific info makes sense; eg, a wsi bound
to a secure stream can also append the guid of the secure stream.
This is copied into the new object's tag, so it remains cleanly
available after the stream is destroyed if the wsi outlives it.
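As a sketch of the idea only (the actual lws tag helpers and tag format are
not reproduced here): a context-wide 64-bit ordinal plus the group prefix and
per-object detail is enough to build such a tag.

  #include <stdio.h>
  #include <stdint.h>

  static uint64_t next_ordinal; /* per-context monotonic counter, never reused */

  static void
  make_tag(char *buf, size_t len, const char *group, const char *detail)
  {
          /* eg, group "wsi", detail "h2|example.com:443" */
          snprintf(buf, len, "[%s|%llu|%s]", group,
                   (unsigned long long)++next_ordinal, detail);
  }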
Role ops are usually only sparsely filled: there are currently 20
function pointers, but several roles fill in only two of them, and no
single role has more than 14 of the ops. On a 32- / 64-bit build this
part of the ops struct nevertheless takes a fixed 80 / 160 bytes.
First, reduce the type of the callback reason part from uint16_t to
uint8_t; this saves 12 bytes unconditionally.
Then change to a separate function pointer array addressed via a nybble
index array; this costs 10 bytes for the index plus a pointer to the
separate array, so the net per-role cost (allowing for the 12 bytes
saved above) is

  32-bit:  2 + (4 x ops_used)
  64-bit:  6 + (8 x ops_used)

For a role using only 2 ops that means 10 vs 80 bytes on 32-bit, and
22 vs 160 on 64-bit.
For a typical system with h1 (9), h2 (14), listen (2), netlink (2),
pipe (1), raw_skt (3) and ws (12), ie, 43 ops_used out of 140, the
.rodata for this reduces from 560 to 174 on 32-bit (386 byte saving)
and from 1120 to 350 on 64-bit (770 byte saving).
This doesn't account for the changed function ops calling code; two
approaches were tried, a preprocessor accessor macro and explicit
accessor functions.
For an x86_64 gcc 10 build with most options, release mode,
.text + .rodata came to

  before patch:        553282
  accessor macro:      552714  (568 byte saving)
  accessor functions:  553674  (392 bytes worse than without the patch)

therefore we went with the macros.
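A simplified sketch of the packed lookup described above (generic C, not the
literal lws role ops layout): each role keeps a 10-byte nybble index mapping
the 20 logical op slots onto a short dense array holding only the ops it
actually implements.

  #include <stddef.h>
  #include <stdint.h>

  #define N_LOGICAL_OPS 20

  typedef int (*role_op_fn)(void *wsi);

  struct role_ops_packed {
          /* two nybbles per byte: 0 = op not provided, n = funcs[n - 1] */
          uint8_t          idx[(N_LOGICAL_OPS + 1) / 2];
          const role_op_fn *funcs; /* dense array, ops_used entries */
  };

  static inline role_op_fn
  role_op(const struct role_ops_packed *ro, unsigned int op)
  {
          unsigned int n = (unsigned int)
                           (ro->idx[op / 2] >> ((op & 1) * 4)) & 0xf;

          return n ? ro->funcs[n - 1] : NULL;
  }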
With SMP + an event lib, extra locking is required when dealing with the
cross-thread adoption case, and for cross-vhost cases like wsi close we need
to hold the pt or context lock.
These lock apis are NOPs when LWS_MAX_SMP == 1, which is the default.
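The pattern is along these lines (a sketch; the actual lws helper names and
arguments differ): with LWS_MAX_SMP == 1 the helpers compile away entirely.

  #include <pthread.h>

  /* assumes the pt struct carries a pthread_mutex_t named "lock" */
  #if LWS_MAX_SMP > 1
  #define pt_lock_sketch(pt)   pthread_mutex_lock(&(pt)->lock)
  #define pt_unlock_sketch(pt) pthread_mutex_unlock(&(pt)->lock)
  #else
  /* single service thread: locking is a NOP and costs nothing */
  #define pt_lock_sketch(pt)   ((void)(pt))
  #define pt_unlock_sketch(pt) ((void)(pt))
  #endif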
Currently we always reserve a fakewsi per pt, so that events that don't have a
related actual wsi, like vhost-protocol-init or vhost cert init via the protocol
callback, can make callbacks that look reasonable to user protocol handler code
expecting a valid wsi every time.
This patch splits out the stuff that user callbacks often unconditionally expect
to be in a wsi, like the context pointer, vhost pointer etc, into a substructure
which is composed into struct lws at the top of it. Internal references (struct
lws is opaque, so there are only internal references) are all updated to go via
the substructure; the compiler should make that a NOP.
Helpers are added for the places where a fakewsi is used and referenced.
If not PLAT_FREERTOS, we continue to provide a full fakewsi in the pt as before,
although the helpers improve consistency by zeroing down the substructure. There
is a huge amount of user code out there from the last 10 years that did not
always have the minimal examples to follow, and some of it does unexpected things.
If it is PLAT_FREERTOS, that is a newer thing in lws and users have the benefit of
being able to follow the minimal examples' approach. For PLAT_FREERTOS we don't
reserve the fakewsi in the pt any more, saving around 800 bytes. The helpers then
create a struct lws_a (the substructure) on the stack, zero it down (it is only
about 4 pointers) and prepare it with whatever we know, like the context.
Then we cast it to a struct lws * and use it in the user protocol handler call.
In this case, the remainder of the struct lws is undefined. However, the amount
of old protocol handler code that might touch things outside of the substructure
on PLAT_FREERTOS is very limited compared to legacy lws user code, and the saving
is significant on constrained devices.
User handlers should not be touching everything in a wsi every time anyway;
there are several cases where there is no valid wsi to make the call with.
Dereferencing things outside the substructure should only happen when the
callback reason shows there is a valid wsi bound to the activity (as in all
the minimal examples).
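Roughly, the layout change looks like this (simplified sketch; member and
helper names here are illustrative rather than the exact lws ones):

  #include <string.h>

  struct lws_context; struct lws_vhost; struct lws_protocols;

  struct lws_a_sketch {                  /* what user callbacks usually touch */
          struct lws_context         *context;
          struct lws_vhost           *vhost;
          const struct lws_protocols *protocol;
          void                       *opaque_user_data;
  };

  struct lws_sketch {
          struct lws_a_sketch a;  /* composed at the top of struct lws */
          /* ... the rest of the wsi; undefined when using a stack fakewsi ... */
  };

  /* PLAT_FREERTOS-style helper: no reserved pt fakewsi, just a stack substructure */
  static void
  protocol_cb_without_wsi(struct lws_context *cx, struct lws_vhost *vh)
  {
          struct lws_a_sketch a;

          memset(&a, 0, sizeof(a));      /* zero down the handful of pointers */
          a.context = cx;
          a.vhost = vh;

          /* cast is only safe for callbacks that stay inside the substructure */
          /* user_protocol_callback((struct lws_sketch *)&a, reason, ...); */
  }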
Adapt the pt sul owner list to be an array, and define two different lists,
one that acts like before and is the default for existing users, and another
that has the ability to cooperate with systemwide suspend to restrict the
interval spent suspended so that it will wake in time for the earliest
thing on this wake-suspend sul list.
Clean the api a bit and add lws_sul_cancel(), which only needs the sul as its
argument.
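A minimal usage sketch, assuming the v4-era signatures of lws_sul_schedule()
and lws_sul_cancel():

  #include <libwebsockets.h>

  static struct lws_context *cx;          /* set at context creation */
  static lws_sorted_usec_list_t sul_work;

  static void
  sul_work_cb(lws_sorted_usec_list_t *sul)
  {
          lwsl_notice("periodic work\n");

          /* reschedule ourselves for 5s from now (us resolution) */
          lws_sul_schedule(cx, 0, &sul_work, sul_work_cb, 5 * LWS_US_PER_SEC);
  }

  static void
  stop_work(void)
  {
          /* cancellation now needs only the sul itself */
          lws_sul_cancel(&sul_work);
  }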
Add a flag for client creation info to indicate that this client connection
is important enough that, eg, validity checking it to detect silently dead
connections should go on the wake-suspend sul list. That flag is exposed in
secure streams policy so it can be added to a streamtype with
"swake_validity": true
Deprecate the old vhost timer stuff that predates sul. Add a cmake flag
LWS_WITH_DEPRECATED_THINGS so users can get it back temporarily
before it is removed in v4.2.
Adapt all remaining in-tree users of it to use explicit suls.
Secure Streams is an optional layer on top of lws that separates policy
like endpoint selection and tls cert validation into a device JSON
policy document.
Code that wants to open a client connection just specifies a streamtype name,
and no longer deals with details like the endpoint, the protocol (!) or anything
else other than payloads and optionally generic metadata; the JSON policy
contains all the details for each streamtype. h1, h2, ws and mqtt client
connections are supported.
Logical secure streams outlive any particular connection and support "nailed-up"
connectivity regardless of underlying connection stability.
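An illustrative policy fragment (the schema keys are abbreviated from memory
and may not match the current policy format precisely); the code using it only
ever refers to the streamtype name "mystream":

  {
    "s": [{
      "mystream": {
        "endpoint": "example.com",
        "port":     443,
        "protocol": "h1",
        "tls":      true
      }
    }]
  }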
This should be a NOP for h2 support and only affects internal
apis. But it lets us reuse the working and reliable h2 mux
arrangements directly in other protocols later, and share code
so building for h2 + new protocols can take advantage of a common
mux child handling struct and code.
 - Break out the common mux handling struct into its own type.
 - Convert all uses of members that used to be in wsi->h2 to wsi->mux.
 - Audit all references to those members and break out generic helpers
   for anything that is useful for other mux-capable protocols, so they
   can reuse the wsi->mux related features.
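The shared bookkeeping ends up along these lines (a sketch with illustrative
member names, not the literal lws struct):

  struct lws;

  /* generic mux-child state shared by mux-capable roles (h2 now, others later) */
  struct lws_mux_sketch {
          struct lws   *parent_wsi;    /* network wsi carrying the mux */
          struct lws   *child_list;    /* our children, if we are a parent */
          struct lws   *sibling_list;  /* next child on our parent's list */
          unsigned int  my_sid;        /* stream / channel id within the mux */
          unsigned int  child_count;
  };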
As it is, if time_t is 32-bit on the platform the arithmetic might
overflow, so force it to lws_usec_t (uint64_t) even though it works
OK here on x86_64.
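For example, with a 32-bit time_t the multiplication below would be done in
32 bits and can overflow before the result is widened; casting to lws_usec_t
first keeps the arithmetic 64-bit throughout:

  #include <libwebsockets.h>
  #include <time.h>

  static lws_usec_t
  secs_to_us(time_t secs)
  {
          /* risky if time_t is 32-bit: "secs * 1000000" is then a 32-bit
           * multiply and can overflow before being widened to 64 bits */
          return (lws_usec_t)secs * 1000000;
  }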
Add a minimal example aimed at testing the wsi hrtimer stability
consistently across platforms.
Add hrtimer dump code, disabled by default (it is too expensive and too
specific to internal testing to leave in for debug mode, even if it's not
printed). If you hack it enabled, it will dump the sul list for the pt and
assert if the list is disordered.
Refactor everything around ping / pong handling in ws and h2, so there
is instead a protocol-independent validity lws_sul tracking how long it
has been since the last exchange that confirms the operation of the
network connection in both directions.
Clean out the periodic role callback and replace the last two role users
with discrete lws_suls for each pt.
It was already correct, but add helpers to isolate and deduplicate the
processing for adding and closing a generically immortal stream.
Change the default 31s h2 network connection timeout to be settable
by .keepalive_timeout if nonzero.
Add a public api allowing a client h2 stream to transition to
half-closed LOCAL (by sending a 0-byte DATA with END_STREAM) and
mark itself as immortal to create a read-only long-poll stream
if the server allows it.
Add a vhost server option flag LWS_SERVER_OPTION_VH_H2_HALF_CLOSED_LONG_POLL
which allows the vhost to treat half-closed remotes as immortal long
poll streams.
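Sketch of wiring both of these up at context / vhost creation time (tls and
other setup omitted):

  #include <string.h>
  #include <libwebsockets.h>

  static struct lws_context *
  create_long_poll_context(void)
  {
          struct lws_context_creation_info info;

          memset(&info, 0, sizeof(info));
          info.port = 443;
          /* override the default 31s h2 network connection timeout */
          info.keepalive_timeout = 120;
          /* treat half-closed remote h2 streams as immortal long-poll streams */
          info.options = LWS_SERVER_OPTION_VH_H2_HALF_CLOSED_LONG_POLL;

          return lws_create_context(&info);
  }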
The wsi timeout, wsi hrtimer, sequencer timeout and vh-protocol timer
all now participate in a single sorted us list.
The whole idea of polling wakes is thrown out; poll waits ignore the
timeout field and always use infinite timeouts.
Introduce a public api that can schedule its own callback from the event
loop with us resolution (usually ms is all the platform can do).
Upgrade timeouts and sequencer timeouts to also be able to use us resolution.
Introduce a prepared fakewsi in the pt, so we don't have to allocate
one on the heap when we need it.
Directly handle the vh-protocol timer if LWS_MAX_SMP == 1.
There are quite a few linked-lists of things that want events after
some period. This introduces a type binding an lws_dll2 for the
list and a lws_usec_t for the duration.
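That type is essentially of this shape (simplified sketch rather than the
exact lws definition):

  #include <libwebsockets.h>

  /* an lws_dll2 node for membership of the sorted list, the time in us,
   * and the callback to fire when it is reached */
  typedef struct my_sul_sketch {
          struct lws_dll2 list;
          lws_usec_t      us;
          void            (*cb)(struct my_sul_sketch *sul);
  } my_sul_sketch_t;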
The wsi timeouts, the hrtimer and the sequencer timeouts are converted
to use these, and they also participate in the common event wait calculation.
lws_dll2 removes the downsides of lws_dll and adds new features like a
running member count and an explicit owner type... it's cleaner and more
robust (eg, nodes know their owner, so they can casually switch between
list owners and remove themselves without the code needing to know the owner).
This deprecates lws_dll, but since it's public, it can continue to be built
for the 4.0 release if you give cmake LWS_WITH_DEPRECATED_LWS_DLL.
All remaining internal users of lws_dll are migrated to lws_dll2.
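Typical lws_dll2 usage looks like this (a sketch; treat the exact helper
names as approximate):

  #include <libwebsockets.h>

  struct item {
          lws_dll2_t list;        /* the node knows its owner */
          int        value;
  };

  static lws_dll2_owner_t owner;  /* explicit owner type with a running count */

  static void
  example(struct item *it)
  {
          lws_dll2_add_tail(&it->list, &owner);

          lwsl_notice("owner holds %u items\n", (unsigned int)owner.count);

          /* no owner argument needed: the node can remove itself */
          lws_dll2_remove(&it->list);
  }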
Adapt service loops and event libs to use microsecond waits
internally, for hrtimer and sequencer. Reduce granularity
according to platform / event lib wait.
Add a helper so there's a single place to extend it.