There are a few build options that are trying to keep and report
various statistics
- DETAILED_LATENCY
- SERVER_STATUS
- WITH_STATS
remove all those and establish a generic rplacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
Openmetrics export is supported, for, eg, prometheus scraping.
This is a huge patch that should be a global NOP.
For unix type platforms it enables -Wconversion to issue warnings (-> error)
for all automatic casts that seem less than ideal but are normally concealed
by the toolchain.
This is things like passing an int to a size_t argument. Once enabled, I
went through all args on my default build (which build most things) and
tried to make the removed default cast explicit.
With that approach it neither change nor bloat the code, since it compiles
to whatever it was doing before, just with the casts made explicit... in a
few cases I changed some length args from int to size_t but largely left
the causes alone.
From now on, new code that is relying on less than ideal casting
will complain and nudge me to improve it by warnings.
This adds some new objects and helpers for keeping and logging
info on grouped allocations, a group is, eg, SS handles or client
wsis.
Allocated objects get a context-unique "tag" string intended to replace
%p / wsi pointers etc. Pointers quickly become confusing when
allocations are freed and reused, the tag string won't repeat
until you produce 2^64 objects in a context.
In addition the tag string documents the object group, with prefixes
like "wsi-" or "vh-" and contain object-specific additional
information like the vhost name, address / port or the role of the wsi.
At creation time the lws code can use a format string and args
to add whatever group-specific info makes sense, eg, a wsi bound
to a secure stream can also append the guid of the secure stream,
it's copied into the new object tag and so is still available
cleanly after the stream is destroyed if the wsi outlives it.
role ops are usually only sparsely filled, there are currently 20
function pointers but several roles only fill in two. No single
role has more than 14 of the ops. On a 32/64 bit build this part
of the ops struct takes a fixed 80 / 160 bytes then.
First reduce the type of the callback reason part from uint16_t to
uint8_t, this saves 12 bytes unconditionally.
Change to a separate function pointer array with a nybble index
array, it costs 10 bytes for the index and a pointer to the
separate array, for 32-bit the cost is
2 + (4 x ops_used)
and for 64-bit
6 + (8 x ops_used)
for 2 x ops_used it means 32-bit: 10 vs 80 / 64-bit: 22 vs 160
For a typical system with h1 (9), h2 (14), listen (2), netlink (2),
pipe (1), raw_skt (3), ws (12), == 43 ops_used out of 140, it means
the .rodata for this reduced from 32-bit: 560 -> 174 (386 byte
saving) and 64-bit: 1120 -> 350 (770 byte saving)
This doesn't account for the changed function ops calling code, two
ways were tried, a preprocessor macro and explicit functions
For an x86_64 gcc 10 build with most options, release mode,
.text + .rodata
before patch: 553282
accessor macro: 552714 (568 byte saving)
accessor functions: 553674 (392 bytes worse than without patch)
therefore we went with the macros
RFC6724 defines an ipv6-centric DNS result sorting algorithm, that
takes route and source address route information for the results
given by the DNS resolution, and sorts them in order of preferability,
which defines the order they should be tried in.
If LWS_WITH_NETLINK, then lws takes care about collecting and monitoring
the interface, route and source address information, and uses it to
perform the RFC6724 sorting to re-sort the DNS before trying to make
the connections.
Currently we always reserve a fakewsi per pt so events that don't have a related actual
wsi, like vhost-protocol-init or vhost cert init via protocol callback can make callbacks
that look reasonable to user protocol handler code expecting a valid wsi every time.
This patch splits out stuff that user callbacks often unconditionally expect to be in
a wsi, like context pointer, vhost pointer etc into a substructure, which is composed
into struct lws at the top of it. Internal references (struct lws is opaque, so there
are only internal references) are all updated to go via the substructre, the compiler
should make that a NOP.
Helpers are added when fakewsi is used and referenced.
If not PLAT_FREERTOS, we continue to provide a full fakewsi in the pt as before,
although the helpers improve consistency by zeroing down the substructure. There is
a huge amount of user code out there over the last 10 years that did not always have
the minimal examples to follow, some of it does some unexpected things.
If it is PLAT_FREERTOS, that is a newer thing in lws and users have the benefit of
being able to follow the minimal examples' approach. For PLAT_FREERTOS we don't
reserve the fakewsi in the pt any more, saving around 800 bytes. The helpers then
create a struct lws_a (the substructure) on the stack, zero it down (but it is only
like 4 pointers) and prepare it with whatever we know like the context.
Then we cast it to a struct lws * and use it in the user protocol handler call.
In this case, the remainder of the struct lws is undefined. However the amount of
old protocol handlers that might touch things outside of the substructure in
PLAT_FREERTOS is very limited compared to legacy lws user code and the saving is
significant on constrained devices.
User handlers should not be touching everything in a wsi every time anyway, there
are several cases where there is no valid wsi to do the call with. Dereference of
things outside the substructure should only happen when the callback reason shows
there is a valid wsi bound to the activity (as in all the minimal examples).
Establish a new distributed CMake architecture with CMake code related to
a source directory moving to be in the subdir in its own CMakeLists.txt.
In particular, there's now one in ./lib which calls through to ones
further down the directory tree like ./lib/plat/xxx, ./lib/roles/xxx etc.
This cuts the main CMakelists.txt from 98KB -> 33KB, about a 66% reduction,
and it's much easier to maintain sub-CMakeLists.txt that are in the same
directory as the sources they manage, and conceal all the details that that
level.
Child CMakelists.txt become responsible for:
- include_directories() definition (this is not supported by CMake
directly, it passes it back up via PARENT_SCOPE vars in helper
macros)
- Addition child CMakeLists.txt inclusion, for example toplevel ->
role -> role subdir
- Source file addition to the build
- Dependent library path resolution... this is now a private thing
in the child CMakeLists.txt, it just passes back any adaptations
to include_directories() and the LIB_LIST without filling the
parent namespace with the details
(Includes fixes from Yichen Gu)
Currently the incoming ebuf is always replaced to point to either a whole
buflist segment, or up to the (pt_serv_buf - LWS_PRE) length in the pt_serv_buf.
This is called on path for handling http read... some user code reasonably wants to
restrict the read size to what it can handle.
Change the other lws_buflist_aware_read() callers to zero ebuf before calling, and for
those have it keep the current behaviour; but if non-NULL ebuf.token on incoming, as
in http read path case, restrict both reported len of buflist content and the read length
to the incoming ebuf.len so the user code can control what it will get at one time.
Additionally muxed protocol wsi have no choice but to read what was sent to them
since it's HOL-blocking for other streams and its own WINDOW_UPDATEs. So add an
internal param to lws_buflist_aware_read() forcing read even if buflist content
is available.
This provides support to build lws using the linkit 7697 public SDK
from here https://docs.labs.mediatek.com/resource/mt7687-mt7697/en/downloads
This toolchain has some challenges, its int32_t / uint32_t are long,
so assumptions about format strings for those being %u / %d / %x all
break. This fixes all the cases for the features enabled by the
default cmake settings.
Refactor everything around ping / pong handling in ws and h2, so there
is instead a protocol-independent validity lws_sul tracking how long it
has been since the last exchange that confirms the operation of the
network connection in both directions.
Clean out periodic role callback and replace the last two role users
with discrete lws_sul for each pt.
With http, the protocol doesn't indicate where the headers end and the
next transaction or body begin. Until now, we handled that for client
header response parsing by reading from the tls buffer bytewise.
This modernizes the code to read in up to 256-byte chunks and parse
the chunks in one hit (the parse API is already set up for doing this
elsewhere).
Now we have a generic input buflist, adapt the parser loop to go through
that and arrange that any leftovers are placed on there.
This adds the option to have lws do its own dns resolution on
the event loop, without blocking. Existing implementations get
the name resolution done by the libc, which is blocking. In
the case you are opening client connections but need to carefully
manage latency, another connection opening and doing the name
resolution becomes a big problem.
Currently it supports
- ipv4 / A records
- ipv6 / AAAA records
- ipv4-over-ipv6 ::ffff:1.2.3.4 A record promotion for ipv6
- only one server supported over UDP :53
- nameserver discovery on linux, windows, freertos
It also has some nice advantages
- lws-style paranoid response parsing
- random unique tid generation to increase difficulty of poisoning
- it's really integrated with the lws event loop, it does not spawn
threads or use the libc resolver, and of course no blocking at all
- platform-specific server address capturing (from /etc/resolv.conf
on linux, windows apis on windows)
- it has LRU caching
- piggybacking (multiple requests before the first completes go on
a list on the first request, not spawn multiple requests)
- observes TTL in cache
- TTL and timeout use lws_sul timers on the event loop
- ipv6 pieces only built if cmake LWS_IPV6 enabled
Improve the code around stash, getting rid of the strdups for a net
code reduction. Remove the special destroy helper for stash since
it becomes a one-liner.
Trade several stack allocs in the client reset function for a single
sized brief heap alloc to reduce peak stack alloc by around 700 bytes.
Various kinds of input stashing were replaced with a single buflist before
v3.0... this patch replaces the partial send arrangements with its own buflist
in the same way.
Buflists as the name says are growable lists of allocations in a linked-list
that take care of book-keeping what's added and removed (even if what is
removed is less than the current buffer on the list).
The immediate result is that we no longer have to freak out if we had a partial
buffered and new output is coming... we can just pile it on the end of the
buflist and keep draining the front of it.
Likewise we no longer need to be rabid about reporting multiple attempts to
send stuff without going back to the event loop, although not doing that
will introduce inefficiencies we don't have to term it "illegal" any more.
Since buflists have proven reliable on the input side and the logic for dealing
with truncated "non-network events" was already there this internal-only change
should be relatively self-contained.
HTTP server protocols have had for a while LWS_CALLBACK_HTTP_DROP/BIND_PROTOCOL
callbacks that mark when a wsi is attched to a protocol and detached.
It turns out this is generally useful for everything to know when a wsi is
joining a protocol and definitively completely finished with a protocol.
Particularly with client wsi where you provided the userdata externally, this
makes a clear point to free() it on the protocol binding being dropped.
This patch adds protocol bind / unbind callbacks to the role definition and
lets them operate on all roles. For the various roles
HTTP server: LWS_CALLBACK_HTTP_BIND/DROP_PROTOCOL as before
HTTP client: LWS_CALLBACK_CLIENT_HTTP_BIND/DROP_PROTOCOL
ws server: LWS_CALLBACK_WS_SERVER_BIND/DROP_PROTOCOL
ws client: LWS_CALLBACK_WS_CLIENT_BIND/DROP_PROTOCOL
raw file: LWS_CALLBACK_RAW_FILE_BIND/DROP_PROTOCOL
raw skt: LWS_CALLBACK_RAW_SKT_BIND/DROP_PROTOCOL
- split raw role into separate skt and file
- remove all special knowledge from the adoption
apis and migrate to core
- remove all special knowledge from client_connect
stuff, and have it discovered by iterating the
role callbacks to let those choose how to bind;
migrate to core
- retire the old deprecated client apis pre-
client_connect_info