Some platforms need two sockets with AF_INET and AF_INET6 to listen to both
protocols.
This patch changes the single listen socket each vhost could previously
handle to become an lws_dll2 and adapts the related code to handle them as
a linked-list rather than as a singleton.
The next patch adapts the listen / server code to create multiple listen
wsi for vhosts listening on multiple ip protocols.
There are a few build options that are trying to keep and report
various statistics
- DETAILED_LATENCY
- SERVER_STATUS
- WITH_STATS
remove all those and establish a generic rplacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
Openmetrics export is supported, for, eg, prometheus scraping.
For SMP case, it was desirable to have a netlink listener per pt so they
could deal with pt-level changes in the pt's local service thread. But
Linux restricts the process to just one netlink listener.
We worked around it by only listening on pt[0], this aligns us a bit more
with the reality and moves to a single routing table in the context.
There's still more to do for SMP case locking.
This is a huge patch that should be a global NOP.
For unix type platforms it enables -Wconversion to issue warnings (-> error)
for all automatic casts that seem less than ideal but are normally concealed
by the toolchain.
This is things like passing an int to a size_t argument. Once enabled, I
went through all args on my default build (which build most things) and
tried to make the removed default cast explicit.
With that approach it neither change nor bloat the code, since it compiles
to whatever it was doing before, just with the casts made explicit... in a
few cases I changed some length args from int to size_t but largely left
the causes alone.
From now on, new code that is relying on less than ideal casting
will complain and nudge me to improve it by warnings.
This adds some new objects and helpers for keeping and logging
info on grouped allocations, a group is, eg, SS handles or client
wsis.
Allocated objects get a context-unique "tag" string intended to replace
%p / wsi pointers etc. Pointers quickly become confusing when
allocations are freed and reused, the tag string won't repeat
until you produce 2^64 objects in a context.
In addition the tag string documents the object group, with prefixes
like "wsi-" or "vh-" and contain object-specific additional
information like the vhost name, address / port or the role of the wsi.
At creation time the lws code can use a format string and args
to add whatever group-specific info makes sense, eg, a wsi bound
to a secure stream can also append the guid of the secure stream,
it's copied into the new object tag and so is still available
cleanly after the stream is destroyed if the wsi outlives it.
role ops are usually only sparsely filled, there are currently 20
function pointers but several roles only fill in two. No single
role has more than 14 of the ops. On a 32/64 bit build this part
of the ops struct takes a fixed 80 / 160 bytes then.
First reduce the type of the callback reason part from uint16_t to
uint8_t, this saves 12 bytes unconditionally.
Change to a separate function pointer array with a nybble index
array, it costs 10 bytes for the index and a pointer to the
separate array, for 32-bit the cost is
2 + (4 x ops_used)
and for 64-bit
6 + (8 x ops_used)
for 2 x ops_used it means 32-bit: 10 vs 80 / 64-bit: 22 vs 160
For a typical system with h1 (9), h2 (14), listen (2), netlink (2),
pipe (1), raw_skt (3), ws (12), == 43 ops_used out of 140, it means
the .rodata for this reduced from 32-bit: 560 -> 174 (386 byte
saving) and 64-bit: 1120 -> 350 (770 byte saving)
This doesn't account for the changed function ops calling code, two
ways were tried, a preprocessor macro and explicit functions
For an x86_64 gcc 10 build with most options, release mode,
.text + .rodata
before patch: 553282
accessor macro: 552714 (568 byte saving)
accessor functions: 553674 (392 bytes worse than without patch)
therefore we went with the macros
This creates a role for RFC3549 Netlink monitoring.
If the OS supports it (currently, linux) then each pt creates a wsi
with the netlink role and dumps the current routing table at pt init.
It then maintains a cache of the routing table in each pt.
Upon routing table changes an SMD message is issued as an event, and
Captive Portal Detection is triggered.
All of the pt's current connections are reassessed for routability under
the changed routing table, those that no longer have a valid route or
gateway are closed.
Event lib support as it has been isn't scaling well, at the low level
libevent and libev headers have a namespace conflict so they can't
both be built into the same image, and at the distro level, binding
all the event libs to libwebsockets.so makes a bloaty situation for
packaging, lws will drag in all the event libs every time.
This patch implements the plan discussed here
https://github.com/warmcat/libwebsockets/issues/1980
and refactors the event lib support so they are built into isolated
plugins and bound at runtime according to what the application says
it wants to use. The event lib plugins can be packaged individually
so that only the needed sets of support are installed (perhaps none
of them if the user code is OK with the default poll() loop). And
dependent user code can mark the specific event loop plugin package
as required so pieces are added as needed.
The eventlib-foreign example is also refactored to build the selected
lib support isolated.
A readme is added detailing the changes and how to use them.
https://libwebsockets.org/git/libwebsockets/tree/READMEs/README.event-libs.md
Currently we always reserve a fakewsi per pt so events that don't have a related actual
wsi, like vhost-protocol-init or vhost cert init via protocol callback can make callbacks
that look reasonable to user protocol handler code expecting a valid wsi every time.
This patch splits out stuff that user callbacks often unconditionally expect to be in
a wsi, like context pointer, vhost pointer etc into a substructure, which is composed
into struct lws at the top of it. Internal references (struct lws is opaque, so there
are only internal references) are all updated to go via the substructre, the compiler
should make that a NOP.
Helpers are added when fakewsi is used and referenced.
If not PLAT_FREERTOS, we continue to provide a full fakewsi in the pt as before,
although the helpers improve consistency by zeroing down the substructure. There is
a huge amount of user code out there over the last 10 years that did not always have
the minimal examples to follow, some of it does some unexpected things.
If it is PLAT_FREERTOS, that is a newer thing in lws and users have the benefit of
being able to follow the minimal examples' approach. For PLAT_FREERTOS we don't
reserve the fakewsi in the pt any more, saving around 800 bytes. The helpers then
create a struct lws_a (the substructure) on the stack, zero it down (but it is only
like 4 pointers) and prepare it with whatever we know like the context.
Then we cast it to a struct lws * and use it in the user protocol handler call.
In this case, the remainder of the struct lws is undefined. However the amount of
old protocol handlers that might touch things outside of the substructure in
PLAT_FREERTOS is very limited compared to legacy lws user code and the saving is
significant on constrained devices.
User handlers should not be touching everything in a wsi every time anyway, there
are several cases where there is no valid wsi to do the call with. Dereference of
things outside the substructure should only happen when the callback reason shows
there is a valid wsi bound to the activity (as in all the minimal examples).
Replace the bash selftest plumbing with CTest.
To use the selftests, build with -DLWS_WITH_MINIMAL_EXAMPLES=1
and `CTEST_OUTPUT_ON_FAILURE=1 make test` or just
`make test`.
To disable tests that require internet access, also give
-DLWS_CTEST_INTERNET_AVAILABLE=0
Remove travis and appveyor scripts on master.
Remove travis and appveyor decals on README.md.
By default this doesn't change any existing logging behaviour at all.
But it allows you to define cmake options to force or force-disable the
build of individual log levels using new cmake option bitfields
LWS_LOGGING_BITFIELD_SET and LWS_LOGGING_BITFIELD_CLEAR.
Eg, -DLWS_LOGGING_BITFIELD_SET="(LLL_INFO)" can force INFO log level
built even in release mode. -DLWS_LOGGING_BITFIELD_CLEAR="(LLL_NOTICE)"
will likewise remove NOTICE logging from the build regardless of
DEBUG or RELEASE mode.
Adds client support for MQTT QoS0 and QoS1, compatible with AWS IoT
Supports stream binding where independent client connections to the
same endpoint can mux on a single tcp + tls connection with topic
routing managed internally.
The vfork optimized spawn, stdxxx and terminal handling in the cgi
implementation is quite mature and sophisticated, and useful for
other things unrelated to cgi. Break it out into its own public
api under LWS_WITH_SPAWN, off by default.
Expand it so the parent wsi is optional, and the role and protocol
bindings for stdxxx pipes can be set. Allow optional sul timeout
and external lws_dll2 owner for extant children.
Remove inline style from minimal http-server-cgi
This adds support for POST in both h1 and h2 queues / stream binding.
The previous queueing tried to keep the "leader" wsi who made the
actual connection around and have it act on the transaction queue
tail if it had done its own thing.
This refactors it so instead, who is the "leader" moves down the
queue and the queued guys inherit the fd, SSL * and queue from the
old leader as they take over.
This lets them operate in their own wsi identity directly and gets
rid of all the "effective wsi" checks, which was applied incompletely
and getting out of hand considering the separate lws_mux checks for
h2 and other muxed protocols alongside it.
This change also allows one wsi at a time to own the transaction for
POST. --post is added as an option to lws-minimal-http-client-multi
and 6 extra selftests with POST on h1/h2, pipelined or not and
staggered or not are added to the CI.
This should be a NOP for h2 support and only affects internal
apis. But it lets us reuse the working and reliable h2 mux
arrangements directly in other protocols later, and share code
so building for h2 + new protocols can take advantage of common
mux child handling struct and code.
Break out common mux handling struct into its own type.
Convert all uses of members that used to be in wsi->h2 to wsi->mux
Audit all references to the members and break out generic helpers
for anything that is useful for other mux-capable protocols to
reuse wsi->mux related features.
Remove LWS_LATENCY.
Add the option LWS_WITH_DETAILED_LATENCY, allowing lws to collect very detailed
information on every read and write, and allow the user code to provide
a callback to process events.
External poll support generates a lot of messages on a busy system
for no value unless you're one of the few people using it. It's
not recommended for new users and is there for backwards compatibility.
Make it not built by default and selectable by cmake option.
wsi timeout, wsi hrtimer, sequencer timeout and vh-protocol timer
all now participate on a single sorted us list.
The whole idea of polling wakes is thrown out, poll waits ignore the
timeout field and always use infinite timeouts.
Introduce a public api that can schedule its own callback from the event
loop with us resolution (usually ms is all the platform can do).
Upgrade timeouts and sequencer timeouts to also be able to use us resolution.
Introduce a prepared fakewsi in the pt, so we don't have to allocate
one on the heap when we need it.
Directly handle vh-protocol timer if LWS_MAX_SMP == 1
lws_dll2 removes the downsides of lws_dll and adds new features like a
running member count and explicit owner type... it's cleaner and more
robust (eg, nodes know their owner, so they can casually switch between
list owners and remove themselves without the code knowing the owner).
This deprecates lws_dll, but since it's public it allows it to continue
to be built for 4.0 release if you give cmake LWS_WITH_DEPRECATED_LWS_DLL.
All remaining internal users of lws_dll are migrated to lws_dll2.
The logic in the loops for insertion and deletion from the
mini, forced to non ulimit max fds in the pt mode was not
quite right.
It showed up in hard to reproduce problem with the ws client
spam test that uses the mini mode, on travis. This should
fix the root cause.
An lws context usually contains a processwide fd -> wsi lookup table.
This allows any possible fd returned by a *nix type OS to be immediately
converted to a wsi just by indexing an array of struct lws * the size of
the highest possible fd, as found by ulimit -n or similar.
This works modestly for Linux type systems where the default ulimit -n for
a process is 1024, it means a 4KB or 8KB lookup table for 32-bit or
64-bit systems.
However in the case your lws usage is much simpler, like one outgoing
client connection and no serving, this represents increasing waste. It's
made much worse if the system has a much larger default ulimit -n, eg 1M,
the table is occupying 4MB or 8MB, of which you will only use one.
Even so, because lws can't be sure the OS won't return a socket fd at any
number up to (ulimit -n - 1), it has to allocate the whole lookup table
at the moment.
This patch looks to see if the context creation info is setting
info->fd_limit_per_thread... if it leaves it at the default 0, then
everything is as it was before this patch. However if finds that
(info->fd_limit_per_thread * actual_number_of_service_threads) where
the default number of service threads is 1, is less than the fd limit
set by ulimit -n, lws switches to a slower lookup table scheme, which
only allocates the requested number of slots. Lookups happen then by
iterating the table and comparing rather than indexing the array
directly, which is obviously somewhat of a performance hit.
However in the case where you know lws will only have a very few wsi
maximum, this method can very usefully trade off speed to be able to
avoid the allocation sized by ulimit -n.
minimal examples for client that can make use of this are also modified
by this patch to use the smaller context allocations.