This provides a build option LWS_WITH_CONMON that lets user code recover
detailed connection stats on client connections with the LCCSCF_CONMON
flag.
In addition to latencies for dns, socket connection, tls and first protocol
response where possible, it also provides the user code an unfiltered list
of DNS responses that the client received, and the peer it actually
succeded to connect to.
Otherwise sai is sometimes failing to get the correct process exit code
spawn: use WEXITSTATUS macro
On openbsd at least, the process retcode isn't in the low 8 bits, but must
be recovered using the official macro.
Really not having any logs makes it difficult to know what is really
happening, but if that's you're thing this will align debug and release
modes to just have ERR and USER if you give WITH_NO_LOGS
Until now we set metadata value pointers into the onward wsi ah data
area... that's OK until we get a situation the wsi has gone away before we
have a chance to deliver the metadata over the proxy link.
Add a variant lws_ss_alloc_set_metadata() that allocates space on the heap
and takes a copy of the input metadata. Change ss-h1 to alloc copies of
its metadata so we no longer race the wsi ah lifetime.
lws_ss_set_metadata can fail... eg, due to transient OOM situation... if it does,
caller must take appropriate action like disconnect and retry.
So mark the api as requiring the result checking, and make sure all the
examples do it.
It's perfectly possible we will have destroyed the wsi and report that
back in the return code. So let's not dumbly defreference the wsi to
make a log inbetweentimes.
Found with fault injection and valgrind.
In the case that we try ipv6 that isn't routable, we get a POLLHUP, that
marks the wsi as unusable (for writes, not pending reads), that's what
we want.
But in the case we go around and retry other dns results that are
routable, we have to clear the wsi unusable flag. Otherwise we will
connect and find that we can't write on the connection...
If the DNS lookup fails, we just sit out the remaining connect time.
The adapts it to reuse the wsi->sul_connect_timeout to schedule DNS lookup
retries until we're out of time.
Eventually we want to try other things as well, this is aligned with that.
Found with fault injection.
There are a few build options that are trying to keep and report
various statistics
- DETAILED_LATENCY
- SERVER_STATUS
- WITH_STATS
remove all those and establish a generic rplacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
Openmetrics export is supported, for, eg, prometheus scraping.
For SMP case, it was desirable to have a netlink listener per pt so they
could deal with pt-level changes in the pt's local service thread. But
Linux restricts the process to just one netlink listener.
We worked around it by only listening on pt[0], this aligns us a bit more
with the reality and moves to a single routing table in the context.
There's still more to do for SMP case locking.
If the client library loses the proxy connection, it can receive
an endless stream of 0 length rx instead of understanding that
the UDS peer has gone.
Handle that correctly so the client reacts to the loss of the
proxy link by trying to reacquire it.
Adapt the sspc state to be suitable for retry in that case,
by dropping any dsh and letting the logical ss know that he
is DISCONNECTED, if he thought he was CONNECTED.
The state tracking and violation detection is very powerful at enforcing
only legal transitions, but if it's busy, we don't get to see which stream
had to problem. Add a pointer to the handle lc tag, do that rather than
just pass the handle so we can deal with ss and sspc handles cleanly.