The late_bail discrete unpick flow is missing some pieces compared
to lws_ss_destroy. Unify the creation fail flow to also use
lws_ss_destroy so everything in one place.
Make lws_ss_destroy() not issue any states if the creation flow
didn't get as far as issuing CREATING.
Normally when doing a Client Connection Error handling,
we can action any ss relationship straight away since
we are in a wsi callback without any ss-aware parents
in the call stack.
But in the specific case we're doing the initial onward
wsi connection part on behalf of a ss, in fact the call
stack does have earlier parents holding references on
the related ss.
For example
secstream_h1 (ss-h1.c:470) CCE
lws_inform_client_conn_fail (close.c:319) fails early
lws_client_connect_2_dnsreq (connect2.c:349)
lws_http_client_connect_via_info2 (connect.c:71)
lws_header_table_attach (parsers.c:291)
rops_client_bind_h1 (ops-h1.c:1001)
lws_client_connect_via_info (connect.c:429) start onward connect
_lws_ss_client_connect (secure-streams.c:859)
_lws_ss_request_tx (secure-streams.c:1577)
lws_ss_request_tx (secure-streams.c:1515) request tx
ss_cpd_state (captive-portal-detect.c:50)
lws_ss_event_helper (secure-streams.c:408)
lws_ss_create (secure-streams.c:1256) SS Create
Under these conditions, we can't action the DESTROY_ME that
is coming when the CCE exhausts the retries.
This patch adds a flag that is set during the SS's onward wsi
connection attempt and causes it to stash rather than action
the result code.
The result code is brought out from the stash when we return to
_lws_ss_client_connect level, and passed up in the SS flow until
it is actioned, cleanly aborting the ss create.
This causes the blocking dns lookup to treat EAI_NONAME as immediately
fatal, this is usually caused by an assertive NXDOMAIN from the DNS server
or similar.
Not being able to reach the server should continue to retry.
In order to make the problem visible, it reports the situation using
CLIENT_CONNECTION_ERROR, even though it is still inside the outer client
creation call.
Trying to use the opaque pointer in the handle to point to the conn isn't
going to work when we need it to point to the ss handle.
Move it to have its on place in the handle.
If the larger application is defining vhosts using lejp-conf JSON, it's
often more convenient to describe the vhost for ss server binding to
that.
If the server policy endpoint (usually used to describe the server
interface bind) begins with '!', take the remainder of the endpoint
string as the name of a preexisting vhost to bind ss server to at
creation-time.
This provides a way to get ahold of LWS_WITH_CONMON telemetry from Secure
Streams, it works the same with direct onward connections or via the proxy.
You can mark streamtypes with a "perf": true policy attribute... this
causes the onward connections on those streamtypes to collect information
about the connection performance, and the unsorted DNS results.
Streams with that policy attribute receive extra data in their rx callback,
with the LWSSS_FLAG_PERF_JSON flag set on it, containing JSON describing the
performance of the onward connection taken from CONMON data, in a JSON
representation. Streams without the "perf" attribute set never receive
this extra rx.
The received JSON is based on the CONMON struct info and looks like
{"peer":"46.105.127.147","dns_us":596,"sockconn_us":31382,"tls_us":28180,"txn_resp_us:23015,"dns":["2001:41d0:2:ee93::1","46.105.127.147"]}
A new minimal example minimal-secure-streams-perf is added that collects
this data on an HTTP GET from warmcat.com, and is built with a -client
version as well if LWS_WITH_SECURE_STREAMS_PROXY_API is set, that operates
via the ss proxy and produces the same result at the client.
lws_ss_set_metadata can fail... eg, due to transient OOM situation... if it does,
caller must take appropriate action like disconnect and retry.
So mark the api as requiring the result checking, and make sure all the
examples do it.
There are a few build options that are trying to keep and report
various statistics
- DETAILED_LATENCY
- SERVER_STATUS
- WITH_STATS
remove all those and establish a generic rplacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
Openmetrics export is supported, for, eg, prometheus scraping.
The state tracking and violation detection is very powerful at enforcing
only legal transitions, but if it's busy, we don't get to see which stream
had to problem. Add a pointer to the handle lc tag, do that rather than
just pass the handle so we can deal with ss and sspc handles cleanly.
The various stream transitions for direct ss, SSPC, smd, and
different protocols are all handled in different code, let's
stop hoping for the best and add a state transition validation
function that is used everywhere we pass a state change to a
user callback, and knows what is valid for the user state()
callback to see next, given the last state it was shown.
Let's assert if lws manages to violate that so we can find
where the problem is and provide a stricter guarantee about
what user state handler will see, no matter if ss or sspc
or other cases.
To facilitate that, move the states to start from 1, where
0 indicates the state unset.
Let's add a byte on the first message that sspc clients send,
indicating the version of the serialization protocol that the
client was built with.
Start the version at 1, we will add some more changes in other
patches and call v1 (now it has the versioning baked in)
the first real supported serialization version, this patch must
be applied with the next patches to actually represent v1
protocol changes.
This doesn't require user setting, the client is told what version
it supports in LWS_SSS_CLIENT_PROTOCOL_VERSION. The proxy knows
what version(s) it can support and loudly hangs up on the client
if it doesn't understand its protocol version.
This is a huge patch that should be a global NOP.
For unix type platforms it enables -Wconversion to issue warnings (-> error)
for all automatic casts that seem less than ideal but are normally concealed
by the toolchain.
This is things like passing an int to a size_t argument. Once enabled, I
went through all args on my default build (which build most things) and
tried to make the removed default cast explicit.
With that approach it neither change nor bloat the code, since it compiles
to whatever it was doing before, just with the casts made explicit... in a
few cases I changed some length args from int to size_t but largely left
the causes alone.
From now on, new code that is relying on less than ideal casting
will complain and nudge me to improve it by warnings.
This adds some new objects and helpers for keeping and logging
info on grouped allocations, a group is, eg, SS handles or client
wsis.
Allocated objects get a context-unique "tag" string intended to replace
%p / wsi pointers etc. Pointers quickly become confusing when
allocations are freed and reused, the tag string won't repeat
until you produce 2^64 objects in a context.
In addition the tag string documents the object group, with prefixes
like "wsi-" or "vh-" and contain object-specific additional
information like the vhost name, address / port or the role of the wsi.
At creation time the lws code can use a format string and args
to add whatever group-specific info makes sense, eg, a wsi bound
to a secure stream can also append the guid of the secure stream,
it's copied into the new object tag and so is still available
cleanly after the stream is destroyed if the wsi outlives it.
For LWSSSCS_UNREACHABLE state, the additional ord arg has b0 set if the
reason for the unreachability is because the DNS server itself was not
reachable (implying either DNS server is wrongly set, or is not reachable
due to not having connectivity through to it)
Since client_connect and request_tx can be called from code that expects
the ss handle to be in scope, these calls can't deal with destroying the
ss handle and must pass the lws_ss_state_return_t disposition back to
the caller to handle.
C++ APIs wrapping SS client
These are intended to provide an experimental protocol-independent c++
api even more abstracted than secure streams, along the lines of
"wget -Omyfile https://example.com/thing"
WIP