Normally when handling a Client Connection Error, we can action any ss
relationship straight away, since we are in a wsi callback without any
ss-aware parents in the call stack.
But in the specific case where we're doing the initial onward wsi
connection on behalf of an ss, the call stack does in fact have earlier
parents holding references on the related ss.
For example:
secstream_h1 (ss-h1.c:470) CCE
lws_inform_client_conn_fail (close.c:319) fails early
lws_client_connect_2_dnsreq (connect2.c:349)
lws_http_client_connect_via_info2 (connect.c:71)
lws_header_table_attach (parsers.c:291)
rops_client_bind_h1 (ops-h1.c:1001)
lws_client_connect_via_info (connect.c:429) start onward connect
_lws_ss_client_connect (secure-streams.c:859)
_lws_ss_request_tx (secure-streams.c:1577)
lws_ss_request_tx (secure-streams.c:1515) request tx
ss_cpd_state (captive-portal-detect.c:50)
lws_ss_event_helper (secure-streams.c:408)
lws_ss_create (secure-streams.c:1256) SS Create
Under these conditions, we can't action the DESTROY_ME that
is coming when the CCE exhausts the retries.
This patch adds a flag that is set during the SS's onward wsi
connection attempt and causes it to stash rather than action
the result code.
The result code is brought out of the stash when we return to the
_lws_ss_client_connect level, and passed up the SS flow until it is
actioned, cleanly aborting the ss create.
In sai, on Xenial (only...), it was noticed that the wsi is still bound to
the ss handle, and can reference it even after the ss has been destroyed,
sometimes, on ss-testsfail.
Leave the handle knowing its wsi and able to detach it later during close.
A user reports problems with the close / retry flow not happening if we
don't pass through the nwsi close... it may be happening before the sid1
migration.
Just log it and don't end the handling before the passthrough. Logging it
because there was a reason for the earlier change to not pass it through...
Defer recording the ss metrics histogram until wsi close, so it has a
chance to collect all the tags that apply.
Defer dumping metrics until the FINALIZE phase of context destroy, so we
have a chance to get any remaining metrics recorded first.
This provides a way to get hold of LWS_WITH_CONMON telemetry from Secure
Streams; it works the same with direct onward connections or via the proxy.
You can mark streamtypes with a "perf": true policy attribute... this
causes the onward connections on those streamtypes to collect information
about the connection performance, and the unsorted DNS results.
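For example, the policy entry for such a streamtype might look like this
(a sketch only; the streamtype name and the attributes other than "perf"
are illustrative, use whatever your stream actually needs):

{
  "mintest-perf": {
    "endpoint": "warmcat.com",
    "port": 443,
    "protocol": "h1",
    "http_method": "GET",
    "http_url": "index.html",
    "tls": true,
    "perf": true,
    "retry": "default"
  }
}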
Streams with that policy attribute receive extra data in their rx callback,
with the LWSSS_FLAG_PERF_JSON flag set on it, containing JSON describing the
performance of the onward connection, taken from the CONMON data. Streams
without the "perf" attribute set never receive this extra rx.
The received JSON is based on the CONMON struct info and looks like
{"peer":"46.105.127.147","dns_us":596,"sockconn_us":31382,"tls_us":28180,"txn_resp_us:23015,"dns":["2001:41d0:2:ee93::1","46.105.127.147"]}
A new minimal example, minimal-secure-streams-perf, is added that collects
this data on an HTTP GET from warmcat.com; if LWS_WITH_SECURE_STREAMS_PROXY_API
is set, a -client version is also built that operates via the ss proxy and
produces the same result at the client.
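A minimal sketch of how a "perf"-enabled stream might pick the perf JSON
out of its rx callback (the callback and userobj names here are
illustrative):

static lws_ss_state_return_t
myss_rx(void *userobj, const uint8_t *buf, size_t len, int flags)
{
        if (flags & LWSSS_FLAG_PERF_JSON) {
                /* this rx is conmon performance JSON, not stream payload */
                lwsl_user("perf: %.*s\n", (int)len, (const char *)buf);

                return LWSSSSRET_OK;
        }

        /* ... normal payload handling ... */

        return LWSSSSRET_OK;
}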
Set the CONNECTED state only when SUBACK is received, if the stream has
defined a subscription topic. This avoids SS sending out SUBSCRIBE right
after CONNACK even when the connection is not valid.
Until now we set metadata value pointers into the onward wsi ah data
area... that's OK until we get a situation where the wsi has gone away
before we have a chance to deliver the metadata over the proxy link.
Add a variant lws_ss_alloc_set_metadata() that allocates space on the heap
and takes a copy of the input metadata. Change ss-h1 to alloc copies of
its metadata so we no longer race the wsi ah lifetime.
lws_ss_set_metadata can fail, eg, due to a transient OOM situation... if it
does, the caller must take appropriate action, like disconnect and retry.
So mark the api as requiring its result to be checked, and make sure all
the examples do it.
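For example, where h is the stream's lws_ss_handle, a caller might do
something like this (a sketch; the metadata name and the recovery choice
are illustrative):

if (lws_ss_set_metadata(h, "ctype", "application/json", 16)) {
        /*
         * transient failure, eg, OOM... we can't complete this
         * transaction correctly, ask to disconnect and retry
         */
        return LWSSSSRET_DISCONNECT_ME;
}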
There are a few build options that try to keep and report various
statistics:
- DETAILED_LATENCY
- SERVER_STATUS
- WITH_STATS
Remove all those and establish a generic replacement, lws_metrics.
lws_metrics makes its stats available via an lws_system ops function
pointer that the user code can set.
OpenMetrics export is supported, eg, for Prometheus scraping.
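Roughly, user code hooks the reporting like this (a sketch; the member and
type names are as I recall them, check lws_system.h / lws-metrics.h in your
tree for the exact signatures):

static int
my_metric_report(lws_metric_pub_t *mp)
{
        /* lws hands us a metric that is ready to be flushed; forward it
         * to whatever telemetry backend you use here */

        return 0;
}

static const lws_system_ops_t my_system_ops = {
        .metric_report = my_metric_report,
};

/* at context creation time: info.system_ops = &my_system_ops; */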
The various stream transitions for direct ss, SSPC, smd, and different
protocols are all handled in different code; let's stop hoping for the best
and add a state transition validation function that is used everywhere we
pass a state change to a user callback, and that knows what is valid for
the user state() callback to see next, given the last state it was shown.
Let's assert if lws manages to violate that, so we can find where the
problem is, and provide a stricter guarantee about what the user state
handler will see, whether via ss, sspc or other cases.
To facilitate that, move the states to start from 1, so that 0 indicates
the state is unset.
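Conceptually the validation is just a per-state table of which states are
allowed to come next, along these lines (a sketch, not the literal lws
internals):

/* for each state, a bitmap of the states allowed to follow it */
static const uint32_t legal_next[] = {
        [LWSSSCS_CREATING]      = (1 << LWSSSCS_CONNECTING) |
                                  (1 << LWSSSCS_POLL) |
                                  (1 << LWSSSCS_DESTROYING),
        /* ... one entry per state ... */
};

static int
check_next_state(lws_ss_constate_t last, lws_ss_constate_t next)
{
        if (legal_next[last] & (1u << next))
                return 0;       /* acceptable transition */

        assert(0);              /* lws broke its own sequencing promise */

        return 1;
}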
This is a huge patch that should be a global NOP.
For unix-type platforms it enables -Wconversion to issue warnings (-> error)
for all automatic casts that seem less than ideal but are normally concealed
by the toolchain.
This means things like passing an int to a size_t argument. Once enabled, I
went through all the args on my default build (which builds most things) and
tried to make the formerly implicit casts explicit.
With that approach it neither changes nor bloats the code, since it compiles
to whatever it was doing before, just with the casts made explicit... in a
few cases I changed some length args from int to size_t, but largely left
the underlying causes alone.
From now on, new code that relies on less than ideal casting will produce
warnings and nudge me to improve it.
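The flavour of change this implies, sketched:

void consume(size_t len);

void
example(int len)
{
        /* before: consume(len); ...now warns under -Wconversion */

        /* after: same code generated, but the conversion is explicit */
        consume((size_t)len);
}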
This adds some new objects and helpers for keeping and logging info on
grouped allocations; a group is, eg, SS handles or client wsis.
Allocated objects get a context-unique "tag" string intended to replace
%p / wsi pointers etc. Pointers quickly become confusing when allocations
are freed and reused; the tag string won't repeat until you produce 2^64
objects in a context.
In addition the tag string documents the object group, with prefixes like
"wsi-" or "vh-", and contains object-specific additional information like
the vhost name, address / port or the role of the wsi.
At creation time the lws code can use a format string and args to add
whatever group-specific info makes sense; eg, a wsi bound to a secure
stream can also append the guid of the secure stream. It's copied into the
new object tag and so is still available cleanly after the stream is
destroyed, if the wsi outlives it.
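So the tags end up looking something along these lines (the values and
exact layout here are purely illustrative):

wsi-00000001-h1-warmcat.com      a client wsi: role and endpoint appended
vh-00000002-default-443          a vhost: vhost name and port appended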
Since client_connect and request_tx can be called from code that expects
the ss handle to be in scope, these calls can't deal with destroying the
ss handle and must pass the lws_ss_state_return_t disposition back to
the caller to handle.
Break out the core ss_set_metadata action into a subfunction that
takes the lws_ss_metadata_t, and is fixed to retire heap-based
values before they go out of scope, and adapt the exported version
to call through to that.
Simplify extract_metadata() to reuse the subfunction as well, in
both well-known and custom header cases.
Teach lws how to deal with date: and retry-after:
Add a quick selftest into api-test-lws_tokenize
Expand lws_retry_sul_schedule_retry_wsi() to check for retry_after and
increase the backoff if a larger one is found.
Finally, change the SS h1 protocol to handle 503 + retry-after: as a
failure, and apply any increased backoff from retry-after
automatically.
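For instance, with the onward server responding like this (illustrative)

HTTP/1.1 503 Service Unavailable
retry-after: 120

the 503 is treated as a connection failure, and if 120s is larger than the
backoff the retry policy would have applied next, the next attempt is
pushed out to 120s instead.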
This adds a per-streamtype JSON mapping table in the policy.
In addition to the previous flow, it lets you generate custom
SS state notifications for specific http response codes, eg:
"http_resp_map": [ { "530": 1530 }, { "531": 1531 } ],
It's not recommended to overload the transport-layer response
code with application layer responses. It's better to return
a 200 and then in the application protocol inside http, explain
what happened from the application perspective, usually with
JSON. But this is designed to let you handle existing systems
that do overload the transport layer response code.
SS states for user use start at LWSSSCS_USER_BASE, which is
1000.
You can do a basic test with minimal-secure-streams and the --respmap
flag; this will go to httpbin.org and get a 404, and the warmcat.com
policy has a mapping for 404 -> LWSSSCS_USER_BASE (1000).
Since the mapping emits states, these are serialized and handled
like any other state in the proxy case.
The policy2c example / tool is also updated to handle the additional
mapping tables.
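At the user state callback, a mapped code then simply arrives as the state
value given in the map; a sketch (the callback name is illustrative, the
values follow the examples above):

static lws_ss_state_return_t
myss_state(void *userobj, void *h_src, lws_ss_constate_t state,
           lws_ss_tx_ordinal_t ack)
{
        switch ((int)state) {
        case LWSSSCS_USER_BASE:         /* 1000: warmcat.com maps 404 here */
                lwsl_user("got the mapped 404 state\n");
                break;
        case 1530:                      /* "530": 1530 from the map above */
                break;
        default:
                break;
        }

        return LWSSSSRET_OK;
}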
At the moment you can define and set per-stream metadata at the client,
which will be string-substituted and, if configured in the policy, set in
related outgoing protocol-specific content like h1 headers.
This patch extends the metadata concept to also check incoming protocol-
specific content like h1 headers and, where it matches the binding in the
streamtype's metadata entry, make it available to the client by name, via
a new lws_ss_get_metadata() api.
Currently warmcat.com has additional headers for
server: lwsws (well-known header name)
test-custom-header: hello (custom header name)
The minimal-secure-streams test is updated to try to recover these both
in direct and -client (via proxy) versions. The corresponding metadata
part of the "mintest" stream policy from warmcat.com is
{
  "srv": "server:"
}, {
  "test": "test-custom-header:"
},
If built direct, or at the proxy, the stream has access to the static
policy metadata definitions and can store the rx metadata in the stream
metadata allocation, with a heap-allocated value. For the client side that
talks to a proxy, only the proxy knows the policy, and it returns rx
metadata inside the serialized link to the client, which stores it on
the heap attached to the stream.
In addition an optimization for mapping static policy metadata definitions
to individual stream handle metadata is changed to match by name.
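Recovering the rx metadata at the client is then along these lines (a
sketch, where h is the stream's lws_ss_handle and the names match the
policy fragment above; error handling kept minimal):

const void *value;
size_t len;

if (!lws_ss_get_metadata(h, "srv", &value, &len))
        lwsl_user("server: %.*s\n", (int)len, (const char *)value);

if (!lws_ss_get_metadata(h, "test", &value, &len))
        lwsl_user("test-custom-header: %.*s\n", (int)len, (const char *)value);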
Formalize the LWSSSSRET_ enums into a type "lws_ss_state_return_t"
returned by the rx, tx and state callbacks, and some private helpers
lws_ss_backoff() and lws_ss_event_helper().
Remove the LWSSSSRET_SS_HANDLE_DESTROYED concept... the two helpers that
could have destroyed the ss and returned that now return LWSSSSRET_DESTROY_ME
to the caller, to perform or pass up to their caller instead.
Handle helper returns in all the ss protocols and update the rx / tx
calls to have their returns from rx / tx / event helper and ss backoff
all handled by unified code.
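The shape of the unified handling is roughly like this (a sketch, not the
literal lws code):

lws_ss_state_return_t r;

r = h->info.rx(ss_to_userobj(h), buf, len, flags);
switch (r) {
case LWSSSSRET_OK:
        break;
case LWSSSSRET_DISCONNECT_ME:
        /* drop the onward connection but keep the ss handle */
        break;
case LWSSSSRET_DESTROY_ME:
        /*
         * a parent on the call stack may still hold the handle, so
         * don't destroy it here... hand the disposition up instead
         */
        return r;
default:
        break;
}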