diff --git a/READMEs/README.event-loops-intro.md b/READMEs/README.event-loops-intro.md
new file mode 100644
index 000000000..35b516101
--- /dev/null
+++ b/READMEs/README.event-loops-intro.md
@@ -0,0 +1,285 @@
+# Considerations around Event Loops
+
+Much of the software we use is written around an **event loop**. Some examples:
+
+ - Chrome / Chromium, transmission, tmux, ntp SNTP... [libevent](https://libevent.org/)
+ - node.js / cjdns / Julia / cmake ... [libuv](https://archive.is/64pOt)
+ - GStreamer, Gnome / GTK apps ... [glib](https://people.gnome.org/~desrt/glib-docs/glib-The-Main-Event-Loop.html)
+ - SystemD ... sd-event
+ - OpenWRT ... uloop
+
+Many applications roll their own event loop using poll() or epoll() or similar,
+using the same techniques. Another set of apps use message dispatchers that
+take the same approach, but are for cases that don't need to support sockets.
+Event libraries provide cross-platform abstractions for this functionality, and
+automatically pick the best backend for their event wait on the platform.
+
+libwebsockets networking operations require an event loop. It provides a default
+one for the platform (based on poll() for Unix) if needed, but can also natively
+use any of the event loop libraries listed above, including "foreign" loops
+already created and managed by the application.
+
+## What is an 'event loop'?
+
+Event loops have the following characteristics:
+
+ - they have a **single thread**, therefore they do not require locking
+ - they are **not threadsafe**
+ - they require **nonblocking IO**
+ - they **sleep** while there are no events (aka the "event wait")
+ - if one or more events are seen, they call back into user code to handle each
+   in turn and then return to the wait (ie, "loop")
+
+### They have a single thread
+
+By doing everything in turn on a single thread, there can be no possibility of
+conflicting access to resources from different threads...
if the single thread
+is in callback A, it cannot be in two places at once, so it cannot also be in
+callback B accessing the same thing: it can never run any other code
+concurrently, only sequentially, by design.
+
+It means that all mutexes and other synchronization and locking can be
+eliminated, along with the many kinds of bugs related to them.
+
+### They are not threadsafe
+
+Event loops mandate doing everything in a single thread. You cannot call their
+apis from other threads, since there is no protection against reentrancy.
+
+Lws apis cannot be called safely from any thread other than the event loop one,
+with the sole exception of `lws_cancel_service()`.
+
+### They have nonblocking IO
+
+With blocking IO, you have to create threads in order to block them to learn
+when your IO could proceed. In an event loop, all descriptors are set to use
+nonblocking mode, and we only attempt to read or write when we have been informed
+by an event that there is something to read, or it is possible to write.
+
+So sacrificial, blocking discrete IO threads are also eliminated; we just do
+what we should do sequentially, when we get the event indicating that we should
+do it.
+
+### They sleep while there are no events
+
+An OS "wait" of some kind is used to sleep the event loop thread until there is
+something to do. There's an explicit wait on file descriptors that have pending
+read or write, and also an implicit wait for the next scheduled event. Even if
+idle for descriptor events, the event loop will wake and handle scheduled events
+at the right time.
+
+In an idle system, the event loop stays in the wait and takes 0% CPU.
+
+### If there are one or more events, they handle them and then return to sleep
+
+As you can expect from "event loop", it is an infinite loop alternating between
+sleeping in the event wait and sequentially servicing pending events, by calling
+callbacks for each event on each object.
+
+The callbacks handle the event and then "return to the event loop".
The state
+of things in the loop itself is guaranteed to stay consistent while in a user
+callback, until you return from the callback to the event loop, when socket
+closes may be processed and lead to object destruction.
+
+Event libraries like libevent operate the same way: once you start the
+event loop, it sits in an infinite loop in the library, calling back on events
+until you "stop" or "break" the loop by calling apis.
+
+## Why are event libraries popular?
+
+Developers prefer an external library solution for the event loop because:
+
+ - the quality is generally higher than self-rolled ones. Someone else is
+   maintaining it, a fulltime team in some cases.
+ - the event libraries are cross-platform; they will pick the most effective
+   event wait for the platform without the developer having to know the details.
+   For example most libs can conceal whether the platform is windows or unix,
+   and use native waits like epoll() or WSA accordingly.
+ - If your application uses an event library, it is possible to integrate very
+   cleanly with other libraries like lws that can use the same event library.
+   That is extremely messy or downright impossible to do with hand-rolled loops.
+
+Compared to just throwing threads at it:
+
+ - thread lifecycle has to be closely managed; threads must start and must be
+   brought to an end in a controlled way. Event loops may end and destroy
+   objects they control at any time a callback returns to the event loop.
+
+ - threads may do things sequentially or genuinely concurrently; this requires
+   locking and careful management so only deterministic and expected things
+   happen to the user data.
+
+ - threads do not scale well to, eg, serving tens of thousands of connections;
+   web servers use event loops.
+
+## Multiple codebases cooperating on one event loop
+
+The ideal situation is that all your code operates via a single event loop thread.
+For lws-only code, including lws_protocols callbacks, this is the normal state
+of affairs.
+
+When there is other code that also needs to handle events, say already existing
+application code, or code handling a protocol not supported by lws, there are a
+few options to allow them to work together; which is "best" depends on the
+details of what you're trying to do and what the existing code looks like.
+In descending order of desirability:
+
+### 1) Use a common event library for both lws and application code
+
+This is the best choice for Linux-class devices. If you write your application
+to use, eg, a libevent loop, then you only need to configure lws to also use
+your libevent loop for them to be able to interoperate perfectly. Lws will
+operate as a guest on this "foreign loop", and can cleanly create and destroy
+its context on the loop without disturbing the loop.
+
+In addition, your application can merge and interoperate with any other
+libevent-capable libraries the same way, and compared to hand-rolled loops, the
+quality will be higher.
+
+### 2) Use lws native wsi semantics in the other code too
+
+Lws supports raw sockets and file fd abstractions inside the event loop. So if
+your other code fits into that model, one way is to express your connections as
+"RAW" wsis and handle them using lws_protocols callback semantics.
+
+This ties the application code to lws, but it has the advantage that the
+resulting code is aware of the underlying event loop implementation and will
+work no matter what it is.
+
+### 3) Make a custom lws event lib shim for your custom loop
+
+Lws provides an ops struct abstraction in order to integrate with event
+libraries; you can find it in ./includes/libwebsockets/lws-eventlib-exports.h.
+
+Lws uses this interface to implement its own event library plugins, but you can
+also use it to make your own customized event loop shim, in the case there is
+too much written for your custom event loop to be practical to change it.
+
+In other words this is a way to write a customized event lib "plugin" and tell
+the lws_context to use it at creation time. See [minimal-http-server.c](https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/http-server/minimal-http-server-eventlib-custom/minimal-http-server.c).
+
+### 4) Cooperate at thread level
+
+This is less desirable because it gives up on unifying the code to run from a
+single thread; it means the codebases cannot call each other's apis directly.
+
+In this scheme the existing threads do their own thing, lock a shared
+area of memory and list what they want done from the lws thread context, before
+calling `lws_cancel_service()` to break the lws event wait. Lws will then
+broadcast a `LWS_CALLBACK_EVENT_WAIT_CANCELLED` protocol callback, the handler
+for which can lock the shared area and perform the requested operations from the
+lws thread context.
+
+### 5) Glue the loops together to wait sequentially (don't do this)
+
+If you have two or more chunks of code with their own waits, it may be tempting
+to have them wait sequentially in an outer event loop. (This is only possible
+with the lws default loop and not the event library support, since event
+libraries have this loop inside their own `...run(loop)` apis.)
+
+```
+	while (1) {
+		do_lws_wait(); /* interrupted at short intervals */
+		do_app_wait(); /* interrupted at short intervals */
+	}
+```
+
+This never works well. Either:
+
+ - the whole thing spins at 100% CPU when idle, or
+
+ - the waits have timeouts where they sleep for short periods, but then the
+   latency to service one set of events is increased by the idle timeout period
+   of the wait for the other set of events
+
+## Common Misunderstandings
+
+### "Real Men Use Threads"
+
+Sometimes you need threads or child processes. But typically, whatever you're
+trying to do does not literally require threads. Threads are an architectural
+choice that can go either way depending on the goal and the constraints.
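
For the cases where a separate thread really is justified, the thread-level cooperation scheme from option 4 above can be sketched without lws at all. The following is a minimal illustration under stated assumptions, not lws code: the pipe `wake_fd` is a hypothetical stand-in for what `lws_cancel_service()` does (breaking the event wait), the `pending` array is the locked "shared area", and the names `worker` and `handoff_demo` are invented for this example.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int pending[8], n_pending;	/* the shared area: work for the loop */
static int wake_fd[2];			/* pipe standing in for lws_cancel_service() */

/* Foreign thread: never touches loop state, only queues requests. */
static void *worker(void *arg)
{
	int i;

	(void)arg;
	for (i = 1; i <= 3; i++) {
		pthread_mutex_lock(&lock);
		pending[n_pending++] = i * 10;
		pthread_mutex_unlock(&lock);
	}
	/* break the event wait so the loop thread looks at the shared area */
	if (write(wake_fd[1], "x", 1) != 1)
		return (void *)1;
	return NULL;
}

/* Loop thread: returns the sum of the requests it performed, or -1. */
int handoff_demo(void)
{
	struct pollfd pfd;
	pthread_t t;
	int i, sum = 0;
	char c;

	n_pending = 0;
	if (pipe(wake_fd) || pthread_create(&t, NULL, worker, NULL))
		return -1;

	pfd.fd = wake_fd[0];
	pfd.events = POLLIN;
	/* the event wait: sleeps until the worker wakes us */
	if (poll(&pfd, 1, -1) != 1 || read(wake_fd[0], &c, 1) != 1)
		return -1;

	/* analogous to handling LWS_CALLBACK_EVENT_WAIT_CANCELLED */
	pthread_mutex_lock(&lock);
	for (i = 0; i < n_pending; i++)
		sum += pending[i];	/* "perform" each request on the loop thread */
	n_pending = 0;
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	close(wake_fd[0]);
	close(wake_fd[1]);
	return sum;
}
```

The only things the two threads share are the locked queue and the wake-up descriptor; everything that acts on the requests runs on the loop thread. Depending on the toolchain, the sketch may need to be built with `-pthread`.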
+
+Any thread you add should have a clear reason to specifically be a thread and
+not be done on the event loop, without a new thread or the consequent locking
+(and bugs).
+
+### But blocking IO is faster and simpler
+
+No, blocking IO incurs a lot of costs to conceal the event wait by blocking.
+
+For any IO that may wait, you must spawn an IO thread for it, purely to handle
+the situation where you get blocked in read() or write() for an arbitrary amount
+of time. It buys you a simple story in one place, that you will proceed on the
+thread if read() or write() has completed, but costs threads and locking to get
+to that.
+
+Event loops dispense with the threads and locking, and still provide a simple
+story: you will get called back when data arrives or you may send.
+
+Event loops can scale much better: a busy server with 50,000 connections active
+does not have to pay the overhead of 50,000 threads and their competing for
+locking.
+
+With blocked threads, the thread can do no useful work at all while it is stuck
+waiting. With event loops the thread can service other events until something
+happens on the fd.
+
+### Threads are inexpensive
+
+In the cases you really need threads, you must have them, or fork off another
+process. But if you don't really need them, they bring with them a lot of
+expense, some you may only notice when your code runs on constrained targets:
+
+ - threads have an OS-side footprint both as objects and in the scheduler
+
+ - thread context switches are not slow on modern CPUs, but have side effects
+   like cache flushing
+
+ - threads are designed to be blocked for arbitrary amounts of time if you use
+   blocking IO apis like write() or read(). Then how much concurrency is really
+   happening? Since blocked threads just go away silently, it is hard to know
+   when in fact your thread is almost always blocked and not doing useful work.
+
+ - threads require their own stack, which on embedded typically suffers
+   from a dedicated worst-case allocation where the headroom is usually idle
+
+ - locking must be handled, and missed locking or lock order bugs found
+
+### But... what about latency if only one thing happens at a time?
+
+ - Typically, at CPU speeds, nothing is happening at any given time on most
+   systems; the event loop is spending most of its time in the event wait
+   asleep at 0% CPU.
+
+ - The POSIX sockets layer is disjoint from the actual network device driver.
+   It means that once you hand off the packet to the networking stack, the POSIX
+   api just returns and leaves the rest of the scheduling, retries etc to the
+   networking stack and device; descriptor queuing is driven by interrupts in
+   the driver part, completely unaffected by the event loop part.
+
+ - Passing data around via POSIX apis between the user code and the networking
+   stack tends to return almost immediately since its onward path is managed
+   later in another, usually interrupt, context.
+
+ - So long as enough packets-worth of data are in the network stack ready to be
+   handed to descriptors, actual throughput is completely insensitive to jitter
+   or latency at the application event loop
+
+ - The network device itself is inherently serializing packets, it can only send
+   one thing at a time. The networking stack locking also introduces hidden
+   serialization by blocking multiple threads.
+
+ - Many user systems are decoupled like the network stack and POSIX... the user
+   event loop and its latencies do not affect backend processes occurring in
+   interrupt or internal thread or other process contexts
+
+## Conclusion
+
+Event loops have been around for a very long time and are in wide use today due
+to their advantages. Working with them successfully requires understanding how to
+use them and why they have the advantages and restrictions they do.
+
+The best results come from all the participants joining the same loop directly.
+Using a common event library in the participating codebases allows completely
+different codebases to call each other's apis safely without locking.
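
As a concrete illustration of the characteristics listed under "What is an 'event loop'?", here is a minimal hand-rolled loop built on POSIX `poll()`. It is a sketch under stated assumptions, not how lws or any event library is implemented: the names `watch_fd`, `on_readable`, `run_loop` and `run_demo` are hypothetical, and error handling is reduced to the bare minimum.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WATCHES 4

/* Hypothetical callback type for this sketch: one handler per descriptor. */
typedef void (*event_cb)(int fd);

static struct pollfd pfds[MAX_WATCHES];
static event_cb cbs[MAX_WATCHES];
static int n_watches, running, bytes_seen;

/* Register fd with the loop; the loop must never block in read(). */
static int watch_fd(int fd, event_cb cb)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (n_watches == MAX_WATCHES || flags < 0 ||
	    fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	pfds[n_watches].fd = fd;
	pfds[n_watches].events = POLLIN;
	cbs[n_watches++] = cb;
	return 0;
}

/* User callback: drain what is available, then return to the loop. */
static void on_readable(int fd)
{
	char buf[64];
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		bytes_seen += (int)n;
	if (n == 0)	/* EOF: writer closed, ask the loop to stop */
		running = 0;
}

/* The loop: sleep in the event wait, service events in turn, repeat. */
static void run_loop(void)
{
	int i;

	while (running) {
		if (poll(pfds, (nfds_t)n_watches, -1) < 0)
			break;
		for (i = 0; i < n_watches; i++)
			if (pfds[i].revents & (POLLIN | POLLHUP))
				cbs[i](pfds[i].fd);
	}
}

/* Demo: feed 5 bytes through a pipe and return how many the loop saw. */
int run_demo(void)
{
	int p[2];

	n_watches = bytes_seen = 0;
	running = 1;
	if (pipe(p) || watch_fd(p[0], on_readable))
		return -1;
	if (write(p[1], "hello", 5) != 5)
		return -1;
	close(p[1]);
	run_loop();
	close(p[0]);
	return bytes_seen;
}
```

Calling `run_demo()` from a `main()` exercises the whole cycle once: the single thread sleeps in `poll()`, is woken by the readable pipe, calls back into `on_readable()`, and leaves the loop when the callback sees end-of-file. No locking appears anywhere because nothing else can run concurrently with the callback.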