Event Server
Servers typically handle three types of events: file descriptor events, signal events, and timeouts.

1. File Descriptor Events
There are several system calls available to receive file descriptor events.
- select (2) is the most portable and least efficient.
- poll (2) is nearly as portable, somewhat more efficient, and has a more intelligible interface.
- Linux has epoll (4), which is a vastly more efficient variant of poll.
- FreeBSD has kqueue, which is a single extensible kernel interface for all event handling.
- Finally, POSIX.4 defines asynchronous I/O (AIO).
These compare as follows:

| Mechanism | Portability | Efficiency | Notification type |
| --- | --- | --- | --- |
| select | maximum | worst | readiness, level-triggered |
| poll | maximum | poor | readiness, level-triggered |
| /dev/poll | Solaris | acceptable | readiness, level-triggered |
| epoll | Linux 2.4+ | good | readiness, level- or edge-triggered |
| POSIX AIO | Linux 2.6, FreeBSD | variable | completion |
| kqueue | BSD, OS X | good | completion and readiness, level- or edge-triggered |
2. Signal Events

Using kqueue makes it easy to mix signal and file descriptor event notification: there is an event filter for signals, interest is registered just as it is for file descriptors, and the events are delivered in the same way as file descriptor events.
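A minimal sketch of this mixing (error handling omitted; listen_fd is assumed to be an already-open socket, and SIGHUP stands in for the signal of interest):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <signal.h>
    #include <stdio.h>

    void event_loop(int listen_fd)
    {
        struct kevent changes[2], events[2];
        int kq = kqueue();

        /* Register interest in readability of listen_fd... */
        EV_SET(&changes[0], listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        /* ...and in SIGHUP, in exactly the same way.  Ignoring the
           signal suppresses its default action (termination); the
           filter still records each delivery. */
        signal(SIGHUP, SIG_IGN);
        EV_SET(&changes[1], SIGHUP, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
        kevent(kq, changes, 2, NULL, 0, NULL);

        for (;;) {
            int n = kevent(kq, NULL, 0, events, 2, NULL);
            for (int i = 0; i < n; i++) {
                if (events[i].filter == EVFILT_SIGNAL)
                    printf("signal %d delivered\n", (int)events[i].ident);
                else /* EVFILT_READ */
                    printf("fd %d readable\n", (int)events[i].ident);
            }
        }
    }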
Besides kqueue, every other way is ugly.
If the signal is delivered during the poll system call, poll will be interrupted, returning -1 with errno set to EINTR, even if the timeout is -1; thus you will handle the signal event with low latency. The bad news is that if the signal is delivered between the first sigprocmask call (the one that unblocks signals) and the poll call, and the poll timeout is -1, poll will not be interrupted, and the signal will not be handled until a file descriptor event occurs, if one ever does. One way to guard against this is to impose a maximum poll timeout, e.g. 100ms: in exchange for the slight extra overhead of 10 system calls a second when idle, you get a maximum signal latency of circa 100ms.
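A sketch of that workaround; got_signal is assumed to be set by the installed signal handler, and handle_signals is a hypothetical dispatcher:

    #include <poll.h>
    #include <signal.h>

    volatile sig_atomic_t got_signal;   /* set by the signal handler */
    void handle_signals(void);          /* hypothetical dispatcher */

    void event_loop(struct pollfd *fds, nfds_t nfds)
    {
        for (;;) {
            if (got_signal) {
                got_signal = 0;
                handle_signals();
            }
            /* A signal delivered right here, before poll begins, is the
               race described above; the 100ms cap bounds the damage. */
            int n = poll(fds, nfds, 100 /* ms, never -1 */);
            if (n > 0) {
                /* ... service the ready file descriptors ... */
            }
        }
    }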
POSIX provides the pselect (2) system call, which is like the sequence

    sigprocmask(SIG_SETMASK, &mask, &oldmask);
    select(...);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
except that the system call eliminates the possibility of a signal being delivered between when the sigprocmask call returns and the select call begins. It was designed for the usage outlined above with poll, and therefore sounds ideal. Unfortunately, pselect is broken under Linux. Also, it uses select, and we prefer poll.
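For illustration, the intended pattern looks something like this sketch, which keeps SIGHUP (again standing in for the signal of interest) blocked except while pselect is waiting:

    #include <sys/select.h>
    #include <signal.h>
    #include <errno.h>

    volatile sig_atomic_t got_signal;   /* set by the signal handler */

    void event_loop(int fd)
    {
        sigset_t blocked, during_wait;

        sigemptyset(&blocked);
        sigaddset(&blocked, SIGHUP);
        /* Block SIGHUP during normal processing; the saved old mask,
           which does not block it, is used only inside pselect. */
        sigprocmask(SIG_BLOCK, &blocked, &during_wait);

        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            /* The mask swap and the wait happen atomically, closing
               the window that plain sigprocmask + select leaves open. */
            if (pselect(fd + 1, &rfds, NULL, NULL, NULL, &during_wait) == -1) {
                if (errno == EINTR && got_signal) {
                    got_signal = 0;
                    /* ... handle the signal event ... */
                }
            } else if (FD_ISSET(fd, &rfds)) {
                /* ... fd is readable ... */
            }
        }
    }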
Another alternative is to have your signal handler write to a file descriptor that is included in your poll set:
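The sketch below omits error handling; SIGHUP again stands in for the signal of interest.

    #include <poll.h>
    #include <signal.h>
    #include <unistd.h>
    #include <fcntl.h>

    static int sigpipe[2];      /* [0] = read end, [1] = write end */

    static void on_signal(int signo)
    {
        /* write is async-signal-safe; the 4-byte signo goes out whole. */
        write(sigpipe[1], &signo, sizeof signo);
    }

    void setup(void)
    {
        pipe(sigpipe);
        /* Non-blocking on both ends, so a burst of signals cannot
           deadlock the handler against a full pipe (see below). */
        fcntl(sigpipe[0], F_SETFL, O_NONBLOCK);
        fcntl(sigpipe[1], F_SETFL, O_NONBLOCK);
        signal(SIGHUP, on_signal);
    }

    void event_loop(void)
    {
        struct pollfd fds[1];
        fds[0].fd = sigpipe[0];
        fds[0].events = POLLIN;

        for (;;) {
            if (poll(fds, 1, -1) > 0 && (fds[0].revents & POLLIN)) {
                int signo;
                while (read(sigpipe[0], &signo, sizeof signo) == sizeof signo) {
                    /* ... dispatch on signo ... */
                }
            }
        }
    }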
With this setup, you can have an arbitrary poll timeout and maintain low latency. However, it's important to use a pipe: under POSIX, a write to a pipe of up to PIPE_BUF bytes (at least 512) is guaranteed atomic, so the (typically 4-byte) write and read of the signo cannot be torn, a guarantee other descriptor types do not give you. It's also important that the pipe be set to non-blocking (see below), to avoid deadlock.
Both techniques also apply to epoll.
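For instance, the self-pipe's read end registers with epoll like so (a sketch reusing sigpipe from above):

    #include <sys/epoll.h>

    int setup_epoll(int sigpipe_rd)
    {
        int epfd = epoll_create(16);    /* size is only a hint */
        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.fd = sigpipe_rd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, sigpipe_rd, &ev);
        return epfd;    /* wait with epoll_wait, bounded timeout as above */
    }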
3. Timeout Events
The next event type of interest is the timer, e.g., you want a 50ms timeout on a subrequest. In a complicated server you may have many more timers pending simultaneously than you have file descriptors, since there are generally multiple timeouts per request.
Once again, kqueue makes it easy: there is a timer filter type which is treated similarly to the file descriptor filters.
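A sketch of a 50ms one-shot timer, registered against an existing kqueue descriptor kq (the ident, here 1, is an arbitrary per-timer identifier):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    void add_timer(int kq)
    {
        struct kevent kev;
        /* data is the period, in milliseconds by default; EV_ONESHOT
           makes it fire once, as a subrequest timeout would. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 50, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
    }

The expiration then comes back through the same kevent loop as file descriptor and signal events.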
Once again, every other way is ugly.
poll, epoll, etc. all provide a single timeout argument to the system call. The problem, then, is to consider the entire set of pending timers, determine the delta-t until the next one goes off, and use that delta-t as the timeout argument to the system call. By storing the timers in a balanced binary tree sorted by (absolute, not relative) expiration time, the next expiring timer can be found in O(log N) time, and creating and removing timers can also be done in O(log N) time. The latter is especially important, since most timers are cancelled before expiring: they are most often used to time out subrequests, and most of the time your subservers are within SLA. This technique uses the gettimeofday (2) system call to obtain the current absolute time.
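A sketch of the timeout computation, where next_expiry is a hypothetical O(log N) lookup of the tree's minimum:

    #include <poll.h>
    #include <sys/time.h>

    struct timeval next_expiry(void);   /* hypothetical: tree minimum */

    int compute_poll_timeout(void)
    {
        struct timeval now, next = next_expiry();
        gettimeofday(&now, NULL);

        long ms = (next.tv_sec - now.tv_sec) * 1000
                + (next.tv_usec - now.tv_usec) / 1000;
        return ms < 0 ? 0 : (int)ms;    /* already expired: return at once */
    }

    /* Usage: n = poll(fds, nfds, compute_poll_timeout()); afterwards,
       expire every timer whose absolute time is <= now, then service
       any ready file descriptors. */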