Wednesday, April 10, 2013

Standard vs. Real-Time Unix Signals

Software Architecture


Imagine the following software architecture: a single parent process manages multiple child processes. The parent and children exchange Unix signals to manage and trigger each other's activities.

You've tested this basic functionality under reasonable load and all looks well. Initial deployment goes smoothly, so it's time to turn up the volume and bring more activity and clients on board.

After some time, a new bug report comes in saying that some requests are being lost in transit. Your logs show activity hitting the child process. You see the child signaling the parent to perform its next action. You see the parent handling a variety of signals from other processes, but the one from the relevant child process is lost to the mysterious void of /dev/null.

The Reliability of Standard Signals


It turns out that there are two classes of Unix signals: standard and real-time. Standard signals are numbered 1-31, while real-time signals span the range SIGRTMIN through SIGRTMAX, typically 32-64 on Linux. Be sure to use the SIGRTMIN and SIGRTMAX macros rather than hard-coded numbers, as the exact range varies between systems and the C library may reserve a few real-time signals for its own use.
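A quick way to see the actual range on your system is to print the macros at run time. A minimal sketch (on glibc, SIGRTMIN is not a compile-time constant because the C library claims a few real-time signals for its threading implementation):

    #define _POSIX_C_SOURCE 199309L  /* expose POSIX real-time signal APIs */
    #include <signal.h>
    #include <stdio.h>

    int main(void)
    {
        /* SIGRTMIN and SIGRTMAX are macros, and on glibc they expand to
         * function calls, so their values are only known at run time. */
        printf("SIGRTMIN = %d\n", SIGRTMIN);
        printf("SIGRTMAX = %d\n", SIGRTMAX);
        printf("%d real-time signals available\n", SIGRTMAX - SIGRTMIN + 1);
        return 0;
    }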

It also turns out that standard signals are not queued. A pending standard signal is guaranteed to be delivered, but if the same signal is sent again while an earlier instance is still pending, the duplicates are silently merged into one. This means that if multiple child processes send the same standard signal within a short period of time, the parent process may receive only a single signal.
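You can watch this coalescing happen with a few lines of C. The sketch below stands in for the parent/child setup above with a single process: it blocks a standard signal, sends it to itself three times while the signal is pending, then unblocks it.

    #define _POSIX_C_SOURCE 199309L  /* expose POSIX signal APIs */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t deliveries = 0;

    static void on_usr1(int signo)
    {
        (void)signo;
        deliveries++;
    }

    int main(void)
    {
        struct sigaction sa;
        sigset_t block, old;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_usr1;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        /* Block SIGUSR1 so all three sends arrive while it is pending. */
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        sigprocmask(SIG_BLOCK, &block, &old);

        kill(getpid(), SIGUSR1);
        kill(getpid(), SIGUSR1);
        kill(getpid(), SIGUSR1);

        /* Restore the old mask and let delivery happen. */
        sigprocmask(SIG_SETMASK, &old, NULL);

        /* Standard signals are not queued: this typically prints 1, not 3. */
        printf("sent 3 SIGUSR1, handler ran %d time(s)\n", (int)deliveries);
        return 0;
    }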

This is where the magic of real-time signals comes in. Real-time signals are queued: every instance sent is delivered, multiple instances of the same signal arrive in the order they were sent, and each can carry a small data payload. Use sigqueue() to add a signal to the receiver's queue of pending signals, and be sure to check its return value for errors so you don't trade one lost-signal bug for another.
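Here is a minimal sketch of both ends, again collapsed into a single process for brevity. The receiver installs its handler with SA_SIGINFO so it can read the payload that sigqueue() attaches, and the sender checks the return value, since sigqueue() can fail (for example with EAGAIN once the queue limit is reached). In the architecture above, the child would pass the parent's PID instead of its own.

    #define _POSIX_C_SOURCE 199309L  /* expose POSIX real-time signal APIs */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t deliveries = 0;

    static void on_rt(int signo, siginfo_t *info, void *ctx)
    {
        (void)signo; (void)info; (void)ctx;
        /* info->si_value holds the union sigval passed to sigqueue(). */
        deliveries++;
    }

    int main(void)
    {
        struct sigaction sa;
        sigset_t block, old;
        union sigval payload;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_rt;
        sa.sa_flags = SA_SIGINFO;   /* needed to receive the payload */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGRTMIN, &sa, NULL);

        /* Block the signal so all three sends queue up before delivery. */
        sigemptyset(&block);
        sigaddset(&block, SIGRTMIN);
        sigprocmask(SIG_BLOCK, &block, &old);

        for (int i = 0; i < 3; i++) {
            payload.sival_int = i;
            if (sigqueue(getpid(), SIGRTMIN, payload) == -1) {
                perror("sigqueue");   /* e.g. EAGAIN: queue limit reached */
                return 1;
            }
        }

        sigprocmask(SIG_SETMASK, &old, NULL);

        /* Real-time signals are queued: this prints 3, one per send. */
        printf("sent 3, handler ran %d time(s)\n", (int)deliveries);
        return 0;
    }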

Cheers,

Joshua Ganes