While going through the Postgres source code, I found a really cool technique they use to do non-local jumps (acting as a sort of exception-handling mechanism). Postgres is written in C, and C doesn't have a construct for exceptions, so how would they do something like this?
One very common feature of any interactive shell is that pressing Ctrl-c cancels the current operation. Take the Python REPL as an example: when I pressed Ctrl-c, it aborted and printed KeyboardInterrupt.
λ python
Python 3.11.5 (main, Sep 2 2023, 14:16:33) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(
KeyboardInterrupt
>>>
Postgres offers a similar interactive environment in the form of the psql binary, which lets us run database operations interactively. Naturally, it also aborts the current operation when you press Ctrl-c.
To understand how this functionality could be implemented, we need to know a few things. When we press Ctrl-c, the terminal driver in the kernel treats it as the interrupt character and sends a signal to the process running in the foreground (1). Specifically, the Ctrl-c keypress triggers a SIGINT signal. We also have a mechanism to handle signals (2) ourselves in user code, and with that we can do a lot of powerful things.
Let's look into the Postgres code (psql, to be specific) to figure out how they do this.
The psql code lives in src/bin/psql. The entrypoint (the main function) is in startup.c, which internally calls the MainLoop function in mainloop.c (aptly named).
- In startup.c, there's a call to psql_setup_cancel_handler (a one-liner: setup_cancel_handler(psql_cancel_callback)), which ends up calling setup_cancel_handler() in src/fe_utils/cancel.c. That is where the signal handler for SIGINT is registered, along with a callback function. The handle_sigint function is registered as the SIGINT handler, and it calls the cancel_callback function if it's not NULL (which it won't be for our particular code path).
So, basically, psql_cancel_callback is run as part of the signal handler (handle_sigint).
- Now, let's look at the main loop. The MainLoop function does a bunch of stuff at the start (which I have no clue about), but the actual loop is the part that matters here. It's a huge function that does a lot of things (expected, since it's what drives the whole program). If you scroll through it, you'll find that it reads a line and executes it (handling exits, clearing, etc. too); it's quite involved.
I came across this particular code snippet (where I have my cursor in the screenshot). I had heard about setjmp/longjmp before but hadn't really encountered them in real-world code (it could be that I haven't read a lot of real-world code).
A short primer on non-local jumps in C
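- setjmp: saves the current execution state, marking the point where it was called as a checkpoint. When called directly, it returns 0.
- longjmp: jumps back to that checkpoint from wherever we are, even across function calls (or within the same function). setjmp then appears to return a second time, this time with the value given to longjmp. The function that called setjmp must not have returned in the meantime.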
An example (taken straight from Wikipedia) will make it clearer; the Wikipedia page does a great job of showing how it can be used.
#include <stdio.h>
#include <setjmp.h>
static jmp_buf buf;
void second() {
    printf("second\n");      // prints
    longjmp(buf, 1);         // jumps back to where setjmp was called - making setjmp now return 1
}
void first() {
    second();
    printf("first\n");       // does not print
}
int main() {
    if (!setjmp(buf))
        first();             // when executed, setjmp returned 0
    else                     // when longjmp jumps back, setjmp returns 1
        printf("main\n");    // prints
    return 0;
}
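Postgres doesn't use setjmp/longjmp but sigsetjmp/siglongjmp, since those are the ones to use when jumping out of a signal-handling context (again, a piece of knowledge from the linked Wikipedia page). The reason is the signal mask: while the SIGINT handler runs, SIGINT is normally blocked, and jumping out of the handler skips the normal return that would unblock it. sigsetjmp(env, 1) saves the signal mask and siglongjmp restores it, whereas POSIX leaves it unspecified whether plain setjmp/longjmp do. Without that restore, a second Ctrl-c could go unnoticed.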
Something fun to do: look into whether setjmp/longjmp allow us to implement delimited continuations (TODO: I need to understand them better).
Tying all the pieces together
We saw the sigsetjmp call that establishes a checkpoint (here). Now we need to find where the siglongjmp call comes from. Well, that's easy: it's probably coming from somewhere in the signal handler.
Looking at setup_cancel_handler, we see a callback function (query_cancel_callback, which is our psql_cancel_callback) being assigned to the cancel_callback variable, which is then called in the handle_sigint function (as I mentioned before).
Voilà, the siglongjmp call is right here, inside psql_cancel_callback.
So, all of the machinery above exists for this one small feature.
Verifying that what I figured out above isn't a load of crap
Let's make a change to psql_cancel_callback and add a new print statement. Let's also add a print after the sigsetjmp call in MainLoop.
After I compile with these changes, the behaviour of Ctrl-c in psql changes slightly. Now I get this:
So, I guess it wasn't incorrect.