[ruby-core:111716] [Ruby master Feature#19322] Support spawning "private" child processes

7 Jan 2023

      Issue #19322 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------
## Background

The traditional Unix process APIs (`fork` etc.) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the program calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library!

Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program.
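To make the interference concrete, here is a minimal sketch (not taken from Unicorn or any real gem) of a program-level `SIGCHLD` handler that reaps every child, next to "library" code that only wants the status of its own child. The variable names and the `sh -c "exit 7"` child are made up for illustration.

```ruby
# Program-level handler in the style described above: reap *any* child.
reaped = []
Signal.trap("CHLD") do
  begin
    loop do
      pid, status = Process.waitpid2(-1, Process::WNOHANG)
      break if pid.nil?
      reaped << [pid, status.exitstatus]
    end
  rescue Errno::ECHILD
    # No children left to reap.
  end
end

# "Library" code that only wants the status of its own child.
lib_pid = Process.spawn("sh", "-c", "exit 7")

# Give the handler a moment to run (bounded, so the sketch cannot hang).
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 5
sleep 0.01 while reaped.empty? && Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline

library_result =
  begin
    Process.waitpid2(lib_pid)
  rescue Errno::ECHILD
    :stolen # the handler already consumed the status
  end

Signal.trap("CHLD", "DEFAULT")
```

Once the handler has run, the library's own `Process.waitpid2(lib_pid)` has nothing left to collect, so the exit status is only visible to the handler.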

## Existing solutions in operating systems

Several operating systems provide an improved API for spawning child processes which are fully isolated; that is, they do not generate `SIGCHLD` signals in the program, and are invisible to calls to `waitpid(2)`.

* On Linux, such invisible processes can be made by calling `clone(2)` with a zero value in the low byte of `flags` (the low byte holds the child's termination signal, so zero means no signal is sent to the parent). If the `CLONE_PIDFD` flag is also provided, then a file descriptor representing the process is also returned; this can be used to wait for and signal the process in a race-free way.
* On FreeBSD, the `pdfork(2)` syscall makes a process that does not signal `SIGCHLD` and is ignored by `waitpid(2)` calls that do not explicitly specify the pid (i.e. it is ignored when -1 is passed). It also returns a file descriptor representing the process.

Both of these APIs center around the idea of a process file descriptor. Rather than managing a child process using the old process-global wait/signal mechanisms, they return a file descriptor representing the process. Such a file descriptor can uniquely identify the spawned process, be used to wait on the process and get the status, send signals, and even participate in `poll(2)`. They also protect against pid-reuse race conditions; after a process has terminated and been reaped, the pidfd becomes invalid, and can’t randomly begin to refer to a different process.

## Proposed Ruby APIs

I think we should make a new API `Process.spawn_handle`, which accepts all the same parameters as `Process.spawn`. However, it does _not_ return a pid like `Process.spawn`, but rather a new type `Process::Handle`.

`Process::Handle` would identify a single spawned process, using a durable OS-supplied handle not subject to re-use risks (e.g. a pidfd). It would provide the following methods:

* `#pid` - get the pid that the handle is for.
* `#send_signal(signal)` - send a signal to the wrapped process (where "signal" is a symbol, string, or number with the same meaning as in `Process.kill`).
* `#wait` - blocks waiting for the program to exit, and then returns a `Process::Status` object representing e.g. the exit code. Like calling `waitpid`.
* `#wait_nonblock` - Returns a `Process::Status` object for the child process. If the child has not exited, it will be a status object for which `#exited?` is false. Does not block. Like calling `waitpid(WNOHANG)`.
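To illustrate the shape of the proposed interface, here is a rough pure-Ruby sketch. `SketchHandle` is a hypothetical name, not an existing API, and it wraps a plain pid from `Process.spawn`. It therefore has none of the isolation or pid-reuse guarantees the real `Process::Handle` would get from a pidfd. As a simplification, `#wait_nonblock` returns `nil` while the child is still running, rather than a not-yet-exited status object.

```ruby
# Hypothetical sketch only: mimics the proposed method names on top of a pid.
class SketchHandle
  attr_reader :pid

  def self.spawn(*args, **opts)
    new(Process.spawn(*args, **opts))
  end

  def initialize(pid)
    @pid = pid
    @status = nil
  end

  # Same signal forms as Process.kill (symbol, string, or number).
  def send_signal(signal)
    # Once reaped, the pid may be reused by an unrelated process.
    raise Errno::ESRCH if @status
    Process.kill(signal, @pid)
  end

  # Blocks until the child exits; returns a Process::Status.
  def wait
    @status ||= Process.waitpid2(@pid).last
  end

  # Never blocks; returns nil if the child is still running (simplification).
  def wait_nonblock
    return @status if @status
    _pid, status = Process.waitpid2(@pid, Process::WNOHANG)
    @status = status
  end
end

handle = SketchHandle.spawn("sh", "-c", "exit 4")
status = handle.wait
```

A real implementation would hold the OS-supplied handle rather than the pid, so that `#send_signal` and `#wait` stay correct even if other code calls `waitpid2(-1)`.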

Finally, the `Open3` family of methods would be extended to accept `handle:` as an additional keyword argument. When set to true, `Process.spawn_handle` will be used to start the child, and `Process::Handle` objects will be returned in place of pids.

Modifying backticks, `Kernel#system` and other process-creating methods which don't return pids to use `spawn_handle` internally would also be possible, but out of scope for an initial implementation of this ticket.

## OS compatibility

For this API to be useful to gem authors, it has to be widely available on the systems that they and their users care about. As discussed, the `clone(2)` syscall and `CLONE_PIDFD` flag can be used on Linux 5.2+ to implement `Process::Handle`. FreeBSD has `pdfork(2)` since v9.

I haven’t investigated Windows _deeply_, but I think Windows doesn’t really have the notion of process-global `waitpid` or `SIGCHLD` anyway. The `CreateProcess` function fills in a `PROCESS_INFORMATION` struct, which contains a `HANDLE` for the child process, which seems analogous to a process FD.

However this does leave a large chunk of operating systems which don’t have this functionality built-in. Off the top of my head:

* MacOS, NetBSD, and OpenBSD have nothing. I stared pretty hard at the Darwin XNU source and couldn’t find a race-free way to convince it not to dispatch `SIGCHLD` for a particular process or stop it from being reaped by process-wide `wait4` calls.
* Linux < 5.2 is still in some probably-pretty-widely-deployed distros; Ubuntu 18.04, for example, has a pre-5.2 release kernel.

I have two ideas for how the semantics of `Process::Handle` could be emulated on such systems. However I recognise that they aren’t amazing so if anybody has some better ideas I would dearly love to hear them.

### Long-lived proxy

The first time `Process.spawn_handle` is used, we would fork/exec a long-lived “fork-helper” program. This could be a separate helper binary we compile with the build system, or perhaps just a re-invocation of the ruby interpreter with something like `ruby -e "Process._fork_helper"`. There would be a unix socketpair shared between the parent process & the helper.

Instead of actually forking when we’re calling `Process.spawn_handle`, we would instead send a message on this socket asking the helper to, _itself_, fork & exec the specified program. Any file descriptors etc needed in the child could also be sent over this socket. All of the `Process::Handle` methods would be proxies which called through to the helper binary.

This way, the ruby process is never actually the parent of the spawned child, so we would never get any SIGCHLD etc from it. The fork-helper program might generate a SIGCHLD, but it should persist until the ruby process exits; we would only generate a SIGCHLD signal if it crashed abnormally.
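As a rough illustration of the proxy idea, the sketch below forks a helper that shares a `UNIXSocket.pair` with the main program and spawns commands on its behalf. The one-JSON-line-per-message protocol and all variable names are assumptions made up for this example. The helper here is also synchronous, handling one child at a time, and it does not pass file descriptors over the socket.

```ruby
require "socket"
require "json"

parent_sock, helper_sock = UNIXSocket.pair

# Long-lived helper: it, not the main program, is the parent of spawned children.
helper_pid = fork do
  parent_sock.close
  while (line = helper_sock.gets)
    argv = JSON.parse(line)
    pid = Process.spawn(*argv)
    _, status = Process.waitpid2(pid)
    helper_sock.puts(JSON.generate({ "pid" => pid, "exitstatus" => status.exitstatus }))
  end
  exit!(0) # EOF on the socket: the main program has gone away
end
helper_sock.close

# Count SIGCHLD deliveries in the main program while a request is in flight.
sigchld_seen = 0
Signal.trap("CHLD") { sigchld_seen += 1 }

parent_sock.puts(JSON.generate(["sh", "-c", "exit 3"]))
reply = JSON.parse(parent_sock.gets)
sigchld_during_request = sigchld_seen

# The only direct child is the helper, which is still running, so this is nil.
visible_to_wait = Process.waitpid2(-1, Process::WNOHANG)

# Shut the helper down cleanly.
Signal.trap("CHLD", "DEFAULT")
parent_sock.close
Process.waitpid(helper_pid)
```

The main program learns the child's exit status from the reply message, while its own `SIGCHLD` handler and `waitpid2(-1)` calls never see the spawned process.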

### Forward misdirected waits

With this approach, `Process.spawn_handle` would just `fork(2)`/`exec(2)` or `posix_spawn(3)` processes like normal. We would however keep a table of pids -> `Process::Handle` instances.

When Ruby’s C-level SIGCHLD handler is invoked, we would inspect that table and see if the pid has an associated `Process::Handle`. If so, we would skip calling any registered Ruby SIGCHLD handler; instead, we would call `waitpid` ourselves, update the status info on the handle object, and unblock anybody waiting on `Process::Handle#wait`.

Likewise, in the C-side implementation of `Process.waitpid2` etc, we would check the returned pid from the syscall against the handle table. If it matched, we would perform the same work as in the SIGCHLD case, and then re-start the original call to `Process.waitpid2`.

This approach keeps the process tree correct and involves less silly proxying, but it won’t hide the process from any callers to the raw `waitpid` library functions in C extensions. Calling raw `waitpid` from a C extension seems like a silly idea anyway though, so maybe that’s OK?
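The wait-forwarding logic could look roughly like the sketch below, written in Ruby rather than in the interpreter's C code. `PrivateWait` and its methods are hypothetical names: `waitpid2_any` stands in for a patched `Process.waitpid2(-1)`, and the table maps pids straight to statuses instead of to `Process::Handle` objects. It ignores the SIGCHLD-handler half of the idea and is not thread-safe.

```ruby
module PrivateWait
  STATUSES = {} # pid => Process::Status, or nil while not yet reaped

  # Spawn a child and remember that its exit status is private.
  def self.spawn(*argv)
    pid = Process.spawn(*argv)
    STATUSES[pid] = nil
    pid
  end

  # Stand-in for a patched Process.waitpid2(-1): private children are
  # recorded in the table and the wait is restarted.
  def self.waitpid2_any(flags = 0)
    loop do
      pid, status = Process.waitpid2(-1, flags)
      return nil if pid.nil? # WNOHANG and nothing has exited yet
      return [pid, status] unless STATUSES.key?(pid)
      STATUSES[pid] = status
    end
  end

  # What Process::Handle#wait would do: use the recorded status if a
  # misdirected wait already reaped the child, else wait for it directly.
  def self.wait(pid)
    STATUSES[pid] ||= Process.waitpid2(pid).last
  end
end

private_pid = PrivateWait.spawn("sh", "-c", "exit 5")
public_pid  = Process.spawn("sh", "-c", "sleep 0.2; exit 6")

seen_pid, seen_status = PrivateWait.waitpid2_any
private_status = PrivateWait.wait(private_pid)
```

Whichever child exits first, the process-wide wait only ever reports the public child, and the private child's status stays available to its owner.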

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but having looked at this process management stuff a fair bit, I thought it would be worth asking whether this might be useful to other, more normal, gems.

-- 
https://bugs.ruby-lang.org/
