[ruby-core:111712] [Ruby master Feature#19322] Support spawning "private" child processes

Issue #19322 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------
## Background

The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library!

Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401).

In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program.

## Existing solutions in OS’s

Several operating systems provide an improved API for spawning child processes which are fully isolated; that is, they do not generate `SIGCHLD` signals in the program, and are invisible to calls to `waitpid(2)`:

* On Linux, such invisible processes can be made by calling `clone(2)` with a zero value in the low byte of `flags`. If the `CLONE_PIDFD` flag is also provided, then a file descriptor representing the process is also returned; this can be used to wait for and signal the process in a race-free way.
* On FreeBSD, the `pdfork(2)` syscall makes a process that does not signal `SIGCHLD` and is ignored by `waitpid(2)` calls that do not explicitly specify the pid (i.e. it is ignored when -1 is passed). It also returns a file descriptor representing the process.

Both of these APIs center around the idea of a process file descriptor.
Rather than managing a child process using the old process-global wait/signal mechanisms, they return a file descriptor representing the process. Such a file descriptor can uniquely identify the spawned process, be used to wait on the process and get the status, send signals, and even participate in `poll(2)`. They also protect against pid-reuse race conditions; after a process has terminated and been reaped, the pidfd becomes invalid, and can’t randomly begin to refer to a different process.

## Proposed Ruby APIs

I think we should make a new API `Process.spawn_handle`, which accepts all the same parameters as `Process.spawn`. However, it does _not_ return a pid like `Process.spawn`, but rather a new type `Process::Handle`.

`Process::Handle` would identify a single spawned process, using a durable OS-supplied handle not subject to re-use risks (e.g. a pidfd). It would provide the following methods:

* `#pid` - get the pid that the handle is for.
* `#send_signal(signal)` - send a signal to the wrapped process (where "signal" is a symbol, string, or number with the same meaning as in `Process.kill`).
* `#wait` - blocks waiting for the program to exit, and then returns a `Process::Status` object representing e.g. the exit code. Like calling `waitpid`.
* `#wait_nonblock` - returns a `Process::Status` object for the child process. If the child has not exited, it will be a status object for which `#exited?` is false. Does not block. Like calling `waitpid(WNOHANG)`.

Finally, the `Open3` family of methods would be extended to accept `handle:` as an additional keyword argument. When set to true, `Process.spawn_handle` will be used to start the child, and `Process::Handle` objects will be returned in place of pids.

Modifying backticks, `Kernel#system` and other process-creating methods which don't return pids to use `spawn_handle` internally would also be possible, but is out of scope for an initial implementation of this ticket.
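
As a rough sketch of the shape of this interface (not the proposed implementation), the hypothetical `SketchHandle` class below wraps a pid from `Process.spawn` and exposes the four methods listed above. It assumes a Unix-like system with `sh` on the path, offers none of the isolation a pidfd-backed handle would give, and returns `nil` from `wait_nonblock` while the child is still running instead of a not-yet-exited status object.

```ruby
# Hypothetical stand-in for the proposed Process::Handle, built on plain
# Process.spawn. It only mimics the interface; a SIGCHLD handler or a
# Process.wait2(-1) elsewhere in the program can still interfere with it.
class SketchHandle
  attr_reader :pid

  # Accepts the same arguments as Process.spawn.
  def self.spawn(*args, **opts)
    new(Process.spawn(*args, **opts))
  end

  def initialize(pid)
    @pid = pid
    @status = nil
  end

  # signal is a symbol, string, or number, as for Process.kill.
  def send_signal(signal)
    Process.kill(signal, @pid)
  end

  # Blocks until the child exits, then returns its Process::Status.
  def wait
    @status ||= Process.wait2(@pid).last
  end

  # Returns the Process::Status if the child has exited, otherwise nil.
  def wait_nonblock
    return @status if @status

    _, status = Process.wait2(@pid, Process::WNOHANG)
    @status = status
  end
end

handle = SketchHandle.spawn("sh", "-c", "exit 3")
status = handle.wait
puts "pid #{handle.pid} exited with #{status.exitstatus}"
```

Caching the status inside the object means `wait` can be called more than once, which a bare `Process.wait(pid)` does not allow.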

## OS compatibility

For this API to be useful to gem authors, it has to be widely available on the systems that they and their users care about. As discussed, the `clone(2)` syscall and `CLONE_PIDFD` flag can be used on Linux 5.2+ to implement `Process::Handle`. FreeBSD has `pdfork(2)` since v9.

I haven’t investigated Windows _deeply_, but I think Windows doesn’t really have the notion of process-global `waitpid` or `SIGCHLD` anyway. The `CreateProcess` function fills in a `PROCESS_INFORMATION` struct, which contains a `HANDLE` for the child process, which seems analogous to a process FD.

However this does leave a large chunk of operating systems which don’t have this functionality built-in. Off the top of my head:

* MacOS, NetBSD, and OpenBSD have nothing. I stared pretty hard at the Darwin XNU source and couldn’t find a race-free way to convince it not to dispatch `SIGCHLD` for a particular process or stop it from being reaped by process-wide `wait4` calls.
* Linux < 5.2 is in some probably-pretty-widely-deployed-still distros - it’s the release kernel in Ubuntu 18.04 for example.

I have two ideas for how the semantics of `Process::Handle` could be emulated on such systems. However I recognise that they aren’t amazing so if anybody has some better ideas I would dearly love to hear them.

### Long-lived proxy

The first time `Process.spawn_handle` is used, we would fork/exec a long-lived “fork-helper” program. This could be a separate helper binary we compile with the build system, or perhaps just a re-invocation of the ruby interpreter with something like `ruby -e "Process._fork_helper"`.

There would be a unix socketpair shared between the parent process & the helper. Instead of actually forking when we’re calling `Process.spawn_handle`, we would instead send a message on this socket asking the helper to, _itself_, fork & exec the specified program. Any file descriptors etc needed in the child could also be sent over this socket.
All of the `Process::Handle` methods would be proxies which called through to the helper binary.

This way, the ruby process is never actually the parent of the spawned child, so we would never get any SIGCHLD etc from it. The fork-helper program might generate a SIGCHLD, but it should persist until the ruby process exits; we would only generate a SIGCHLD signal if it crashed abnormally.

### Forward misdirected waits

With this approach, `Process.spawn_handle` would just `fork(2)`/`exec(2)` or `posix_spawn(2)` processes like normal. We would however keep a table of pids -> `Process::Handle` instances.

When Ruby’s C-level SIGCHLD handler is invoked, we would inspect that table and see if the pid has an associated `Process::Handle`. If so, we would skip calling any registered Ruby SIGCHLD handler; instead, we would call `waitpid` ourselves, update the status info on the handle object, and unblock anybody waiting on `Process::Handle#wait`.

Likewise, in the C-side implementation of `Process.waitpid2` etc, we would check the returned pid from the syscall against the handle table. If it matched, we would perform the same work as in the SIGCHLD case, and then re-start the original call to `Process.waitpid2`.

This approach keeps the process tree correct and involves less silly proxying, but it won’t hide the process from any callers to the raw `waitpid` library functions in C extensions. Doing that seems like a silly idea anyway though, so maybe that’s OK?

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).
In my case, I'm so deep in linux specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff I thought it would be worth asking the question if this might be useful to other, more normal, gems.

--
https://bugs.ruby-lang.org/

Issue #19322 has been updated by nobu (Nobuyoshi Nakada).

A solution that is already possible would be a daemon process:

```ruby
IO.popen("-", "r+") do |childio|
  if childio
    # In the parent process: reap the short-lived intermediate child.
    Process.wait(childio.pid)
    # From here on, `Process.wait` can no longer consume the grandchild's
    # status code, because the grandchild is not our child.
    # Wait for the grandchild process to finish.
    childio.read
  elsif Process.fork
    # In the intermediate child process: exit at once.
    exit
  else
    # In the grandchild process. `do_something` is a placeholder for the
    # actual work.
    do_something(STDIN, STDOUT)
  end
end
```

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101114

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

I did think about something in that shape @nobu for unsupported systems, but I think there are two problems:

- SIGCHLD will still be received, which is undesirable on its own.
- If a different thread is running `Process.waitpid2(-1)` concurrently, there is no guarantee who will reap the intermediate parent process - their call or our call to waitpid.

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101115
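
To make the reaping problem concrete, here is a minimal, single-threaded sketch of the status being "stolen". The roles ("library" and "application") are illustrative only, and the example assumes a Unix-like system with `sh` on the path and no other child processes running.

```ruby
# A "library" spawns a child and intends to wait for it later.
pid = Process.spawn("sh", "-c", "exit 7")

# Elsewhere, the "application" reaps any child, as a SIGCHLD loop would.
reaped_pid, reaped_status = Process.wait2(-1)

# The library's own wait now fails: the status has already been consumed.
begin
  Process.wait(pid)
  stolen = false
rescue Errno::ECHILD
  stolen = true
end

puts "application reaped pid #{reaped_pid} (exit #{reaped_status.exitstatus})"
puts "library lost its status: #{stolen}"
```

With threads, the same two waits race, so which caller ends up with the status is not predictable.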

Issue #19322 has been updated by nobu (Nobuyoshi Nakada). I'm not very positive to implement platform specific methods, and rather suggest to create a gem as the first step. ---------------------------------------- Feature #19322: Support spawning "private" child processes https://bugs.ruby-lang.org/issues/19322#change-101116 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Open * Priority: Normal ---------------------------------------- ## Background The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program. ## Existing solutions in OS’s Several operating systems provide an improved API for spawning child processes which are fully isolated; that is, they do not generate `SIGCHLD` signals in the program, and are invisible to calls to `waitpid(2)` * On Linux, such invisible processes can be made by calling `clone(2)` with a zero value in the low byte of `flags`. If the CLONE_PIDFD flag is also provided, then a file descriptor representing the process is also returned; this can be used to wait for and signal the process in a race-free way. * On FreeBSD, the `pdfork(2)` syscall makes a process that does not signal SIGCHLD and is ignored by `waitpid(2)` calls that do not explicitly specify the pid (i.e. it is ignored when -1 is passed). 
It also returns a file descriptor representing the process. Both of these APIs center around the idea of a process file descriptor. Rather than managing a child process using the old process-global wait/signal mechanisms, they return a file descriptor representing the process. Such a file descriptor can uniquely identify the spawned process, be used to wait on the process and get the status, send signals, and even participate in `poll(2)`. They also protect against pid-reuse race conditions; after a process has terminated and been reaped, the pidfd becomes invalid, and can’t randomly begin to refer to a different process. ## Proposed Ruby APIs I think we should make a new API `Process.spawn_handle`, which accepts all the same parameters as `Process.spawn`. However, it does _not_ return a pid like `Process.spawn`, but rather a new type `Process::Handle`. `Process::Handle` would identify a single spawned process, using a durable OS-supplied handle not subject to re-use risks (e.g. a pidfd). It would provide the following methods: * `#pid` - get the pid that the handle is for. * `#send_signal(signal)` - send a signal to the wrapped process (where "signal" is a symbol, string, or number with the same meaning as in `Process.kill`. * `#wait` - blocks waiting for the program to exit, and then returns a `Process::Status` object representing e.g. the exit code. Like calling `waitpid`. * `#wait_nonblock` - Returns a `Process::Status` object for the child process. If the child has not exited, it will be a status object for which `#exited?` is false. Does not block. Like calling `waitpid(WNOHANG)`. Finally, the `Open3` family of methods would be extended to accept `handle:` as an additional keyword argument. When set to true, `Process.spawn_handle` will be used to start the child, and `Process::Handle` objects will be returned in place of pids. 
Modifying backticks, `Kernel#system`, and other process-creating methods which don't return pids to use `spawn_handle` internally would also be possible, but is out of scope for an initial implementation of this ticket.

## OS compatibility

For this API to be useful to gem authors, it has to be widely available on the systems that they and their users care about. As discussed, the `clone(2)` syscall and `CLONE_PIDFD` flag can be used on Linux 5.2+ to implement `Process::Handle`. FreeBSD has had `pdfork(2)` since v9.

I haven’t investigated Windows _deeply_, but I think Windows doesn’t really have the notion of process-global `waitpid` or `SIGCHLD` anyway. The `CreateProcess` function fills in a `PROCESS_INFORMATION` struct, which contains a `HANDLE` for the child process, which seems analogous to a process FD.

However, this does leave a large chunk of operating systems which don’t have this functionality built in. Off the top of my head:

* macOS, NetBSD, and OpenBSD have nothing. I stared pretty hard at the Darwin XNU source and couldn’t find a race-free way to convince it not to dispatch `SIGCHLD` for a particular process or stop it from being reaped by process-wide `wait4` calls.
* Linux < 5.2 is in some probably-pretty-widely-deployed-still distros - it’s the release kernel in Ubuntu 18.04, for example.

I have two ideas for how the semantics of `Process::Handle` could be emulated on such systems. However, I recognise that they aren’t amazing, so if anybody has some better ideas I would dearly love to hear them.

### Long-lived proxy

The first time `Process.spawn_handle` is used, we would fork/exec a long-lived “fork-helper” program. This could be a separate helper binary we compile with the build system, or perhaps just a re-invocation of the ruby interpreter with something like `ruby -e "Process._fork_helper"`.

There would be a unix socketpair shared between the parent process & the helper. Instead of actually forking when we’re calling `Process.spawn_handle`, we would instead send a message on this socket asking the helper to, _itself_, fork & exec the specified program. Any file descriptors etc. needed in the child could also be sent over this socket. All of the `Process::Handle` methods would be proxies which called through to the helper binary.

This way, the ruby process is never actually the parent of the spawned child, so we would never get any SIGCHLD etc. from it. The fork-helper program might generate a SIGCHLD, but it should persist until the ruby process exits; we would only get a SIGCHLD signal from it if it crashed abnormally.

### Forward misdirected waits

With this approach, `Process.spawn_handle` would just `fork(2)`/`exec(2)` or `posix_spawn(2)` processes like normal. We would however keep a table of pids -> `Process::Handle` instances.

When Ruby’s C-level SIGCHLD handler is invoked, we would inspect that table and see if the pid has an associated `Process::Handle`. If so, we would skip calling any registered Ruby SIGCHLD handler; instead, we would call `waitpid` ourselves, update the status info on the handle object, and unblock anybody waiting on `Process::Handle#wait`.

Likewise, in the C-side implementation of `Process.waitpid2` etc., we would check the returned pid from the syscall against the handle table. If it matched, we would perform the same work as in the SIGCHLD case, and then re-start the original call to `Process.waitpid2`.

This approach keeps the process tree correct and involves less silly proxying, but it won’t hide the process from any callers to the raw `waitpid` library functions in C extensions. Doing that seems like a silly idea anyway though, so maybe that’s OK?

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking whether this might be useful to other, more normal, gems.

--
https://bugs.ruby-lang.org/
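The "Long-lived proxy" fallback in the message above can be illustrated with a small sketch. This is only a rough approximation under stated assumptions: the JSON line protocol, the `run_via_helper` method, and the variable names are all hypothetical, the helper runs one command at a time, and no file descriptors are passed over the socket.

```ruby
require "socket"
require "json"

# Hypothetical names throughout; this is not code from the proposal.
parent_sock, helper_sock = UNIXSocket.pair

# The long-lived helper: it, not the main process, becomes the parent of
# every command it is asked to run.
helper_pid = fork do
  parent_sock.close
  while (line = helper_sock.gets)
    argv = JSON.parse(line)
    pid = Process.spawn(*argv)
    _, status = Process.waitpid2(pid)
    helper_sock.puts(JSON.generate({ "pid" => pid, "exitstatus" => status.exitstatus }))
  end
  exit!(0)
end
helper_sock.close

# Ask the helper to run a command and block until it reports the result.
def run_via_helper(sock, *argv)
  sock.puts(JSON.generate(argv))
  JSON.parse(sock.gets)
end

reply = run_via_helper(parent_sock, "/bin/sh", "-c", "exit 7")
puts "helper ran pid #{reply["pid"]}, exit status #{reply["exitstatus"]}"

# The only direct child is the still-running helper, so a process-wide
# non-blocking wait finds nothing to reap.
stolen = Process.waitpid2(-1, Process::WNOHANG)
puts "waitpid2(-1) saw: #{stolen.inspect}"

# Closing the socket makes the helper's gets return nil, so it exits.
parent_sock.close
Process.waitpid(helper_pid)
```

Because the command is a grandchild of the main process, neither a `SIGCHLD` handler nor `Process.waitpid2(-1)` in the main process can observe it; the cost is that every operation on the child has to be proxied through the helper.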

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

Apologies if I wasn’t clear, but I definitely don’t intend for the proposed interface to be platform specific. It would make use of clone/pdfork if they were available, but the fallback implementation (either the “proxy everything through an intermediate server” one or the “trap all calls to waitpid2 and lie about them” one) would be used elsewhere. On all platforms the observed behaviour of `Process::Handle` should be the same I think.

I can try an implementation in a gem, the fallback should be possible from a gem by monkey patching the relevant process methods I think

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101117

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------

--
https://bugs.ruby-lang.org/

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

So I ran into a pretty serious snag when trying to implement my idea with Linux pidfds - you can make a hidden process which is unwaitable & doesn't raise SIGCHLD, but if that process execs, those special properties go away and it goes back to behaving just like a normal child process.

That led me to look more carefully at Ruby's current handling of SIGCHLD/waitpid, and I think there's room in there to implement the API I proposed without leaning on any new system APIs beyond SIGCHLD & waitpid. Plus, my proposal would clean up a bit of special-casing for mjit which is currently floating around inside `process.c`. WDYT @nobu?

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101192

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------
## Background

The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library!

Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401).

In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program.

## Problem statement

Consider the following program.

```ruby
# Imagine this part of the program is in some top-level application event loop
# or something - similar to how Unicorn works. It detects child processes exiting
# and takes some action (possibly restarting a crashed worker, for example).
Signal.trap(:CHLD) do
  loop do
    begin
      pid, status = Process.waitpid2(-1)
      puts "Signal handler reaped #{pid} #{status.inspect}"
    rescue Errno::ECHILD
      puts "Signal handler reaped nothing"
      break
    end
  end
end

# Imagine that _this_ part of the program is buried deep in some gem. It knows
# nothing about the application SIGCHLD handling, and quite possibly the application
# author might not even know this gem spawns a child process to do its work!
require 'open3'
loop do
  o, status = Open3.capture2("/bin/sh", "-c", "echo 'ohaithar'")
  puts "ran command, got #{o.chomp} #{status.inspect}"
end
```

In current versions of Ruby, _some_ loop iterations will function correctly, and print something like this. The gem gets the `Process::Status` object from its command and can know if e.g. it exited abnormally.

```
ran command, got ohaithar #<Process::Status: pid 1153687 exit 0>
Signal handler reaped nothing
```

However, other iterations of the loop print this. The signal handler runs and calls `Process.waitpid2(-1)` before the code in open3 can do so. Then, the gem code does not get a `Process::Status` object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility.

```
Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0>
Signal handler reaped nothing
ran command, got ohaithar nil
```

We would like a family of APIs which allows a gem to spawn a child process and guarantees that the gem can wait on it. Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the `SIGCHLD` signal to avoid the application signal handler needlessly waking up.
## Proposed Ruby-level APIs

I propose we create the following new methods in Ruby.

* `Process.spawn_private`
* `Process.fork_private`

These methods behave identically to their non-`_private` versions in all respects, except instead of returning a pid, they return an object of type `Process::PrivateHandle`.

`Process::PrivateHandle` would have the following methods:

* `pid()` - returns the pid for the created process.
* `wait()` - blocks the caller until the created process has exited, and returns a `Process::Status` object. If the handle has _already_ had `#wait` called on it, it immediately returns the same `Process::Status` object as was returned then. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid).
* `wait_nonblock()` - if the created process has exited, behaves like `#wait`; otherwise, it returns a `Process::Status` object for which `#exited?` returns false.
* `kill(...)` - if the created process has not been reaped via a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, the wrong process is not signaled by mistake.

A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` will _never_ return a `Process::Status` for a process started with a `_private` method, even if that call is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`.

The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and the `Open3` module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed.
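To make the intended handle semantics concrete, here is a minimal sketch in plain Ruby. `PrivateHandleSketch` is a hypothetical stand-in, not the proposed `Process::PrivateHandle`: it wraps an ordinary `Process.spawn` child, so it illustrates only the status caching and the `Errno::ESRCH` behaviour and does nothing to hide the child from `Process.waitpid2(-1)`. It also assumes `wait_nonblock` may return `nil` for a running child, since a "not yet exited" `Process::Status` cannot be built from Ruby code.

```ruby
# Hypothetical stand-in for the proposed Process::PrivateHandle. It shows
# only the caching and ESRCH behaviour; it does NOT make the child private.
class PrivateHandleSketch
  attr_reader :pid

  def initialize(*argv)
    @pid = Process.spawn(*argv)
    @status = nil
  end

  # Block until the child exits; later calls return the cached status
  # instead of raising Errno::ECHILD.
  def wait
    @status ||= Process.waitpid2(@pid).last
  end

  # Return the final status if the child has exited, otherwise nil
  # (an assumption of this sketch, see the note above).
  def wait_nonblock
    return @status if @status

    reaped = Process.waitpid2(@pid, Process::WNOHANG)
    @status = reaped.last if reaped
    @status
  end

  # Refuse to signal once the child is reaped, since the pid may be reused.
  def kill(signal)
    raise Errno::ESRCH if @status

    Process.kill(signal, @pid)
  end
end

handle = PrivateHandleSketch.new("/bin/sh", "-c", "exit 3")
first = handle.wait
second = handle.wait
puts "exit status: #{first.exitstatus}"
puts "same status object on second wait: #{first.equal?(second)}"

kill_refused = false
begin
  handle.kill(:TERM)
rescue Errno::ESRCH
  kill_refused = true
end
puts "kill refused after reaping: #{kill_refused}"
```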
(Note, though: I don't believe `Kernel#system` suffers from the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it.)

## Implementation strategy

I believe this can be implemented, in broad strokes, with an approach like this:

* Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`.
* When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to `waitpid`.
* If the wait call _was_ made with the handle, we can return the reaped status directly.
* Once a handle has had the child status saved against it, it is removed from the table.
* A subsequent call to wait on the handle will look up the saved information and return it without making a system call.

In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit.

This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code.

In any case, the way waitpid works _currently_ is that:

* Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally.
* If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`.
* Ruby also keeps a list of all `waitpid_state`s that are currently being waited for, `vm->waiting_pids` and `vm->waiting_grps`.
* These structures are protected with a specific mutex, `vm->waitpid_lock`.
* Ruby internally uses the SIGCHLD signal to reap the dead children, and then finds a waiting call to `waitpid` (via the two lists) to actually dispatch the reaped status to.
* If some caller is waiting for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`).

mjit's child process is protected, because:

* When mjit forks, it uses a method `rb_mjit_fork` to do so.
* That calls the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock`.
* Before yielding the lock, it inserts an entry in `vm->waiting_pids` saying that mjit is waiting for the just-created child.
* Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.

I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in `process.c`.

## POC implementation

I sketched out a _very_ rough POC to see if what I said above would be possible, and I think it is: https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57

The following script behaves how I expect with this patch:

```ruby
pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69"
puts "pid -> #{pid}"
puts "h -> #{h}"

# The private child has exited by now, but this should still raise Errno::ECHILD.
sleep 2
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end
wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"
```

```
ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 4>
```

The child process can be waited on with the handle, and the call to `waitpid2(-1)` finds nothing.
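The handle-table idea from the "Implementation strategy" section can be approximated in plain Ruby, as a rough sketch rather than what the POC commit does. The `PrivateChildren` module and its method names are hypothetical, the table stores saved statuses directly instead of handle objects, and the sketch is single-threaded: it has no equivalent of `vm->waitpid_lock`, so it ignores the races the C implementation has to handle.

```ruby
# Hypothetical sketch: a pid -> saved-status table plus a filtering wrapper
# around Process.waitpid2. None of these names exist in Ruby itself.
module PrivateChildren
  @saved = {} # pid => Process::Status, or nil while the child is unreaped

  class << self
    # Spawn a child and register its pid as private.
    def spawn_private(*argv)
      pid = Process.spawn(*argv)
      @saved[pid] = nil
      pid
    end

    # Like Process.waitpid2(-1), but never hands back a private child: its
    # status is saved in the table and the wait is repeated.
    def waitpid2_any
      loop do
        pid, status = Process.waitpid2(-1)
        if @saved.key?(pid)
          @saved[pid] = status
          next
        end
        return [pid, status]
      end
    end

    # Wait for one private child, using the saved status when a misdirected
    # wait has already reaped it (no system call in that case).
    def wait_private(pid)
      return @saved[pid] if @saved[pid]

      _, status = Process.waitpid2(pid)
      @saved[pid] = status
    end
  end
end

private_pid = PrivateChildren.spawn_private("/bin/sh", "-c", "exit 5")
public_pid = Process.spawn("/bin/sh", "-c", "sleep 0.3; exit 0")

# The private child exits first, but the process-wide wait skips over it.
reaped_pid, = PrivateChildren.waitpid2_any
puts "process-wide wait returned the public child: #{reaped_pid == public_pid}"

private_status = PrivateChildren.wait_private(private_pid)
puts "private child exit status: #{private_status.exitstatus}"
```

A real implementation would also have to intercept waits made with the private pid itself, which this sketch leaves out.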
## Previous idea: OS-specific handles

My first version of this proposal involved a similar API, but powering it with platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the `wait` syscall. In particular, I had believed that we could use the `clone` syscall in Linux to create a child process which:

* Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid),
* Would not generate a signal when it exited (i.e. no SIGCHLD),
* Could not be waited on by an unsuspecting call to `waitpid` (except if a special flag `__WCLONE` was passed).

Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process execs, it goes _back_ to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API.

So, I don't think we should use pidfds in this proposal.

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking whether this might be useful to other, more normal, gems.

--
https://bugs.ruby-lang.org/

Issue #19322 has been updated by Eregon (Benoit Daloze). IMHO this sounds like some code is doing bad stuff and not properly caring about its own resources. In the example you shown, I believe it's none of Unicorn's business to reap arbitrary processes, it doesn't compose (I could be wrong, but this seems a general rule when it come to resources of a program: don't mess with what you don't own). Unicorn should keep a list of pid subprocesses it created, and only do something on the `Signal.trap(:CHLD) do` if it's one of these pids. I think it's a very frequent pattern to track pids of subprocesses and connect to trap handlers. IMHO the proposed API are way too big and invasive to workaround a bad library which does Process.waitall. The proposed workaround in https://bugs.ruby-lang.org/issues/19322#note-1 doesn't seem too bad. ---------------------------------------- Feature #19322: Support spawning "private" child processes https://bugs.ruby-lang.org/issues/19322#change-101199 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Open * Priority: Normal ---------------------------------------- ## Background The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program. ## Problem statement Consider the following program. 
```ruby # Imagine this part of the program is in some top-level application event loop # or something - similar to how Unicorn works. It detects child processes exiting # and takes some action (possibly restarting a crashed worker, for example). Signal.trap(:CHLD) do loop do begin pid, status = Process.waitpid2 -1 puts "Signal handler reaped #{pid} #{status.inspect}" rescue Errno::ECHILD puts "Signal handler reaped nothing" break end end end # Imagine that _this_ part of the program is buried deep in some gem. It knows # nothing about the application SIGCHLD handling, and quite possibly the application # author might not even know this gem spawns a child process to do its work! require 'open3' loop do o, status = Open3.capture2("/bin/sh", "-c", "echo 'hello'") puts "ran command, got #{o.chomp} #{status.inspect}" end ``` In current versions of Ruby, _some_ loop iterations will function correctly, and print something like this. The gem gets the `Process::Status` object from its command and can know if e.g. it exited abnormally. ``` ran command, got ohaithar #<Process::Status: pid 1153687 exit 0> Signal handler reaped nothing ``` However, other iterations of the loop print this. The signal handler runs and calls `Process.waitpid2(-1)` before the code in open3 can do so. Then, the gem code does not get a `Process::Status` object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility. ``` Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0> Signal handler reaped nothing ran command, got ohaithar nil ``` We would like a family of APIs which allow a gem to spawn a child process and guarantees that the gem can wait on it. 
Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the `SIGCHLD` signal to avoid the application signal handler needlessly waking up. ## Proposed Ruby-level APIs. I propose we create the following new methods in Ruby. * `Process.spawn_private` * `Process.fork_private` These methods behave identically to their non-_private versions in all respect, except instead of returning a pid, they return an object of type `Process::PrivateHandle`. `Process::PrivateHandle` would have the following methods: * `pid()` - returns the pid for the created process * `wait()` - blocks the caller until the created process has exited, and returns a `Process::Status` object. If the handle has _already_ had `#wait` called on it, it returns the same `Process::Status` object as was returned then immediately. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid). * `wait_nonblock()` - if the created process has exited, behaves like `#wait`; otherwise, it returns a `Process::Status` object for which `#exited?` returns false. * `kill(...)` - if the created process has not been reaped via a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, that the wrong process is not signaled by mistake. A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` will _never_ return a `Process::Status` for a process started with a `_private` method, even if that call is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`. 
The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and the `Open3` module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed. (note though - I don't believe `Kernel#system` suffers from the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it) ## Implementation strategy I believe this can be implemented, in broad strokes, with an approach like this: * Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`. * When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to `waitpid`. * If the wait call _was_ made with the handle, we can return the * Once a handle has had the child status saved against it, it is removed from the table. * A subsequent call to wait on that pi the handle will look up the saved information and return it without making a system call. In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code. In any case, the way waitpid works _currently_, is that... * Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally. 
* If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`.
* Ruby also keeps a list of all `waitpid_state`s that are currently being waited for, `vm->waiting_pids` and `vm->waiting_grps`.
* These structures are protected with a specific mutex, `vm->waitpid_lock`.
* Ruby internally uses the SIGCHLD signal to reap the dead children, and then finds a waiting call to `waitpid` (via the two lists) to actually dispatch the reaped status to.
* If some caller is waiting for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`).

mjit's child process is protected, because:

* When mjit forks, it uses a method `rb_mjit_fork` to do so.
* That calls the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock`.
* Before yielding the lock, it inserts an entry in `vm->waiting_pids` saying that mjit is waiting for the just-created child.
* Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.

I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in `process.c`.

## POC implementation

I sketched out a _very_ rough POC to see if what I said above would be possible, and I think it is: https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57

The following script behaves how I expect with this patch:

```ruby
pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69"
puts "pid -> #{pid}"
puts "h -> #{h}"

# The plain wait below should fail with Errno::ECHILD ("No child processes"),
# because the private child is invisible to it.
sleep 2
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end

# The handle is the only way to reap the child.
wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"
```

```
ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 4>
```

The child process can be waited on with the handle, and the call to `waitpid2(-1)` finds nothing.

## Previous idea: OS-specific handles

My first version of this proposal involved a similar API, but powered by platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the `wait` syscall. In particular, I had believed that we could use the `clone` syscall in Linux to create a child process which:

* Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid),
* Would not generate a signal when it exited (i.e. no SIGCHLD),
* Could not be waited on by an unsuspecting call to `waitpid` (except if a special flag `__WCLONE` was passed).

Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process execs, it goes _back_ to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API. So, I don't think we should use pidfds in this proposal.

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking the question if this might be useful to other, more normal, gems.

-- https://bugs.ruby-lang.org/
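To make the handle semantics proposed in this issue more concrete, here is a minimal plain-Ruby sketch of the two caching behaviours: a repeated `wait` returns the same `Process::Status`, and `kill` refuses to signal after the child has been reaped. The class name `PrivateHandleSketch` is hypothetical and is not the API from the POC. It wraps an ordinary pid, so it does not hide the child from `Process.waitpid2(-1)` or suppress `SIGCHLD`; that part needs interpreter support. It is also written for single-threaded use only.

```ruby
# Hypothetical illustration only; not the proposed Process::PrivateHandle.
class PrivateHandleSketch
  attr_reader :pid

  # Spawns the command with the ordinary Process.spawn and wraps its pid.
  def self.spawn(*args)
    new(Process.spawn(*args))
  end

  def initialize(pid)
    @pid = pid
    @status = nil # filled in once the child has been reaped
  end

  # Blocks until the child exits. Later calls return the cached status
  # instead of raising Errno::ECHILD or waiting on a recycled pid.
  def wait
    @status ||= Process.waitpid2(@pid).last
  end

  # Signals the child only while it is still unreaped; afterwards the pid
  # may belong to an unrelated process, so raise without a system call.
  def kill(signal)
    raise Errno::ESRCH if @status
    Process.kill(signal, @pid)
  end
end
```

Calling `PrivateHandleSketch.spawn("/bin/sh", "-c", "exit 7").wait` twice yields the same status object both times, which is the behaviour the proposal describes for `#wait`.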

"Eregon (Benoit Daloze) via ruby-core" <ruby-core@ml.ruby-lang.org> wrote:
IMHO this sounds like some code is doing bad stuff and not properly caring about its own resources.
In the example you showed, I believe it's none of Unicorn's business to reap arbitrary processes, it doesn't compose (I could be wrong, but this seems a general rule when it comes to resources of a program: don't mess with what you don't own). Unicorn should keep a list of pid subprocesses it created, and only do something on the `Signal.trap(:CHLD) do` if it's one of these pids.
There'd be lots of zombies if unicorn did what you propose (at least for non-MJIT-Rubies).

KJ: do you need to care about the exit status? Or just whether or not a process has exited? If it's only the latter, turning FD_CLOEXEC off on the write end of a pipe would let you IO.select/poll/epoll_wait on the read end to detect when the child+descendants are all dead:

    r, w = IO.pipe
    Process.spawn ..., w => w # share `w' with all descendants
    w.close
    IO.select([r, ...], ...)

I've started using the above pattern in tests for setsid daemons lately.
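The pipe pattern above can be spelled out as a small runnable sketch. The helper names `spawn_with_exit_pipe` and `all_gone?` are made up for illustration, and the sketch assumes that neither the child nor its descendants ever write to the inherited descriptor, so the read end only becomes readable at EOF.

```ruby
# Spawn a command that inherits the write end of a pipe. The read end
# reaches EOF once the child and every descendant holding the descriptor
# have exited. No exit status is obtained this way.
def spawn_with_exit_pipe(*cmd)
  r, w = IO.pipe
  # Ruby marks descriptors close-on-exec by default; the `w => w` mapping
  # tells Process.spawn to keep this one open in the child.
  pid = Process.spawn(*cmd, w => w)
  w.close # the parent must drop its copy, otherwise EOF never arrives
  [pid, r]
end

# Returns true if the read end became readable (EOF, under the assumption
# above) within `timeout` seconds, false otherwise.
def all_gone?(r, timeout)
  !IO.select([r], nil, nil, timeout).nil?
end
```

The child still has to be reaped separately (for example with `Process.wait(pid)`) to avoid leaving a zombie; the pipe only reports that it is gone.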

Issue #19322 has been updated by Eregon (Benoit Daloze).

Also the hacks for MJIT in process.c are already infamous, let's not add on top of it and force every Ruby implementation to have such complexity please.

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101200

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
In the example you showed, I believe it's none of Unicorn's business to reap arbitrary processes
Firstly, I do want to note that I don't think this is just a Unicorn problem. This is the "classic unix" way of writing a preforking pool of workers of any kind, and I'm sure similar code exists in many deployed Ruby applications. There's no way, when responding to a SIGCHLD, to know what child died until _after_ you actually reap it and steal its status code. If Unicorn (as an example) wanted to avoid reaping children it does not own, it would need to perform O(N) waitpid system calls to wait on each of its known children and see if they've exited. Alternatively, it could do the pass-down-a-pipe trick that @normalperson pointed out above, but then you can't get the exit status.
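For illustration, the "O(N) waitpid calls" alternative could look roughly like the sketch below: the component records every pid it spawns and, on each sweep, polls exactly those pids with `WNOHANG` instead of calling `waitpid2(-1)`. The class name `OwnedChildren` and its methods are hypothetical; this is not how Unicorn is written.

```ruby
# Tracks children spawned through this object and reaps only those.
class OwnedChildren
  def initialize
    @pids = {}
  end

  def spawn(*cmd)
    pid = Process.spawn(*cmd)
    @pids[pid] = true
    pid
  end

  # One non-blocking waitpid per known child, i.e. O(N) system calls per
  # sweep. Returns a hash of pid => Process::Status for children that exited.
  def reap_exited
    reaped = {}
    @pids.keys.each do |pid|
      begin
        wpid, status = Process.waitpid2(pid, Process::WNOHANG)
      rescue Errno::ECHILD
        # Someone else already reaped this child; its status is lost.
        @pids.delete(pid)
        next
      end
      next if wpid.nil? # still running
      @pids.delete(pid)
      reaped[pid] = status
    end
    reaped
  end
end
```

A `SIGCHLD` handler could call `reap_exited` instead of looping on `waitpid2(-1)`, at the cost of one system call per tracked child on every signal.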
it doesn't compose
That's the problem with the entire UNIX process API - it doesn't compose! Subprocesses exiting raise signals that run in other parts of the program, other parts of the program can accidentally wait for your subprocesses and steal the exit status from you, pid re-use means that a gem can't necessarily even _tell_ that its subprocess exited (is there now some new process with the same pid? you can't know unless you know for sure you didn't reap the previous one). The new APIs I've proposed here (`spawn_private` and `fork_private`) _do_ compose - when a subprocess is created, it can only be reaped by using the unique handle which came from its creation, and not from other random parts of a potentially very large application. My hope is that these APIs let gems (and Ruby itself, e.g. mjit) treat the spawning of subprocesses as an opaque implementation detail.
let's not add on top of it and force every Ruby implementation to have such complexity please.
I think other Ruby implementations already need this complexity if they want threadsafe implementations of things like `Kernel#system`. For example, this program will quickly throw an exception under TruffleRuby:

```ruby
t1 = Thread.new do
  loop do
    pid, status = Process.waitpid2(-1)
    puts "Reaped pid #{pid} status #{status.inspect}"
  rescue Errno::ECHILD
  end
end

t2 = Thread.new do
  loop do
    child_success = system "/bin/sh -c 'exit 1'"
    puts "Child success? #{child_success}"
  end
end
t2.join
t1.join
```

```
% ruby bad.rb
Reaped pid 8781 status #<Process::Status: pid 8781 exit 1>
#<Thread:0x158 bad.rb:9 run> terminated with exception (report_on_exception is true):
<internal:core> core/errno.rb:48:in `handle': No child processes - No child process: 8781 (Errno::ECHILD)
	from <internal:core> core/truffle/process_operations.rb:150:in `block in wait'
	from <internal:core> core/truffle/ffi/pointer.rb:255:in `new'
	from <internal:core> core/truffle/process_operations.rb:145:in `wait'
	from <internal:core> core/process.rb:591:in `wait'
	from <internal:core> core/kernel.rb:593:in `system'
	from bad.rb:11:in `block (2 levels) in <main>'
	from <internal:core> core/kernel.rb:407:in `loop'
	from bad.rb:10:in `block in <main>'
<internal:core> core/errno.rb:48:in `handle': No child processes - No child process: 8781 (Errno::ECHILD)
	from <internal:core> core/truffle/process_operations.rb:150:in `block in wait'
	from <internal:core> core/truffle/ffi/pointer.rb:255:in `new'
	from <internal:core> core/truffle/process_operations.rb:145:in `wait'
	from <internal:core> core/process.rb:591:in `wait'
	from <internal:core> core/kernel.rb:593:in `system'
	from bad.rb:11:in `block (2 levels) in <main>'
	from <internal:core> core/kernel.rb:407:in `loop'
	from bad.rb:10:in `block in <main>'
```

It actually works under CRuby, because the direct wait for a specific pid always takes precedence over the wait on -1, and there is no interrupt check between when the child process is spawned and when waitpid is called in the `system` implementation.
KJ: do you need to care about the exit status?
I doubt I specifically need it for my use-case (the parent/child process already share a socketpair, and the parent would notice if it closed), but I kind of thought Ruby should offer non-hacky APIs for the use-case of "child processes in gems" in general, so I still wrote up my proposal.

## Summary

Really, I think there are three ways of looking at this issue:

1. Programs doing `waitpid -1` are bad and wrong, nobody should ever do that, and if any code in your program does this anywhere, then Ruby should no longer make any guarantees about subprocess management working correctly in the entire process.
2. Programs doing `waitpid -1` are widely deployed, and it would be good if, when writing gems, there were APIs we could use which offer better isolation and composability than the classic unix APIs, so that our gems work no matter what their containing processes are doing.
3. Gems should never be spawning child processes anyway.

My thinking on this issue is camp 2. Like it or not (and really, I don't like it), `waitpid -1` has been part of the unix way of doing preforking worker pools since approximately forever, and it would be good if such programs could use gems without carefully checking whether they spawn any subprocesses in their implementation.

Perhaps some more data needs to be gathered on just how common `waitpid -1` actually is? If people think this is something that moves the needle on this discussion, I'm happy to do some research on the topic.

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101204

"kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@ml.ruby-lang.org> wrote:
2. Programs doing `waitpid -1` are widely deployed, it would be good if, when writing gems, there were APIs we could use which offer better isolation and composability than the classic unix APIs, so that our gems work no matter what their containing processes are doing.
My thinking on this issue is camp 2. Like it or not (and really, I don't like it), `waitpid -1` has been part of the unix way of doing preforking worker pools since approximately forever, and it would be good if such programs could use gems without carefully checking whether they spawn any subprocesses in their implementation.
Same here. I think the process.c stuff I worked on for MJIT can be extended easily to support registering per-PID callbacks:

    Process.wait(pid) { |wpid, status| ... }

(But I'll let you or somebody else interested implement it)
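The block form of `Process.wait` suggested above does not exist today. As a rough approximation of the idea in current Ruby, a per-pid waiter thread can deliver the status to a callback. The helper name `on_child_exit` is hypothetical. The sketch relies on the CRuby behaviour described earlier in this thread, where a wait on a specific pid takes priority over a concurrent wait on `-1`; on other implementations a racing `waitpid2(-1)` could still win, in which case the waiter thread would fail with `Errno::ECHILD`.

```ruby
# Runs `callback` with (pid, Process::Status) once the given child exits.
# Returns the waiter thread so callers can join it if they need to.
def on_child_exit(pid, &callback)
  Thread.new do
    # Blocking wait on this specific pid only; other children are untouched.
    wpid, status = Process.waitpid2(pid)
    callback.call(wpid, status)
  end
end
```

This is close to what `Process.detach(pid)` already offers, since the thread it returns evaluates to the child's `Process::Status`; the difference here is only that the status is pushed to a block.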
Perhaps some more data needs to be gathered on just how common `waitpid -1` actually is? If people think this is something that moves the needle on this discussion, I'm happy to do some research on the topic.
Pretty common if Process.waitall exists. Breaking any common use case is unacceptable to me. But I'm of a minority opinion.

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). Hey @nobu, @eregon - any further thoughts on this? @nobu - I changed the proposal not to depend on any new platform-specific process management APIs, but instead to leverage the existing code for managing process waits in `process.c`. However, I don't think my idea can be a gem, because it needs to tightly integrate with the implementation of `Process.waitpid` to make sure calls to `Process.waitpid -1` don't steal the exit status of spawned programs. @eregon - I realise that having `Process.spawn_private` work in other Ruby implementations requires that they keep track of all (ruby-spawned) child processes and deliver the right exit statuses to the right waiters. However, I think if they want Cruby compatible handling of how `waitpid -1` and `Process.system` interact today, they already need to be doing this tracking. Is Cruby-compatible handling of `waitpid -1` actually a goal of Truffleruby? If so, I'm happy to try and contribute a patch for Truffleruby to implement this `private_spawn` stuff there as well (although I have zero experience with Truffleruby!) ---------------------------------------- Feature #19322: Support spawning "private" child processes https://bugs.ruby-lang.org/issues/19322#change-101308 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Open * Priority: Normal ---------------------------------------- ## Background The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. 
For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401).

In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program.

## Problem statement

Consider the following program.

```ruby
# Imagine this part of the program is in some top-level application event loop
# or something - similar to how Unicorn works. It detects child processes exiting
# and takes some action (possibly restarting a crashed worker, for example).
Signal.trap(:CHLD) do
  loop do
    begin
      pid, status = Process.waitpid2(-1)
      puts "Signal handler reaped #{pid} #{status.inspect}"
    rescue Errno::ECHILD
      puts "Signal handler reaped nothing"
      break
    end
  end
end

# Imagine that _this_ part of the program is buried deep in some gem. It knows
# nothing about the application SIGCHLD handling, and quite possibly the application
# author might not even know this gem spawns a child process to do its work!
require 'open3'
loop do
  o, status = Open3.capture2("/bin/sh", "-c", "echo 'hello'")
  puts "ran command, got #{o.chomp} #{status.inspect}"
end
```

In current versions of Ruby, _some_ loop iterations will function correctly, and print something like this. The gem gets the `Process::Status` object from its command and can know if e.g. it exited abnormally.

```
ran command, got ohaithar #<Process::Status: pid 1153687 exit 0>
Signal handler reaped nothing
```

However, other iterations of the loop print this. The signal handler runs and calls `Process.waitpid2(-1)` before the code in open3 can do so. Then, the gem code does not get a `Process::Status` object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility.
```
Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0>
Signal handler reaped nothing
ran command, got ohaithar nil
```

We would like a family of APIs which allow a gem to spawn a child process and guarantee that the gem can wait on it. Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the `SIGCHLD` signal to avoid the application signal handler needlessly waking up.

## Proposed Ruby-level APIs.

I propose we create the following new methods in Ruby.

* `Process.spawn_private`
* `Process.fork_private`

These methods behave identically to their non-_private versions in all respects, except instead of returning a pid, they return an object of type `Process::PrivateHandle`.

`Process::PrivateHandle` would have the following methods:

* `pid()` - returns the pid for the created process
* `wait()` - blocks the caller until the created process has exited, and returns a `Process::Status` object. If the handle has _already_ had `#wait` called on it, it immediately returns the same `Process::Status` object as was returned then. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid).
* `wait_nonblock()` - if the created process has exited, behaves like `#wait`; otherwise, it returns a `Process::Status` object for which `#exited?` returns false.
* `kill(...)` - if the created process has not been reaped via a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, the wrong process is not signaled by mistake.
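The `#wait` and `#kill` semantics listed above can be approximated in plain Ruby today. The following is only a minimal sketch: `SketchHandle` is a hypothetical name, the class is not thread-safe, and because it is ordinary userland code it cannot stop a concurrent `Process.waitpid2(-1)` from reaping the child first, which is exactly the part that needs interpreter support.

```ruby
# Hypothetical userland approximation of the proposed Process::PrivateHandle.
# Not thread-safe, and it does NOT hide the child from Process.waitpid2(-1).
class SketchHandle
  attr_reader :pid

  def initialize(pid)
    @pid = pid
    @status = nil
  end

  # Blocks until the child exits; later calls return the cached status
  # instead of raising Errno::ECHILD.
  def wait
    @status ||= Process.wait2(@pid).last
  end

  # Refuses to signal once the child has been reaped, so a recycled pid
  # is never signaled by mistake.
  def kill(signal)
    raise Errno::ESRCH if @status
    Process.kill(signal, @pid)
  end
end

handle = SketchHandle.new(Process.spawn("/bin/sh", "-c", "exit 7"))
first  = handle.wait
second = handle.wait # no system call, same object as `first`
puts first.exitstatus # 7

kill_error = begin
  handle.kill(:TERM)
  nil
rescue Errno::ESRCH => e
  e
end
puts kill_error.class # Errno::ESRCH
```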
A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` will _never_ return a `Process::Status` for a process started with a `_private` method, even if that call is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`.

The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and the `Open3` module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed.

(note though - I don't believe `Kernel#system` suffers from the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it)

## Implementation strategy

I believe this can be implemented, in broad strokes, with an approach like this:

* Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`.
* When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to `waitpid`.
* If the wait call _was_ made with the handle, we can return the reaped status directly.
* Once a handle has had the child status saved against it, it is removed from the table.
* A subsequent call to wait on that pid via the handle will look up the saved information and return it without making a system call.

In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit.
This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code.

In any case, the way waitpid works _currently_, is that...

* Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally.
* If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`.
* Ruby also keeps a list of all `waitpid_state`s that are currently being waited for, `vm->waiting_pids` and `vm->waiting_grps`.
* These structures are protected with a specific mutex, `vm->waitpid_lock`.
* Ruby internally uses the SIGCHLD signal to reap the dead children, and then find a waiting call to `waitpid` (via the two lists) to actually dispatch the reaped status to.
* If some caller is waiting for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`).

mjit's child process is protected, because:

* When mjit forks, it uses a method `rb_mjit_fork` to do so.
* That calls the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock`
* Before yielding the lock, it inserts an entry in `vm->waiting_pids` saying that mjit is waiting for the just-created child.
* Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.

I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in `process.c`.
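To make the handle-table idea concrete, here is a rough userland model of it. Everything in it is an assumption for illustration: `PrivateTableSketch` and its methods are hypothetical names, the real work would happen inside `process.c` under `vm->waitpid_lock`, and this version only helps if every "wait for anything" call in the program goes through `wait_any_public` rather than calling `Process.waitpid2(-1)` directly. It also keeps reaped entries in the table instead of removing them, which is a simplification.

```ruby
# Hypothetical userland model of the handle table described above.
module PrivateTableSketch
  @children = {} # pid => nil while running, Process::Status once reaped

  def self.spawn(*command)
    pid = Process.spawn(*command)
    # NOTE: registration is not atomic with the spawn here. A real
    # implementation would register the pid while still holding the lock,
    # in the way rb_mjit_fork is described as doing.
    @children[pid] = nil
    pid
  end

  # Stand-in for Process.waitpid2(-1): never returns a private child.
  def self.wait_any_public
    loop do
      pid, status = Process.wait2(-1) # raises Errno::ECHILD if nothing is left
      return [pid, status] unless @children.key?(pid)
      @children[pid] = status # save it for the owner, then wait again
    end
  end

  # Stand-in for PrivateHandle#wait: uses the saved status if there is one.
  def self.wait_private(pid)
    @children[pid] ||= Process.wait2(pid).last
  end
end

private_pid = PrivateTableSketch.spawn("/bin/sh", "-c", "exit 69")
public_pid  = Process.spawn("/bin/sh", "-c", "exit 0")

reaped_pid, = PrivateTableSketch.wait_any_public
private_status = PrivateTableSketch.wait_private(private_pid)

puts reaped_pid == public_pid  # true
puts private_status.exitstatus # 69
```

Whichever child exits first, the "public" wait only ever reports `public_pid`, and the private status is still available to its owner afterwards.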
## POC implementation

I sketched out a _very_ rough POC to see if what I said above would be possible, and I think it is: https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57

The following script behaves how I expect with this patch:

```ruby
pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69"
puts "pid -> #{pid}"
puts "h -> #{h}"

# waitpid2(-1) should raise ECHILD: the private child is invisible to it.
sleep 2
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end

wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"
```

```
ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 4>
```

The child process can be waited on with the handle, and the call to `waitpid2(-1)` finds nothing.

## Previous idea: OS-specific handles

My first version of this proposal involved a similar API, but powering it with platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the `wait` syscall. In particular, I had believed that we could use the `clone` syscall in Linux to create a child process which:

* Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid),
* Would not generate a signal when it exited (i.e. no SIGCHLD).
* Could not be waited on by an unsuspecting call to `waitpid` (except if a special flag `__WCLONE` was passed).

Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process exec's, it goes _back_ to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API.

So, I don't think we should use pidfds in this proposal.
## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff I thought it would be worth asking the question if this might be useful to other, more normal, gems.

--
https://bugs.ruby-lang.org/

Issue #19322 has been updated by Eregon (Benoit Daloze).

I meant to reply to this earlier but could not.

Right, in the SIGCHLD handler it's not possible to know the pid from Ruby's trap (it might be possible with `siginfo_t` of `sigaction()` but that's platform-dependent).

The typical way to care about a resource is to let the caller both allocate and release it. So things like `Process.wait fork {}`. That would not work as-is for Unicorn and similar use cases since it doesn't want to wait for that child process on the main thread. There are multiple solutions:

* There is `Process.detach(pid)`. That creates one thread per pid; if that's too much overhead one could make their own with WNOHANG and sleep, then it's just 1 extra thread.
* Alternatively, just do the `Process.wait fork {}` in a thread, and it's even simpler and easier to handle a child process terminating.
* The pipe trick, I suppose this could be used with one pipe per pid, then it's also easy to detect which process is ready to be waited on.
* Isn't one of the main points of process groups to deal with such a case? I guess we'd need to place the forks in a new process group and then we could wait on that whole process group (`Process.wait -group`). Maybe with an extra child process in between to set up the process group or so.
* Maybe io_uring or similar API can wait for any of multiple processes to terminate? Probably not portable enough though.
> Like it or not (and really, I don't like it), waitpid -1 has been part of the unix way of doing preforking worker pools since approximately forever,
Is there any reason to do it that way, that none of the solutions above addresses? I suppose that's easier but also more hacky, doesn't compose and breaks other places in the code waiting for processes.

From your summary, I'm for 1, which I see as proper resource management: release what you own/allocated, don't release other resources you don't own. `Process.waitall` doesn't compose, `Process.wait pid/group` composes and works well.

Also this new API wouldn't be adopted before a very long time by the many usages of `Kernel#spawn`/etc (far more than usages of `Process.waitall`).

----------------------------------------
Feature #19322: Support spawning "private" child processes
https://bugs.ruby-lang.org/issues/19322#change-101659

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------

Issue #19322 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
> Also this new API wouldn't be adopted before a very long time by the many usages of Kernel#spawn/etc (far more than usages of Process.waitall).
This seems like a fairly compelling argument actually. You're right, I want to write some code that I can use safely in Unicorn, but there's probably a large body of existing code calling `Process.spawn` et al that people are also running inside forking servers and is "mostly working". So maybe I should have a look at solving this on the Unicorn side rather than the Ruby side.

I wonder, then, if we should aim to _deprecate_ `Process.waitall` and friends, given that it can interfere with standard library modules like open3.
> * There is Process.detach(pid). That creates one thread per pid, if that's too much overhead one could make their own with WNOHANG and sleep, then it's just 1 extra thread.
> * Alternatively, just do the Process.wait fork {} in a thread, and it's even simpler and easier to handle a child process terminating.
I think it'd be nice to avoid having to make threads just to manage subprocesses through blocking API calls if possible
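For reference, the thread-based option quoted above is already available in core Ruby. A minimal sketch follows; the `/bin/sh` command is only a placeholder, and note that a `waitpid(-1)` elsewhere in the program could still reap the child first, in which case the waiter thread's value would be `nil`.

```ruby
# Process.detach starts a thread that waits for exactly this pid, so the
# spawner reaps its own child without ever calling waitpid(-1) itself.
pid = Process.spawn("/bin/sh", "-c", "exit 5")
waiter = Process.detach(pid)

# ... the main thread is free to do other work here ...

status = waiter.value # joins the thread and returns the Process::Status
puts status.exitstatus # 5
```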
> * The pipe trick, I suppose this could be used with one pipe per pid, then it's also easy to detect which process is ready to be waited on.
Could definitely work, and seems like the best solution for Unicorn specifically, all things considered.
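A sketch of what the pipe trick could look like, assuming one pipe per child and plain `fork`: the child inherits the write end and never writes to it, so the read end reports EOF once the child has exited. This is an illustration of the idea, not how Unicorn is implemented.

```ruby
readers = {} # pipe read end => pid of the child holding the write end

[1, 2].each do |exit_code|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # The child keeps `writer` open and never writes to it; the kernel
    # closes it when the child exits.
    exit!(exit_code)
  end
  writer.close # the parent must not keep the write end open
  readers[reader] = pid
end

exit_codes = {}
until readers.empty?
  ready, = IO.select(readers.keys) # EOF makes a read end readable
  ready.each do |reader|
    pid = readers.delete(reader)
    reader.close
    _, status = Process.wait2(pid) # wait for this pid only, never -1
    exit_codes[pid] = status.exitstatus
  end
end

p exit_codes.values.sort # [1, 2]
```

Because the parent learns which child is ready from the pipe, it only ever waits on pids it created itself, and children spawned by other libraries are left alone.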
> * Isn't one of the main points of process groups to deal with such a case? I guess we'd need to place the forks in a new process group and then we could wait on that whole process group (Process.wait -group). Maybe with an extra child process in between to setup the process group or so.
This was actually quite interesting, I had a look at this. The extra child process would need to be a sibling to the processes to be waited on (because the parent can only wait on children, not grandchildren). Then, when forking processes, each fork could call `setpgid(2)` to move itself into that sibling's process group; then, as you say, `Process.wait -group` could be used to wait for only those subprocesses that joined the group. Unfortunately there's a race condition - a fork could crash after forking and before calling `setpgid(2)`, in which case it would never get reaped.
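The group-wait half of this can be sketched with the `pgroup:` option of `Process.spawn`. The long-lived `sleep` child below is a placeholder standing in for the sibling that owns the group, and the sketch does not address the fork-then-`setpgid(2)` race just described, which concerns bare `fork`.

```ruby
# A long-lived placeholder child that creates (and leads) a new process group.
leader = Process.spawn("sleep", "30", pgroup: true)

# This child joins the leader's group; the next one stays in the default group.
member   = Process.spawn("/bin/sh", "-c", "exit 3", pgroup: leader)
outsider = Process.spawn("/bin/sh", "-c", "exit 0")

# A negative pid waits for any child in that process group, so the
# outsider is never reaped here.
reaped_pid, status = Process.wait2(-leader)
puts reaped_pid == member # true
puts status.exitstatus    # 3

# Clean up the remaining children by pid.
Process.kill(:TERM, leader)
Process.wait(leader)
Process.wait(outsider)
```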
> * Maybe io_uring or similar API can wait for any of multiple processes to terminate?
Linux and FreeBSD have pidfds which can do this, and I think `WaitForMultipleObjects` can do this on Windows, but I couldn't find any equivalent for MacOS or OpenBSD unfortunately. ---------------------------------------- Feature #19322: Support spawning "private" child processes https://bugs.ruby-lang.org/issues/19322#change-101832 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Open * Priority: Normal ---------------------------------------- ## Background The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program. ## Problem statement Consider the following program. ```ruby # Imagine this part of the program is in some top-level application event loop # or something - similar to how Unicorn works. It detects child processes exiting # and takes some action (possibly restarting a crashed worker, for example). Signal.trap(:CHLD) do loop do begin pid, status = Process.waitpid2 -1 puts "Signal handler reaped #{pid} #{status.inspect}" rescue Errno::ECHILD puts "Signal handler reaped nothing" break end end end # Imagine that _this_ part of the program is buried deep in some gem. It knows # nothing about the application SIGCHLD handling, and quite possibly the application # author might not even know this gem spawns a child process to do its work! 
require 'open3' loop do o, status = Open3.capture2("/bin/sh", "-c", "echo 'hello'") puts "ran command, got #{o.chomp} #{status.inspect}" end ``` In current versions of Ruby, _some_ loop iterations will function correctly, and print something like this. The gem gets the `Process::Status` object from its command and can know if e.g. it exited abnormally. ``` ran command, got ohaithar #<Process::Status: pid 1153687 exit 0> Signal handler reaped nothing ``` However, other iterations of the loop print this. The signal handler runs and calls `Process.waitpid2(-1)` before the code in open3 can do so. Then, the gem code does not get a `Process::Status` object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility. ``` Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0> Signal handler reaped nothing ran command, got ohaithar nil ``` We would like a family of APIs which allow a gem to spawn a child process and guarantees that the gem can wait on it. Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the `SIGCHLD` signal to avoid the application signal handler needlessly waking up. ## Proposed Ruby-level APIs. I propose we create the following new methods in Ruby. * `Process.spawn_private` * `Process.fork_private` These methods behave identically to their non-_private versions in all respect, except instead of returning a pid, they return an object of type `Process::PrivateHandle`. `Process::PrivateHandle` would have the following methods: * `pid()` - returns the pid for the created process * `wait()` - blocks the caller until the created process has exited, and returns a `Process::Status` object. 
If the handle has _already_ had `#wait` called on it, it returns the same `Process::Status` object as was returned then immediately. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid). * `wait_nonblock()` - if the created process has exited, behaves like `#wait`; otherwise, it returns a `Process::Status` object for which `#exited?` returns false. * `kill(...)` - if the created process has not been reaped via a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, that the wrong process is not signaled by mistake. A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` will _never_ return a `Process::Status` for a process started with a `_private` method, even if that call is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`. The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and the `Open3` module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed. (note though - I don't believe `Kernel#system` suffers from the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it) ## Implementation strategy I believe this can be implemented, in broad strokes, with an approach like this: * Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`. * When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. 
Instead, save the status against the handle, and repeat the call to `waitpid`. * If the wait call _was_ made with the handle, we can return the * Once a handle has had the child status saved against it, it is removed from the table. * A subsequent call to wait on that pi the handle will look up the saved information and return it without making a system call. In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code. In any case, the way waitpid works _currently_, is that... * Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally. * If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`. * Ruby also keeps a list of all `waitpid_state`'s that are currently being waited for, `vm->waiting_pids` and `vm->waiting_grps`. * These structures are protected with a specific mutex, `vm->waitpid_lock`. * Ruby internally uses the SIGCHLD signal to reap the dead children, and then find a waiting call to `waitpid` (via the two lists) to actually dispatch the reaped status to. * If some caller is waiting for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`). mjit's child process is protected, because: * When mjit forks, it uses a method `rb_mjit_fork` to do so. 
* That calls the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock`.
* Before yielding the lock, it inserts an entry in `vm->waiting_pids` saying that mjit is waiting for the just-created child.
* Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.

I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in `process.c`.

## POC implementation

I sketched out a _very_ rough POC to see if what I said above would be possible, and I think it is: https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57

The following script behaves how I expect with this patch:

```ruby
pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69"
puts "pid -> #{pid}"
puts "h -> #{h}"

# The child has exited by now, but it is private, so a wildcard wait
# should find no children and raise Errno::ECHILD.
sleep 2
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end

# Only the handle can reap the private child.
wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"
```

```
ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 4>
```

The child process can be waited on with the handle, and the call to `waitpid2(-1)` finds nothing.

## Previous idea: OS-specific handles

My first version of this proposal involved a similar API, but powered by platform-specific concepts available on Linux, Windows, and FreeBSD, which offer richer control than just pids & the `wait` syscall. In particular, I had believed that we could use the `clone` syscall in Linux to create a child process which:

* Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid),
* Would not generate a signal when it exited (i.e. no SIGCHLD).
* Could not be waited on by an unsuspecting call to `waitpid` (unless the special flag `__WCLONE` was passed).

Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process execs, it goes _back_ to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API. So, I don't think we should use pidfds in this proposal.

## Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking whether this might be useful to other, more normal, gems.

-- https://bugs.ruby-lang.org/
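For illustration only, the handle semantics proposed above (`#wait` returning a cached status on repeat calls, `#kill` raising `Errno::ESRCH` once the child is reaped) can be approximated in plain Ruby today. This is a minimal sketch, not the POC patch: the class name `PrivateHandleSketch` is hypothetical, it is not thread-safe, and it does not hide the child from `Process.waitpid2(-1)` or from `SIGCHLD` handlers, which is the part that needs interpreter support. It also returns a bare `Process::Status` from `#wait`, which is an assumption about the final API shape.

```ruby
# Hypothetical userland approximation of the proposed Process::PrivateHandle.
# It only models the cached-status and kill-after-reap behaviour; the child is
# still an ordinary child process and can be stolen by Process.waitpid2(-1).
class PrivateHandleSketch
  attr_reader :pid

  def initialize(*spawn_args)
    @pid = Process.spawn(*spawn_args)
    @status = nil # set to a Process::Status once the child has been reaped
  end

  # Blocks until the child exits. A repeated call returns the same cached
  # Process::Status instead of raising Errno::ECHILD.
  def wait
    return @status if @status

    _pid, status = Process.waitpid2(@pid)
    @status = status
  end

  # Signals the child only while it is unreaped. After reaping, the pid may
  # have been reused, so raise ESRCH without making a system call.
  def kill(signal)
    raise Errno::ESRCH if @status

    Process.kill(signal, @pid)
  end
end

handle = PrivateHandleSketch.new("/bin/sh", "-c", "exit 3")
first  = handle.wait
second = handle.wait # served from the cache, no second waitpid call
puts "same object: #{first.equal?(second)}, exit status: #{first.exitstatus}"
```

A real implementation would keep this bookkeeping inside the VM, next to the existing `waitpid` dispatch, so that wildcard waits can be filtered as well.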
participants (4)
- Eregon (Benoit Daloze)
- Eric Wong
- kjtsanaktsidis (KJ Tsanaktsidis)
- nobu (Nobuyoshi Nakada)