[ruby-core:116183] [Ruby master Bug#20181] Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled

Issue #20181 has been reported by stanhu (Stan Hu).

----------------------------------------
Bug #20181: Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
https://bugs.ruby-lang.org/issues/20181

* Author: stanhu (Stan Hu)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [aarch64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
From Ruby 2.6 to 3.2, `Process.wait(-1)` doesn't return in a timely manner if a spawned, detached process is still running. The following script exits immediately with Ruby 3.3, but hangs for 10 minutes (the length of the `sleep`) in Ruby 2.6 to 3.2:

```ruby
#!/bin/env ruby

Process.spawn({}, "sh -c 'sleep 600'").tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

forked_pid = fork do
  loop { sleep 1 }
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."
  # This works
  # puts Process.wait2(forked_pid)
  # The spawned process has to exit before this returns in Ruby 3.1 and 3.2
  pid, status = Process.wait2(-1)
  puts "Exited PID: #{pid}, status: #{status}"
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join
```

In Ruby 3.2, we see:

```
detaching PID 8
Waiting for child process to die...
Killing 11
<process hangs here>
```

In Ruby 3.3, this exits immediately:

```
detaching PID 9
Waiting for child process to die...
Killing 11
Exited PID: 11, status: pid 11 SIGTERM (signal 15)
```

However, if I switch the `Process.wait(-1)` to `Process.wait(forked_pid)`, Ruby 3.2 works fine.

I've validated that this problem goes away if I disable `WAITPID_USE_SIGCHLD`:

```diff
diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif
 
 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0
 
 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 # define USE_SIGALTSTACK
```

This was first reported in the Puma issue tracker (https://github.com/puma/puma/issues/3313), and another contributor documented long-standing issues with `Process.wait` in the past: https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma...

In Ruby 2.6, https://github.com/ruby/ruby/commit/054a412d540e7ed2de63d68da753f585ea6616c3 introduced a mechanism for `rb_waitpid` that uses `SIGCHLD` for blocking `wait` calls, and this might have introduced this bug. Ruby 2.5 doesn't appear to have this problem.

In Ruby 3.3, this `SIGCHLD` implementation was dropped in https://github.com/ruby/ruby/pull/7476 and https://github.com/ruby/ruby/pull/7527, so Ruby 3.3 no longer appears affected.

--
https://bugs.ruby-lang.org/
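The report notes that waiting on a specific PID returns promptly even on Ruby 3.2. One possible way to rely on that observation is to track the children of interest and wait on each one explicitly instead of passing `-1`. The following is only a minimal sketch under that assumption; `wait_for_children` is a hypothetical helper name, not part of Ruby or Puma, and it waits on the PIDs sequentially in the order given.

```ruby
# Hypothetical helper (not from the report): waits on each known child PID
# explicitly instead of calling Process.wait2(-1).
# Returns a hash of pid => Process::Status (or nil if already reaped).
def wait_for_children(pids)
  pids.each_with_object({}) do |pid, statuses|
    begin
      _, status = Process.wait2(pid)
      statuses[pid] = status
    rescue Errno::ECHILD
      # The child was already reaped elsewhere, e.g. by Process.detach.
      statuses[pid] = nil
    end
  end
end

# Usage: fork two short-lived children and collect their exit statuses.
children = [3, 4].map { |code| fork { exit(code) } }
results = wait_for_children(children)
results.each do |pid, status|
  puts "PID #{pid} exited with #{status&.exitstatus}"
end
```

This only helps when the caller knows the PIDs in advance; it does not replace `Process.wait2(-1)` for code that needs to reap arbitrary children.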

Issue #20181 has been updated by stanhu (Stan Hu).

This might be a duplicate of https://bugs.ruby-lang.org/issues/19322.

----------------------------------------
Bug #20181: Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
https://bugs.ruby-lang.org/issues/20181#change-106198

Issue #20181 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

Actually I think this is a duplicate of https://bugs.ruby-lang.org/issues/19837. Does this describe your issue?

The fix for this was backported into the Ruby 3.2 and 3.1 branches, but I don't think a release of either 3.2 or 3.1 has been performed since then. Does the problem go away if you compile Ruby from the `ruby_3_2` branch directly?

----------------------------------------
Bug #20181: Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
https://bugs.ruby-lang.org/issues/20181#change-106202

Issue #20181 has been updated by stanhu (Stan Hu).

Yes, thanks, this definitely looks like the same issue. Thanks for filing that issue and getting the patches merged.

I tested `ruby_3_2`, and it appears that the patch partially fixes the problem. When `Process.wait2(-1, Process::WNOHANG)` is used, everything works. But if the script above is used without `Process::WNOHANG`, the problem persists.

----------------------------------------
Bug #20181: Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled
https://bugs.ruby-lang.org/issues/20181#change-106204
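Since the non-blocking form is the one reported to work on the `ruby_3_2` branch, a polling loop around `Process.wait2(-1, Process::WNOHANG)` might serve as a stopgap on builds that include the backported fix. This is a rough sketch, not a confirmed workaround: `poll_any_child` and its `interval` and `timeout` parameters are hypothetical names chosen for illustration, and the thread does not say whether this helps on the released 3.2.2.

```ruby
# Hypothetical helper (not from the thread): polls for any exited child
# using the non-blocking form of wait2 instead of blocking in waitpid.
# `interval` and `timeout` are illustrative parameters.
def poll_any_child(interval: 0.1, timeout: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # With WNOHANG, wait2 returns nil when no child has exited yet.
    result = Process.wait2(-1, Process::WNOHANG)
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
rescue Errno::ECHILD
  nil # there are no child processes left to wait for
end

# Usage: fork a child that exits shortly, then poll until it is reaped.
child = fork { sleep 0.2; exit(7) }
pid, status = poll_any_child
puts "Exited PID: #{pid}, status: #{status.inspect}"
```

Polling trades latency (up to one `interval`) and some wakeups for not depending on the blocking `SIGCHLD`-based path.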
participants (2)
- kjtsanaktsidis (KJ Tsanaktsidis)
- stanhu (Stan Hu)