[ruby-core:112155] [Ruby master Bug#19395] Process forking within non-main Ractor creates child stuck in busy loop

Issue #19395 has been reported by luke-gru (Luke Gruber).

----------------------------------------
Bug #19395: Process forking within non-main Ractor creates child stuck in busy loop
https://bugs.ruby-lang.org/issues/19395

* Author: luke-gru (Luke Gruber)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
```ruby
def test_fork_in_ractor
  r2 = Ractor.new do
    pid = fork do
      exit Ractor.count
    end
    pid
  end
  pid = r2.take
  puts "Process #{Process.pid} waiting for #{pid}"
  _pid, status = Process.waitpid2(pid) # stuck forever
  if status.exitstatus != 1
    raise "status is #{status.exitstatus}"
  end
end

test_fork_in_ractor()
```

$ top # shows CPU usage is high for child process

-- 
https://bugs.ruby-lang.org/

Issue #19395 has been updated by luke-gru (Luke Gruber).

Subject changed from Process forking within non-main Ractor creates child stuck in busy loop to Process forking within non-main Ractor causes segv

Sorry, my changes in my dev branch were causing some odd behavior. It just crashes on 3.2.0.

----------------------------------------
Bug #19395: Process forking within non-main Ractor causes segv
https://bugs.ruby-lang.org/issues/19395#change-101593

* Author: luke-gru (Luke Gruber)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------

Issue #19395 has been updated by nobu (Nobuyoshi Nakada).

luke-gru (Luke Gruber) wrote in #note-1:
> It just crashes on 3.2.0.

I can't reproduce the SEGV on macOS 13.1. What platform are you using?

----------------------------------------
Bug #19395: Process forking within non-main Ractor causes segv
https://bugs.ruby-lang.org/issues/19395#change-101600

* Author: luke-gru (Luke Gruber)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------

Issue #19395 has been updated by luke-gru (Luke Gruber).

ruby -v set to 3.2.0

I'm on Ubuntu 22.04. The issue seems to be a call to `rb_native_mutex_destroy` on a locked mutex in `ractor_free`. Relevant part of the backtrace:

```
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(die+0x0) [0x7fc1374d0e5f] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:798
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug) /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:800
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug_errno+0x43) [0x7fc137579223] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:829
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_native_mutex_destroy+0x24) [0x7fc137719a24] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/thread_pthread.c:603
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(ractor_free+0x11) [0x7fc137679991] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/ractor.c:235
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(run_final+0xf)
```

If instead you change `exit 0` to `exec "date"`, it doesn't crash. Maybe the atfork hooks need to be changed to acquire the locks in the parent and unlock them in the child.

----------------------------------------
Bug #19395: Process forking within non-main Ractor causes segv
https://bugs.ruby-lang.org/issues/19395#change-101606

* Author: luke-gru (Luke Gruber)
* Status: Feedback
* Priority: Normal
* ruby -v: 3.2.0
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
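The "acquire in parent, unlock in child" idea from this message can be sketched at the Ruby level. This is only an analogy: it uses an ordinary `Mutex` and a hypothetical helper named `fork_holding_lock`, not the VM's internal atfork hooks or the ractor locks seen in the backtrace. It assumes a platform where `fork` is available.

```ruby
# Analogy only: a Ruby-level Mutex stands in for the VM-internal locks.
# fork_holding_lock is a hypothetical helper written for illustration.
#
# Forks while holding +lock+, then releases it in both processes.
# Returns [child_pid, lock state reported by the child].
def fork_holding_lock(lock)
  reader, writer = IO.pipe
  lock.lock                         # "prepare" step: acquire before forking
  pid = fork
  if pid.nil?
    # Child: the forking thread still owns the lock, so it can release it.
    reader.close
    lock.unlock
    writer.write(lock.locked? ? "locked" : "unlocked")
    writer.close
    exit!(0)                        # leave without running at_exit handlers
  end
  lock.unlock                       # parent: release once fork has returned
  writer.close
  report = reader.read
  reader.close
  Process.waitpid(pid)
  [pid, report]
end

_pid, report = fork_holding_lock(Mutex.new)
puts "lock state seen by child: #{report}"
```

Because the lock is taken by the thread that forks, neither process is left with a lock whose owner no longer exists, which is the state the message suspects in `ractor_free`.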

Issue #19395 has been updated by luke-gru (Luke Gruber).

This fixes it: https://github.com/luke-gru/ruby/commit/0cb53d4458eb09d8a3f70caaa44c688b48ba...

The issue is that when there are multiple ractors and you call fork, the other ractors in the child process (those that aren't the new main ractor) need to be GC'd, and their mutexes could be in a weird state. So we should either skip destroying those mutexes or reinitialize them in the child process. Re-init works on my machine, but I don't know if it works across platforms.

----------------------------------------
Bug #19395: Process forking within non-main Ractor hits rb_bug()
https://bugs.ruby-lang.org/issues/19395#change-101612

* Author: luke-gru (Luke Gruber)
* Status: Feedback
* Priority: Normal
* ruby -v: 3.2.0
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
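Why the child ends up with leftover execution contexts to clean up can be seen with plain threads. The sketch below is an analogy that uses `Thread` instead of `Ractor` (an assumption made to keep it small and stable), and `threads_alive_in_child` is a hypothetical helper name. It relies on the documented behavior that the thread calling `fork` is the only thread running in the child.

```ruby
# Analogy using threads rather than ractors: after fork, only the calling
# thread keeps running in the child; the others exist there only as state
# that has to be cleaned up.
# threads_alive_in_child is a hypothetical helper for illustration.
def threads_alive_in_child
  reader, writer = IO.pipe
  background = Thread.new { sleep }   # stays parked in the parent

  pid = fork do
    reader.close
    # Count the Ruby threads that are still alive inside the child.
    writer.write(Thread.list.count(&:alive?).to_s)
    writer.close
    exit!(0)                          # skip at_exit handlers in the child
  end

  writer.close
  count = reader.read.to_i
  reader.close
  Process.waitpid(pid)
  background.kill
  background.join
  count
end

puts "threads alive in child: #{threads_alive_in_child}"
```

Any lock that a non-surviving thread (or, in the report, a non-main ractor) held at the moment of the fork has no owner left to release it in the child, which is why the message proposes skipping destruction or reinitializing those mutexes there.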

Issue #19395 has been updated by luke-gru (Luke Gruber).

I can no longer reproduce this issue. I probably had some changes in my tree that were causing it. Sorry! Please close.

----------------------------------------
Bug #19395: Process forking within non-main Ractor hits rb_bug()
https://bugs.ruby-lang.org/issues/19395#change-106827

* Author: luke-gru (Luke Gruber)
* Status: Feedback
* Priority: Normal
* ruby -v: 3.2.0
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
participants (2)
* luke-gru (Luke Gruber)
* nobu (Nobuyoshi Nakada)