[ruby-core:125078] [Ruby Bug#21959] rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
Issue #21959 has been reported by anmarchenko_datadog (Andrey Marchenko).

----------------------------------------
Bug #21959: rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
https://bugs.ruby-lang.org/issues/21959

* Author: anmarchenko_datadog (Andrey Marchenko)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
Ruby's GVL Instrumentation API uses a read-write lock (rb_internal_thread_event_hooks_rw_lock) to protect the list of thread event hooks:

- Read lock: acquired on every GVL transition to iterate over the hooks and call their callbacks (rb_thread_execute_hooks)
- Write lock: acquired when adding or removing hooks (rb_internal_thread_add_event_hook, rb_internal_thread_remove_event_hook)

After fork(), Ruby reinitializes several internal locks (e.g. vm->ractor.sched.lock, timer_th.waiting_lock), but not rb_internal_thread_event_hooks_rw_lock. Reinitialization of this lock was not added when the GVL Instrumentation API was introduced.

The full reproducer is available here: https://github.com/anmarchenko/ruby-locks-fork-bug

## Deadlock sequence

1. The parent process has thread event hooks registered (e.g. by a profiler like dd-trace-rb)
2. Multiple threads run concurrently, causing GVL transitions; each transition acquires a read lock on the rwlock
3. fork() happens while a thread holds the read lock
4. In the child, only the forking thread survives: the thread that held the lock is gone, but the lock state is copied as-is
5. The child tries to add or remove a hook → needs the write lock → blocks forever on a lock that will never be released
6. Deadlock

## Impact

This affects any Ruby C extension that uses the GVL Instrumentation API in combination with fork-based servers (Resque, Unicorn, Passenger, etc.).

The original report comes from dd-trace-rb's profiler deadlocking Resque workers on Alpine Linux (musl libc): https://github.com/DataDog/dd-trace-rb/issues/4967

--
https://bugs.ruby-lang.org/
Issue #21959 has been updated by anmarchenko_datadog (Andrey Marchenko).

I am planning to open a PR with a proposed fix soon.

----------------------------------------
Bug #21959: rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
https://bugs.ruby-lang.org/issues/21959#change-116782

* Author: anmarchenko_datadog (Andrey Marchenko)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
Issue #21959 has been updated by anmarchenko_datadog (Andrey Marchenko).

The PR is open: https://github.com/ruby/ruby/pull/16474

----------------------------------------
Bug #21959: rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
https://bugs.ruby-lang.org/issues/21959#change-116783

* Author: anmarchenko_datadog (Andrey Marchenko)
* Status: Open
* ruby -v: 4.0.1
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
Issue #21959 has been updated by hsbt (Hiroshi SHIBATA).

Backport changed from 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: WONTFIX, 3.3: DONE, 3.4: REQUIRED, 4.0: REQUIRED

ruby_3_3 commit:8f2fe167b3e6714216258f1509246ac08b6bac7e merged revision(s) commit:c8155822c460a5734d700cd468d306ca03b44ce4.

----------------------------------------
Bug #21959: rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
https://bugs.ruby-lang.org/issues/21959#change-116819

* Author: anmarchenko_datadog (Andrey Marchenko)
* Status: Closed
* ruby -v: 4.0.1
* Backport: 3.2: WONTFIX, 3.3: DONE, 3.4: REQUIRED, 4.0: REQUIRED
----------------------------------------
Issue #21959 has been updated by k0kubun (Takashi Kokubun).

Backport changed from 3.2: WONTFIX, 3.3: DONE, 3.4: REQUIRED, 4.0: REQUIRED to 3.2: WONTFIX, 3.3: DONE, 3.4: REQUIRED, 4.0: DONE

ruby_4_0 commit:c38f8732c4ae6448cf05c795ddd5df4040ceeea8 merged revision(s) commit:c8155822c460a5734d700cd468d306ca03b44ce4.

----------------------------------------
Bug #21959: rb_internal_thread_event_hooks_rw_lock is not reinitialized after fork causing deadlocks
https://bugs.ruby-lang.org/issues/21959#change-116920

* Author: anmarchenko_datadog (Andrey Marchenko)
* Status: Closed
* ruby -v: 4.0.1
* Backport: 3.2: WONTFIX, 3.3: DONE, 3.4: REQUIRED, 4.0: DONE
----------------------------------------
participants (3)
- anmarchenko_datadog (Andrey Marchenko)
- hsbt (Hiroshi SHIBATA)
- k0kubun (Takashi Kokubun)