[ruby-core:122417] [Ruby Bug#21398] Ractor.select hangs when multiple threads submit heavy jobs concurrently

Issue #21398 has been reported by arino.tamada (有乃 玉田). ---------------------------------------- Bug #21398: Ractor.select hangs when multiple threads submit heavy jobs concurrently https://bugs.ruby-lang.org/issues/21398 * Author: arino.tamada (有乃 玉田) * Status: Open * ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- When multiple threads run heavy Ractor-based jobs at the same time, `Ractor.select(*workers)` can hang indefinitely without any crash, exception, or deadlock being reported. This issue does not occur when only one thread is running similar jobs — the hang happens only when four or more threads submit large jobs concurrently to a shared Ractor pipeline. ### Steps to Reproduce Run the script below in Ruby 3.4.2 or later. In some runs, one or more threads will hang forever at Ractor.select. The issue reproduces more easily on machines with multiple cores and sufficient memory. ### Expected Behavior All calls to Ractor.select(*workers) should return when any worker Ractor yields a result. All jobs should complete. ### Actual Behavior - In some runs (especially with multiple threads), `Ractor.select` hangs and never returns. - There is no crash or exception; the process remains alive but blocked. - All Ractors and threads are created and started correctly. - This behavior does not occur when only one thread performs the same job sequentially. - It appears that `Ractor.select` does not always behave reliably under concurrent multithreaded usage with heavy Ractor pipelines. Reproducible Example: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) pipe = Ractor.new do loop { Ractor.yield Ractor.receive } end workers = (1..4).map do Ractor.new(pipe) do |pipe| while job = pipe.take result = job.map { |x| x * 2 }.sum Ractor.yield result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each { |job| pipe.send(job) } results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(*workers) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` -- https://bugs.ruby-lang.org/

Issue #21398 has been updated by luke-gru (Luke Gruber). Thank you for the report. Is it possible for you to try to install ruby-head (3.5.0dev) and try this script with the recent changes to ractors (ractor ports)? I'm having trouble reproducing it on my end on ruby-head. Here's the script with modifications for working with ports: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) port = Ractor::Port.new pipe = Ractor.new do loop { job, worker = Ractor.receive worker.send job } end workers = (1..4).map do |i| Ractor.new(pipe, port) do |pipe, job_port| while job = Ractor.receive result = job.map { |x| x * 2 }.sum job_port.send result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each_with_index do |job, i| w_idx = i % 4 pipe.send([job, workers[w_idx]]) end results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(port) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` If you're unfamiliar with ractor ports, here's the ticket for the feature: https://bugs.ruby-lang.org/issues/21262 ---------------------------------------- Bug #21398: Ractor.select hangs when multiple threads submit heavy jobs concurrently https://bugs.ruby-lang.org/issues/21398#change-113648 * Author: arino.tamada (有乃 玉田) * Status: Open * ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- When multiple threads run heavy Ractor-based jobs at the same time, `Ractor.select(*workers)` can hang indefinitely without any crash, exception, or deadlock being reported. This issue does not occur when only one thread is running similar jobs — the hang happens only when four or more threads submit large jobs concurrently to a shared Ractor pipeline. ### Steps to Reproduce Run the script below in Ruby 3.4.2 or later. In some runs, one or more threads will hang forever at Ractor.select. The issue reproduces more easily on machines with multiple cores and sufficient memory. ### Expected Behavior All calls to Ractor.select(*workers) should return when any worker Ractor yields a result. All jobs should complete. ### Actual Behavior - In some runs (especially with multiple threads), `Ractor.select` hangs and never returns. - There is no crash or exception; the process remains alive but blocked. - All Ractors and threads are created and started correctly. - This behavior does not occur when only one thread performs the same job sequentially. - It appears that `Ractor.select` does not always behave reliably under concurrent multithreaded usage with heavy Ractor pipelines. Reproducible Example: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) pipe = Ractor.new do loop { Ractor.yield Ractor.receive } end workers = (1..4).map do Ractor.new(pipe) do |pipe| while job = pipe.take result = job.map { |x| x * 2 }.sum Ractor.yield result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each { |job| pipe.send(job) } results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(*workers) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` -- https://bugs.ruby-lang.org/

Issue #21398 has been updated by arino.tamada (有乃 玉田). Thank you for your response. I have tried running the script with Ruby `3.5.0-dev` (ruby-head) and using `Ractor::Port` as in your example. I can confirm that the issue does not reproduce in my environment with these recent changes. The script completes successfully and does not hang at `Ractor.select`. Thank you for your support and the improvements to Ractor! ---------------------------------------- Bug #21398: Ractor.select hangs when multiple threads submit heavy jobs concurrently https://bugs.ruby-lang.org/issues/21398#change-113664 * Author: arino.tamada (有乃 玉田) * Status: Open * ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- When multiple threads run heavy Ractor-based jobs at the same time, `Ractor.select(*workers)` can hang indefinitely without any crash, exception, or deadlock being reported. This issue does not occur when only one thread is running similar jobs — the hang happens only when four or more threads submit large jobs concurrently to a shared Ractor pipeline. ### Steps to Reproduce Run the script below in Ruby 3.4.2 or later. In some runs, one or more threads will hang forever at Ractor.select. The issue reproduces more easily on machines with multiple cores and sufficient memory. ### Expected Behavior All calls to Ractor.select(*workers) should return when any worker Ractor yields a result. All jobs should complete. ### Actual Behavior - In some runs (especially with multiple threads), `Ractor.select` hangs and never returns. - There is no crash or exception; the process remains alive but blocked. - All Ractors and threads are created and started correctly. - This behavior does not occur when only one thread performs the same job sequentially. - It appears that `Ractor.select` does not always behave reliably under concurrent multithreaded usage with heavy Ractor pipelines. Reproducible Example: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) pipe = Ractor.new do loop { Ractor.yield Ractor.receive } end workers = (1..4).map do Ractor.new(pipe) do |pipe| while job = pipe.take result = job.map { |x| x * 2 }.sum Ractor.yield result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each { |job| pipe.send(job) } results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(*workers) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` -- https://bugs.ruby-lang.org/

Issue #21398 has been updated by arino.tamada (有乃 玉田). I'm experiencing a hang when combining `Ractor::Port` and multiple `Thread`s in Ruby 3.5.0dev (ruby-head). The script below is a modified version of the one shared in comment #1 of this issue, with adjusted parameters. When I increase the number of threads to 10 (`THREADS = 10`), the program often hangs after printing `"Waiting for results…"`. CPU usage drops to almost zero, and memory usage remains high (about 10GB), suggesting that the program is stuck rather than still computing. ```ruby THREADS = 10 JOBS_PER_THREAD = 100 ARRAY_SIZE = 1_000_000 def ractor_job(job_count, array_size) port = Ractor::Port.new workers = (1..4).map do |i| Ractor.new(port) do |job_port| while job = Ractor.receive result = job.map { |x| x * 2 }.sum job_port.send result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each_with_index do |job, i| w_idx = i % 4 workers[w_idx].send(job) end results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(port) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` Observations: - The program prints "Waiting for results..." but no "Received result:" lines appear after a certain point. - CPU usage is low after the hang begins, but memory usage remains high (~10GB). - Thread#join blocks indefinitely; the script never reaches "All threads finished." - Using ruby-head, the hang occurred 6 out of 10 times under the same conditions. ### ruby -v ruby 3.5.0dev (2025-06-17T22:51:16Z master 3cfd71e7e4) +PRISM [arm64-darwin23] ---------------------------------------- Bug #21398: Ractor.select hangs when multiple threads submit heavy jobs concurrently https://bugs.ruby-lang.org/issues/21398#change-113813 * Author: arino.tamada (有乃 玉田) * Status: Open * ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- When multiple threads run heavy Ractor-based jobs at the same time, `Ractor.select(*workers)` can hang indefinitely without any crash, exception, or deadlock being reported. This issue does not occur when only one thread is running similar jobs — the hang happens only when four or more threads submit large jobs concurrently to a shared Ractor pipeline. ### Steps to Reproduce Run the script below in Ruby 3.4.2 or later. In some runs, one or more threads will hang forever at Ractor.select. The issue reproduces more easily on machines with multiple cores and sufficient memory. ### Expected Behavior All calls to Ractor.select(*workers) should return when any worker Ractor yields a result. All jobs should complete. ### Actual Behavior - In some runs (especially with multiple threads), `Ractor.select` hangs and never returns. - There is no crash or exception; the process remains alive but blocked. - All Ractors and threads are created and started correctly. - This behavior does not occur when only one thread performs the same job sequentially. - It appears that `Ractor.select` does not always behave reliably under concurrent multithreaded usage with heavy Ractor pipelines. Reproducible Example: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) pipe = Ractor.new do loop { Ractor.yield Ractor.receive } end workers = (1..4).map do Ractor.new(pipe) do |pipe| while job = pipe.take result = job.map { |x| x * 2 }.sum Ractor.yield result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each { |job| pipe.send(job) } results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(*workers) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` -- https://bugs.ruby-lang.org/

Issue #21398 has been updated by luke-gru (Luke Gruber). Assignee set to ractor Target version set to 3.5 I sent a [PR](https://github.com/ruby/ruby/pull/13682) for this but I don't know if this is the proper fix or if we should use a different mutex to wait for condvar for DNTs. I did, however, figure out why it's happening. @ko1 what are your thoughts? ---------------------------------------- Bug #21398: Ractor.select hangs when multiple threads submit heavy jobs concurrently https://bugs.ruby-lang.org/issues/21398#change-113819 * Author: arino.tamada (有乃 玉田) * Status: Open * Assignee: ractor * Target version: 3.5 * ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin23] * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN ---------------------------------------- When multiple threads run heavy Ractor-based jobs at the same time, `Ractor.select(*workers)` can hang indefinitely without any crash, exception, or deadlock being reported. This issue does not occur when only one thread is running similar jobs — the hang happens only when four or more threads submit large jobs concurrently to a shared Ractor pipeline. ### Steps to Reproduce Run the script below in Ruby 3.4.2 or later. In some runs, one or more threads will hang forever at Ractor.select. The issue reproduces more easily on machines with multiple cores and sufficient memory. ### Expected Behavior All calls to Ractor.select(*workers) should return when any worker Ractor yields a result. All jobs should complete. ### Actual Behavior - In some runs (especially with multiple threads), `Ractor.select` hangs and never returns. - There is no crash or exception; the process remains alive but blocked. - All Ractors and threads are created and started correctly. - This behavior does not occur when only one thread performs the same job sequentially. - It appears that `Ractor.select` does not always behave reliably under concurrent multithreaded usage with heavy Ractor pipelines. Reproducible Example: ```ruby THREADS = 4 JOBS_PER_THREAD = 10 ARRAY_SIZE = 10_000 def ractor_job(job_count, array_size) pipe = Ractor.new do loop { Ractor.yield Ractor.receive } end workers = (1..4).map do Ractor.new(pipe) do |pipe| while job = pipe.take result = job.map { |x| x * 2 }.sum Ractor.yield result end end end jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } } jobs.each { |job| pipe.send(job) } results = [] jobs.size.times do puts "Waiting for results..." _ractor, result = Ractor.select(*workers) puts "Received result: #{result}" results << result end results end threads = [] THREADS.times do threads << Thread.new do ractor_job(JOBS_PER_THREAD, ARRAY_SIZE) end end threads.each(&:join) puts "All threads finished." ``` -- https://bugs.ruby-lang.org/
participants (2)
-
arino.tamada
-
luke-gru (Luke Gruber)