[ruby-core:119683] [Ruby master Feature#20861] Add an environment variable for tuning the default thread quantum
 
            Issue #20861 has been reported by tenderlovemaking (Aaron Patterson). ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by tenderlovemaking (Aaron Patterson). I think I did my math a little wrong. It should be 50ms rather than 500ms, but the measurements are correct. Specifying the 10ms quantum reduces waste by ~3.5 seconds. 😅 ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110343 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ioquatix (Samuel Williams). This can be useful, so I agree with adding it. For the sake of providing feedback, some thoughts: 1. Does the default value of 100ms make sense? Should we update the default too? 2. Should we mark this as experimental in the first pass, e.g. emit a warning if it is set? Maybe we can experiment with it in 3.4 and commit in 3.5 if it looks good? (I'm also okay with your proposal as is - just food for thought). 3. Is there a way we could automatically tune this number according to the workload? For example, could we measure the unfairness of the scheduler and adjust accordingly? Regarding (3), I think it’s fantastic to highlight this issue and provide a way to address it. However, IMHO, most users may not be familiar with tuning this setting effectively. A fixed value might perform well in some scenarios but may not hold up in others. Ideally, Ruby could adaptively determine the optimal value for the best performance, sparing users the need to tweak it themselves. In other words, if it were possible to determine this value automatically, this proposal might be less beneficial—or even detrimental if a fixed value were worse than an automatic adjustment system. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110346 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by nobu (Nobuyoshi Nakada). Your patch misses pthread_win32.c, and if we add a new environment variable, man/ruby.1 must be updated as well. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110347 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by byroot (Jean Boussier). This was discussed a few times at Kaigi, and IMO a quantum value on a per thread basis would make more sense. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110348 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by tenderlovemaking (Aaron Patterson). ioquatix (Samuel Williams) wrote in #note-2:
This can be useful, so I agree with adding it.
For the sake of providing feedback, some thoughts:
1. Does the default value of 100ms make sense? Should we update the default too? 2. Should we mark this as experimental in the first pass, e.g. emit a warning if it is set? Maybe we can experiment with it in 3.4 and commit in 3.5 if it looks good? (I'm also okay with your proposal as is - just food for thought). 3. Is there a way we could automatically tune this number according to the workload? For example, could we measure the unfairness of the scheduler and adjust accordingly?
Regarding (3), I think it’s fantastic to highlight this issue and provide a way to address it. However, IMHO, most users may not be familiar with tuning this setting effectively. A fixed value might perform well in some scenarios but may not hold up in others. Ideally, Ruby could adaptively determine the optimal value for the best performance, sparing users the need to tweak it themselves. In other words, if it were possible to determine this value automatically, this proposal might be less beneficial—or even detrimental if a fixed value were worse than an automatic adjustment system.
Thank you for the feedback, but I think these points should be addressed as a different feature. I do think the default quantum should be lowered, but it's hard to experiment with other values while this one is hard coded. If people are allowed to experiment with other values, I think we can make a more informed decision about a "good" default. nobu (Nobuyoshi Nakada) wrote in #note-3:
Your patch misses pthread_win32.c, and if we add a new environment variable, man/ruby.1 must be updated as well.
Thanks. I _think_ I've fixed it. byroot (Jean Boussier) wrote in #note-4:
This was discussed a few times at Kaigi, and IMO a quantum value on a per thread basis would make more sense.
Yes, I think it's good to control the quantum on a per thread basis. We can already do that with `Thread#priority` (as you know). Are you thinking something different (like specify quantum in time rather than priority?) Most threaded apps _don't_ set thread priority. Puma, Sidekiq, and even the thread pools in concurrent-ruby don't offer a setting to change the quantum/priority. I think that they _should_ offer this setting, but priority is relative to the default quantum, and there is no way to experiment with the default quantum. I think being able to adjust the default quantum via env var is a good feature because people can experiment _without_ changing any application code, and would be a good way for us to find a better default. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110349 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by byroot (Jean Boussier).
Are you thinking something different (like specify quantum in time rather than priority?)
Yes, quite literally: `Thread.current.quantum = 20` or something like that. Which IMO is much easier to reason about than priorities. And generally you have threads that are meant as "main" threads and some that are meant as "background" and you'd want them to have different quantums, hence why I'd rather skip the environment variable and go straight to an accessor. But perhaps a `Thread.default_quantum = XX` would be needed too. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110350 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by tenderlovemaking (Aaron Patterson). byroot (Jean Boussier) wrote in #note-6:
Are you thinking something different (like specify quantum in time rather than priority?)
Yes, quite literally: `Thread.current.quantum = 20` or something like that.
Which IMO is much easier to reason about than priorities. And generally you have threads that are meant as "main" threads and some that are meant as "background" and you'd want them to have different quantums, hence why I'd rather skip the environment variable and go straight to an accessor.
Makes sense, and I agree.
But perhaps a `Thread.default_quantum = XX` would be needed too.
I think a `Thread.default_quantum=` would be very useful in the same way I mentioned the environment variable being useful. Specifically for apps where you can't specify the quantum, like with Puma / Sidekiq etc. I will make a patches and tickets for these. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110351 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by jhawthorn (John Hawthorn). I don't think we should expose the quantum per-thread inside Ruby. I worry it will prevent future improvements. The fact that Thread scheduling and priorities are currently done by giving a shorter/longer quantum is an implementation detail that could (and I'd like to) change. Thread `priority` is good *because* it is more abstract, which gives flexibility on implementation. Similarly I don't love `Thread.default_quantum=` as it's implementation specific - I wouldn't expect it to make sense for JRuby/TruffleRuby for example, and it's possible CRuby could have a different implementation in the future. Though I don't feel as strongly. I think an environment variable is best as it's very similar to how we allow tuning the garbage collector via environment variables. In the future if end up being able to remove this, that's much easier/safer with an environment variable. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110357 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by byroot (Jean Boussier).
Thread priority is good because it is more abstract, which gives flexibility on implementation.
Then we should document what it does, even if stated that the behavior may change between version. Because in its current state it's pretty much impossible to use it. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110358 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ivoanjo (Ivo Anjo). I think this is really nice first step and worth having to enable experimentation as well. Having said that, I'm not sure there's ever going to be a good value single (even per-thread). It may be worth considering on the medium/long-term the introduction of an actual scheduling algorithm, even if a very simple one. Such an algorithm could make sure to respect priorities, take into account sleep/wake up timelines, and introduce (some amount of) fairness. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110388 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ko1 (Koichi Sasada). This example doesn't make sense for the real app because nobody repeat sleeping for the constant. Do you have any example similar to the example? But the current 100ms doesn't strong opinion (it equals to Linux time quantum), so I'm okay to make it configurable (I didn't check the patch though). ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110389 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ivoanjo (Ivo Anjo). https://github.com/ivoanjo/gvl-tracing/blob/master/examples/rubykaigi2023/rk... (from [here](https://ivoanjo.me/rubykaigi2023/#:~:text=Observing%20the%20GVL%20%236)) is probably a good example without `sleep` -- the thread doing I/O keeps being penalized because it uses very little time, but needs to wait for the full 100ms period. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110391 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by tenderlovemaking (Aaron Patterson). ko1 (Koichi Sasada) wrote in #note-11:
This example doesn't make sense for the real app because nobody repeat sleeping for the constant. Do you have any example similar to the example?
But the current 100ms is not based on strong opinion (it equals to Linux time quantum), so I'm okay to make it configurable (I didn't check the patch though).
As @ivoanjo posted, it's usually some kind of IO mixed with CPU. For example using `Net::HTTP` and the response is slow, or even making database queries. The `sleep` example is just a simple way to demonstrate the impact of the quantum on IO calls. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110654 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ivoanjo (Ivo Anjo). Out of curiosity, Python has a way of setting the GIL switching interval too: https://docs.python.org/3.6/library/sys.html#sys.setswitchinterval . Default interval seems to be 5ms (https://github.com/python/cpython/blob/47cbf038850852cdcbe7a404ed7c64542340d...). ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110656 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by tenderlovemaking (Aaron Patterson). At RubyConf we discussed this a bit and it seemed like the feature is fine, but maybe the name isn't good enough. How about `RUBY_THREAD_SWITCH_INTERVAL`? I like having "MS" in the name so that the units are obvious, but I really don't have a strong opinion. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110724 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by jhawthorn (John Hawthorn). Another option for naming is to use "time slice": `RUBY_THREAD_TIMESLICE`. I'm okay with any of the proposed names For other prior art Linux's CFS scheduler exposes `/sys/kernel/debug/sched/base_slice_ns` ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110726 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by Dan0042 (Daniel DeLorme). byroot (Jean Boussier) wrote in #note-6:
Yes, quite literally: `Thread.current.quantum = 20` or something like that.
Which IMO is much easier to reason about than priorities. And generally you have threads that are meant as "main" threads and some that are meant as "background" and you'd want them to have different quantums, hence why I'd rather skip the environment variable and go straight to an accessor.
But perhaps a `Thread.default_quantum = XX` would be needed too.
What happened to this idea? According to the latest comments it seems we're back to the idea of an environment variable, but that means we - can't set the quantum after process start (let's say some performance monitoring periodically adjusts the value) - can't set the quantum per thread (but that seem to be fine according to #note-8) - can't set the quantum to a different value in a forked process As long as those limitations are not a problem, then an environment variable is fine. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110744 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ko1 (Koichi Sasada). I like `RUBY_THREAD_TIME_QUANTUM` for environment variable. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110900 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by matz (Yukihiro Matsumoto). I vote for the environment variable `RUBY_THREAD_TIMESLICE`. The term **quantum** is unfamiliar to me. Matz. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-110960 * Author: tenderlovemaking (Aaron Patterson) * Status: Open ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by luke-gru (Luke Gruber). I was taking a look at ractor scheduling recently to see how it all works, and I noticed that `thread->time_running_us` is never set back to `0` when a thread switches. It seems to me that after the initial switch after 100ms, it switches every time the interrupt is run (every 10ms currently). ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-111256 * Author: tenderlovemaking (Aaron Patterson) * Status: Closed ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by luke-gru (Luke Gruber). Should the timer thread timeout be based off the thread quantum? My understanding is that currently it's every 10ms, and the threads check if they've exceeded the quantum 10 times before they yield. The check should probably be done by the timer thread, no? ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-111257 * Author: tenderlovemaking (Aaron Patterson) * Status: Closed ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by nateberkopec (Nate Berkopec). Just to report back about this feature. I've tried this now with about 5 different apps of various sizes, ranging from 10 requests per second up to 1000 requests per second. Both on web (Puma) and background job (Sidekiq) workloads. Changing the value from the default to 10 had no measurable effect on latency distribution, even at p99 or p99.9. I had high hopes for this feature but it appears like this just doesn't happen all that often in the real world. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-113495 * Author: tenderlovemaking (Aaron Patterson) * Status: Closed ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
 
            Issue #20861 has been updated by ivoanjo (Ivo Anjo). nateberkopec (Nate Berkopec) wrote in #note-22:
Just to report back about this feature. [...] I had high hopes for this feature but it appears like this just doesn't happen all that often in the real world.
That's somewhat disappointing! I was also hoping this may be a "quick" fix for improving multi-thread responsiveness, until perhaps some kind of better scheduling was introduced. [For what it's worth, I've tried with my I/O + background thread example from RubyConf](https://ivoanjo.me/rubyconf2024/#:~:text=Observing%20the%20GVL%20%236) and in that microbenchmark it does improve responsiveness. Out of curiosity, for those apps you tried, do you remember was the p99 and p99.9 of requests on those apps? Curious if they were in the tens, hundreds, or thousands of ms range. ---------------------------------------- Feature #20861: Add an environment variable for tuning the default thread quantum https://bugs.ruby-lang.org/issues/20861#change-113682 * Author: tenderlovemaking (Aaron Patterson) * Status: Closed ---------------------------------------- The default thread quantum is currently [hard coded at 100ms](https://github.com/ruby/ruby/blob/c7708d22c33040a74ea7ac683bf7407d3759edfe/t...). This can impact multithreaded systems that are trying to process Ruby level CPU bound work at the same time as IO work. I would like to add an environment variable `RUBY_THREAD_DEFAULT_QUANTUM_MS` that allows users to specify the default thread quantum (in milliseconds) via an environment variable. It defaults to our current default of 100ms. I've submitted the patch [here](https://github.com/ruby/ruby/pull/11981). Here is a Ruby program to demonstrate the problem: ```ruby def measure x = Process.clock_gettime(Process::CLOCK_MONOTONIC) yield Process.clock_gettime(Process::CLOCK_MONOTONIC) - x end def fib(n) if n < 2 n else fib(n-2) + fib(n-1) end end # find fib that takes ~500ms fib_i = 50.times.find { |i| measure { fib(i) } >= 0.05 } sleep_i = measure { fib(fib_i) } threads = [ Thread.new { 100.times { sleep(sleep_i) # sometimes stalled waiting for fib's quantum to finish } puts "done 1" }, Thread.new { 100.times { fib(fib_i) }; puts "done 2" }, ] # We expect the total time to be about 100 * sleep_i (~5 seconds) because # theoretically the sleep thread could be done nearly completely in parallel to # the fib thread. # # But because the `sleep` thread is iterating over the sleep call, it must wait # for the `fib` thread to complete its quantum, before it can start the next iteration. # # This means each sleep iteration could take up to `sleep_i + 100ms` # # We're calling that stalled time "waste" total = measure { threads.each(&:join) } waste = total - (sleep_i * 100) p TOTAL: total, WASTE: waste ``` The program has two threads. One thread is using CPU time by computing `fib` in a loop. The other thread is simulating IO time by calling `sleep` in a loop. When the `sleep` call completes, it can stall, waiting for the quantum in the fib thread to expire. That means that each iteration on sleep can actually take `sleep time + thread quantum`, or in this case ~600ms when we expected it to only take ~500ms. Ideally, the above program would take `500ms * 100` since all `sleep` calls should be able to execute in parallel with the `fib` calls. Of course this isn't true because the sleep thread must acquire the GVL before it can continue the next iteration, so there will always be _some_ overhead. This feature is for allowing people to tune that overhead. If we run this program with the default quantum the output looks like this: ``` $ ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T14:49:50Z quantum-computing c7708d22c3) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 12.672821999993175, WASTE: 4.960721996147186} ``` The output shows that our program spent about 5 seconds stalled, waiting to acquire the GVL. With this patch we can lower the default quantum, and the output is like this: ``` $ RUBY_THREAD_DEFAULT_QUANTUM_MS=10 ./miniruby -v fibtest.rb ruby 3.4.0dev (2024-11-01T22:06:35Z quantum-computing 087500643d) +PRISM [arm64-darwin24] done 2 done 1 {TOTAL: 8.898526000091806, WASTE: 1.4168260043952614} ``` Specifying the ENV to change the quantum to 10ms lowered our waste in the program to ~1.4 seconds. It's common for web applications to do mixed CPU and IO bound tasks in threads (see the Puma webserver), so it would be great if there was a way to customize the thread quantum depending on your application's workload. -- https://bugs.ruby-lang.org/
participants (11)
- 
                 byroot (Jean Boussier) byroot (Jean Boussier)
- 
                 Dan0042 (Daniel DeLorme) Dan0042 (Daniel DeLorme)
- 
                 ioquatix (Samuel Williams) ioquatix (Samuel Williams)
- 
                 ivoanjo (Ivo Anjo) ivoanjo (Ivo Anjo)
- 
                 jhawthorn (John Hawthorn) jhawthorn (John Hawthorn)
- 
                 ko1 (Koichi Sasada) ko1 (Koichi Sasada)
- 
                 luke-gru (Luke Gruber) luke-gru (Luke Gruber)
- 
                 matz (Yukihiro Matsumoto) matz (Yukihiro Matsumoto)
- 
                 nateberkopec (Nate Berkopec) nateberkopec (Nate Berkopec)
- 
                 nobu (Nobuyoshi Nakada) nobu (Nobuyoshi Nakada)
- 
                 tenderlovemaking (Aaron Patterson) tenderlovemaking (Aaron Patterson)