- ruby-core - ml.ruby-lang.org

[ruby-core:114528] [Ruby master Feature#17678] Ractors do not restart after fork
by ivoanjo (Ivo Anjo) 25 Aug '23

Issue #17678 has been updated by ivoanjo (Ivo Anjo).

> The addition of Ractor#alive? and/or Ractor#status makes sense to me. Even in non-forked processes such methods could be useful. Note that you can get what you want already, by calling Ractor#inspect, so these methods would only need to expose information that Ractor is already storing.

Thanks for looking into this! I don't think the info is there in #inspect... At least I don't get it on stable or latest ruby-head? 😅

Here's an updated example:

```ruby
puts RUBY_DESCRIPTION

r2 = Ractor.new { puts "[#{Process.pid}] Ractor started!"; sleep(1000) }

puts "[#{Process.pid}] In parent process, ractor status is #{r2.inspect}"
sleep(1)
puts "[#{Process.pid}] Forking..."

fork do
  puts "[#{Process.pid}] In child process, ractor status is #{r2.inspect}"
end

Process.wait
```

and here's what I get:

```
$ ruby ractor-test.rb
ruby 3.3.0dev (2023-08-24T12:12:51Z master 5ec1fc52c1) [x86_64-linux]
ractor-test.rb:3: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
[10] In parent process, ractor status is #<Ractor:#2 ractor-test.rb:3 blocking>
[10] Ractor started!
[10] Forking...
[12] In child process, ractor status is #<Ractor:#2 ractor-test.rb:3 blocking>
```

----------------------------------------
Feature #17678: Ractors do not restart after fork
https://bugs.ruby-lang.org/issues/17678#change-104327

* Author: ivoanjo (Ivo Anjo)
* Status: Assigned
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
Hello there! I'm working at Datadog on the `ddtrace` gem -- <https://github.com/DataDog/dd-trace-rb> and we're experimenting with using Ractors in our library but have run into a few issues.

### Background

When running a Ractor as a background process, the Ractor stops & does not restart when the application forks.

### How to reproduce (Ruby version & script)

`ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]`

```ruby
r2 = Ractor.new do
  loop { puts "[#{Process.pid}] Ractor"; sleep(1) }
end

sleep(1)
puts "[#{Process.pid}] Forking..."
fork do
  sleep(5)
  puts "[#{Process.pid}] End fork."
end

loop do
  sleep(1)
end
```

### Expectation and result

The application prints “Ractor” each second in the main process, but not in the fork.

Expected the Ractor (defined as `r2`) to run in the fork.

```
[29] Ractor
[29] Ractor
[29] Forking...
[29] Ractor
[29] Ractor
[29] Ractor
[29] Ractor
[29] Ractor
[32] End fork.
[29] Ractor
[29] Ractor
[29] Ractor
```

### Additional notes

Threads do not restart across forks either, so it might not be unreasonable to expect consistent behavior. However, it’s possible to detect a dead Thread and recreate it after a fork (e.g. with `#alive?`, `#status`), but there’s no such mechanism for Ractors.

### Suggested solutions

1. Auto-restart Ractors after fork
2. Add additional methods to Ractors that allow users to check & manage the status of the Ractor, similar to Thread.

--
https://bugs.ruby-lang.org/
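The additional notes in this issue mention that a dead Thread can be detected and recreated after a fork using `#alive?`. A minimal sketch of that pattern is shown here for comparison. `BackgroundWorker` and `ensure_running` are hypothetical names chosen for illustration; they are not part of Ruby or of `ddtrace`, and the sketch ignores thread-safety concerns a real library would have to handle.

```ruby
# Hypothetical helper (illustration only): keeps a background thread and
# recreates it when it is found dead, for example inside a forked child.
class BackgroundWorker
  def initialize(&task)
    @task = task
    @thread = nil
  end

  # Starts the thread if it was never started or is no longer alive.
  # Returns the thread that is running the task.
  def ensure_running
    @thread = Thread.new(&@task) unless @thread&.alive?
    @thread
  end

  # True only when a thread exists and is still alive.
  def alive?
    !!@thread&.alive?
  end
end
```

Under these assumptions, a forking application could call `ensure_running` at the start of each `fork` block, since only the forking thread survives in the child. Ractor offers no comparable check, which is the gap described in the second suggested solution.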

[ruby-core:114525] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

Issue #17263 has been updated by ioquatix (Samuel Williams).

Status changed from Open to Closed

My current conclusion is this:

Based on the `perf` `cpu-cycles:k`, we see a proportional increase in overhead related to the number of fibers, despite ultimately having the same total number of context switches.

This is unfortunate, but not exactly unexpected as we are stressing virtual memory.

Here are the results of my testing:

```
| fibers  | elapsed time (s) | rate (t/s)  |
| ------- | ---------------- | ----------- |
| 1       | 0.91             | 10998609.17 |
| 2       | 0.82             | 12239077.16 |
| 4       | 0.77             | 12930013.16 |
| 8       | 0.79             | 12678091.91 |
| 16      | 0.79             | 12578625.99 |
| 32      | 0.79             | 12598729.93 |
| 64      | 0.79             | 12597254.54 |
| 128     | 0.79             | 12643086.20 |
| 256     | 0.83             | 12116891.53 |
| 512     | 0.94             | 10654248.57 |
| 1024    | 1.01             | 9865286.58  |
| 2048    | 1.04             | 9644781.53  |
| 4096    | 1.06             | 9455585.41  |
| 8192    | 1.10             | 9070485.29  |
| 16384   | 1.98             | 5054997.19  |
| 32768   | 3.14             | 3189286.37  |
| 65536   | 3.39             | 2949265.02  |
| 131072  | 3.39             | 2951698.03  |
| 262144  | 3.44             | 2910388.50  |
| 524288  | 3.43             | 2915666.38  |
| 1048576 | 3.43             | 2917077.46  |
```

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104323

* Author: ciconia (Sharon Rosner)
* Status: Closed
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing highly-concurrent Ruby programs with fibers. In the course of my work I have come up against two problems using Ruby fibers:

1. Fiber context switching performance seems to degrade as the number of fibers is increased. This is both with `Fiber#transfer` and `Fiber#resume/Fiber.yield`.
2. The number of concurrent fibers that can exist at any time seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated. With 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1).

Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions of Ruby, going from ~3.4M context switches per second for 100 fibers to less than 1M context switches per second for 10000 fibers. Running with 100000 fibers fails to complete.

Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, theoretically at least switching between fibers should have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant of the implementation details of Ruby fibers, so at least for now I don't have any idea where this problem is coming from.

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)
clipboard-202308251538-kmofk.png (13.8 KB)

--
https://bugs.ruby-lang.org/
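The measurement approach in the `Fiber#resume/Fiber.yield` program can be condensed into a reusable helper. This is a sketch under stated assumptions rather than the script used for the table: `measure_switch_rate` is a hypothetical name, it uses the monotonic clock via `Process.clock_gettime` instead of `Time.now`, it does not disable GC, and it stops exactly at the requested number of switches.

```ruby
# frozen_string_literal: true

# Hypothetical helper (illustration only): creates `num_fibers` fibers that
# yield forever, resumes them round-robin until `total_switches` resumes have
# happened, and reports the observed switch rate.
def measure_switch_rate(num_fibers, total_switches: 1_000_000)
  fibers = Array.new(num_fibers) { Fiber.new { loop { Fiber.yield } } }

  count = 0
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  while count < total_switches
    fibers.each do |f|
      f.resume
      count += 1
      break if count >= total_switches
    end
  end

  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  { fibers: num_fibers, count: count, elapsed: elapsed, rate: count / elapsed }
end
```

Calling `measure_switch_rate(1024)` and `measure_switch_rate(16_384)` in separate processes would be one way to compare points on either side of the drop seen in the table; absolute numbers will differ by machine and Ruby version.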

[ruby-core:114524] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

Issue #17263 has been updated by ioquatix (Samuel Williams).

File clipboard-202308251538-kmofk.png added

1 million fibers is causing ~600GiB of virtual address space to be consumed. That seems like quite a lot:

![](clipboard-202308251538-kmofk.png)

I'm not surprised if this is causing the OS to thrash/have issues.

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104320

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)
clipboard-202308251538-kmofk.png (13.8 KB)

--
https://bugs.ruby-lang.org/
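As a rough sanity check on the ~600GiB figure in this message, dividing it by one million fibers gives the virtual address space reserved per fiber. Both inputs are the approximate numbers quoted in the comment, so the result is only an estimate.

```ruby
GIB = 1024**3
KIB = 1024

total_virtual_bytes = 600 * GIB # approximate figure quoted in the comment
fiber_count = 1_000_000

# Average virtual address space per fiber, in KiB.
per_fiber_kib = total_virtual_bytes.to_f / fiber_count / KIB

puts format("~%.0f KiB of virtual address space per fiber", per_fiber_kib)
# prints "~629 KiB of virtual address space per fiber"
```

That is roughly 0.6 MiB of address space per fiber. It is reserved address space rather than resident memory, which is consistent with the issue description reporting only about 150MB RSS for 10000 fibers.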

[ruby-core:114523] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

Issue #17263 has been updated by ioquatix (Samuel Williams).

File clipboard-202308251514-grqb1.png added
File clipboard-202308251514-r7g4l.png added

I ran some profiles to try and identify why it was so slow.

What I found was `vm_push_frame` becomes slow.

![](clipboard-202308251514-grqb1.png)

vs

![](clipboard-202308251514-r7g4l.png)

It's quite a big difference.

I believe this is because of the virtual memory mapping. I wonder if we could experimentally disable the guard pages to confirm this.

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104319

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)

--
https://bugs.ruby-lang.org/

[ruby-core:114522] [Ruby master Feature#19057] Hide implementation of `rb_io_t`.
by ioquatix (Samuel Williams) 25 Aug '23

Issue #19057 has been updated by ioquatix (Samuel Williams).

@naruse Here is the compatibility fix that will allow `kgio`, `unicorn` and so on to compile with no changes: https://github.com/ruby/ruby/pull/8286

Please feel free to merge that before doing the preview 2 release, if @normalperson has not released updated gems.

However, bear in mind this delays the inevitable - I'd still like to propose we remove this completely in 3.4 - 3.3 has the deprecations, and 3.4 removes it. Does that sound okay to you?

----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-104318

* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 3.3
----------------------------------------
In order to make improvements to the IO implementation like <https://bugs.ruby-lang.org/issues/18455>, we need to add new fields to `struct rb_io_t`.

By the way, ending types in `_t` is not recommended by POSIX, so I'm also trying to rename the internal implementation to drop `_t` where possible during this conversion.

Anyway, we should try to hide the implementation of `struct rb_io`. Ideally, we don't expose any of it, but the problem is backwards compatibility.

So, in order to retain backwards compatibility, we should expose some fields of `struct rb_io`. The most commonly used ones are `fd` and `mode`, but several others are commonly used.

There are many fields which should not be exposed because they are implementation details.

## Current proposal

The current proposed change <https://github.com/ruby/ruby/pull/6511> creates two structs:

```c
// include/ruby/io.h
#ifndef RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
};
#else
struct rb_io;
#endif

// internal/io.h
#define RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
  // ... private fields ...
};
```

However, we are not 100% confident this is safe according to the C specification. My experience is not sufficiently wide to say this is safe in practice, but it does look okay to both myself, and @Eregon + @tenderlovemaking have both given some kind of approval.

That being said, maybe it's not safe.

There are two alternatives:

## Hide all details

We can make public `struct rb_io` completely invisible.

```c
// include/ruby/io.h
#define RB_IO_HIDDEN
struct rb_io;
int rb_ioptr_descriptor(struct rb_io *ioptr); // accessor for previously visible state.

// internal/io.h
struct rb_io {
  // ... all fields ...
};
```

This would only be forwards compatible, and code would need to feature detect like this:

```c
#ifdef RB_IO_HIDDEN
// Struct is opaque: go through the accessor.
#define RB_IOPTR_DESCRIPTOR rb_ioptr_descriptor
#else
// Older headers have no accessor: read the (still visible) field directly.
#define RB_IOPTR_DESCRIPTOR(ioptr) ((ioptr)->fd)
#endif
```

## Nested public interface

Alternatively, we can nest the public fields into the private struct:

```c
// include/ruby/io.h
struct rb_io_public {
  int fd;
  // ... public fields ...
};

// internal/io.h
#define RB_IO_T
struct rb_io {
  struct rb_io_public public;
  // ... private fields ...
};
```

## Considerations

I personally think the "Hide all details" implementation is the best, but it's also the least compatible. This is also what we are ultimately aiming for; whether we decide to take an intermediate "compatibility step" is up to us.

I think "Nested public interface" is messy and introduces more complexity, but it might be slightly better defined than the "Current proposal" which might create undefined behaviour. That being said, all the tests are passing.

--
https://bugs.ruby-lang.org/

[ruby-core:114521] [Ruby master Feature#19057] Hide implementation of `rb_io_t`.
by naruse (Yui NARUSE) 25 Aug '23

Issue #19057 has been updated by naruse (Yui NARUSE).

I'll release Ruby 3.3.0 preview 2 soon. I'm concerned that those three projects don't support the preview yet.

A preview release is to allow people to test their applications, but without support of those foundational projects it's hard to achieve the goal.

If those projects are not fixed before the preview release, I'll revert changes related to Feature #19057 for preview 2.

----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-104317

* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 3.3

--
https://bugs.ruby-lang.org/

[ruby-core:114520] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

Issue #17263 has been updated by ioquatix (Samuel Williams).

It looks like roughly 3 page faults per fiber.

If I run `x` fibers, I get `3x` page faults. It's proportional to the number of fibers, but I'm not sure how expensive this is. The CPU time is also costly, for `x` fibers, I get `50000x` kernel side CPU cycles. So for sure there is some overhead there. Hard to separate that from setup vs runtime though.

```
> sudo perf stat -e page-faults,cpu-cycles:u,cpu-cycles:k (which ruby) fiber.rb 100000
fibers: 100000 count: 1000000 rate: 2993069.28

 Performance counter stats for '/home/samuel/.rubies/ruby-3.2.1/bin/ruby fiber.rb 100000':

           323,536      page-faults
     2,249,302,328      cpu-cycles:u
     4,642,691,199      cpu-cycles:k

       1.302578116 seconds time elapsed

       0.439950000 seconds user
       0.838409000 seconds sys

> sudo perf stat -e page-faults,cpu-cycles:u,cpu-cycles:k (which ruby) fiber.rb 1000000
fibers: 1000000 count: 1000000 rate: 3029874.53

 Performance counter stats for '/home/samuel/.rubies/ruby-3.2.1/bin/ruby fiber.rb 1000000':

         3,210,454      page-faults
     5,704,584,641      cpu-cycles:u
    49,187,058,607      cpu-cycles:k

      10.457917495 seconds time elapsed

       1.122311000 seconds user
       9.121725000 seconds sys
```

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104316

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN

--
https://bugs.ruby-lang.org/
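The "roughly 3 page faults" and "`50000x` kernel side CPU cycles" estimates can be checked by dividing the counters from the two `perf stat` runs by the fiber count. The sketch here uses only the numbers printed in this message; `RUNS` and `per_fiber` are names introduced for illustration.

```ruby
# Counter values copied from the two `perf stat` runs in this message.
RUNS = [
  { fibers: 100_000,   page_faults: 323_536,   kernel_cycles: 4_642_691_199 },
  { fibers: 1_000_000, page_faults: 3_210_454, kernel_cycles: 49_187_058_607 }
].freeze

# Normalises the whole-process counters by the number of fibers in that run.
def per_fiber(run)
  {
    fibers: run[:fibers],
    page_faults: (run[:page_faults].to_f / run[:fibers]).round(2),
    kernel_cycles: (run[:kernel_cycles].to_f / run[:fibers]).round
  }
end

RUNS.each do |run|
  stats = per_fiber(run)
  puts "#{stats[:fibers]} fibers: #{stats[:page_faults]} page faults/fiber, " \
       "#{stats[:kernel_cycles]} kernel cycles/fiber"
end
# prints:
# 100000 fibers: 3.24 page faults/fiber, 46427 kernel cycles/fiber
# 1000000 fibers: 3.21 page faults/fiber, 49187 kernel cycles/fiber
```

Both runs land close to 3 page faults and a little under 50,000 kernel cycles per fiber. These are whole-process counters, so they include interpreter startup and fiber setup, which matches the caveat that setup and runtime costs are hard to separate here.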

[ruby-core:114519] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

Issue #17263 has been updated by ioquatix (Samuel Williams). The difference is negligible but there did appear to be some improvement. We obviously need better benchmark tools, because this is total eye-ball statistics, but it's expected that less memory dependency between instructions should be better. https://github.com/ruby/ruby/pull/8284 ---------------------------------------- Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers https://bugs.ruby-lang.org/issues/17263#change-104315 * Author: ciconia (Sharon Rosner) * Status: Open * Priority: Normal * ruby -v: 2.7.1 * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN ---------------------------------------- I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing highly-concurrent Ruby programs with fibers. In the course of my work I have come up against two problems using Ruby fibers: 1. Fiber context switching performance seem to degrade as the number of fibers is increased. This is both with `Fiber#transfer` and `Fiber#resume/Fiber.yield`. 2. The number of concurrent fibers that can exist at any time seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated. With 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1). 

Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions of Ruby, going from ~3.4M context switches per second for 100 fibers to less than 1M context switches per second for 10000 fibers. Running with 100000 fibers fails to complete.
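
To make the slowdown easier to compare, the Ruby 2.7.1 rates reported above can be converted into an approximate cost per context switch. This is a minimal sketch: `ns_per_switch` is a hypothetical helper, and the rates are copied from the 2.7.1 output above, not measured again.

```ruby
# Hypothetical helper: converts a rate in switches per second into
# nanoseconds per switch.
def ns_per_switch(rate)
  1_000_000_000 / rate
end

# Rates copied from the Ruby 2.7.1 Fiber#transfer output above,
# keyed by the number of fibers.
RATES_2_7_1 = {
  100 => 3443916.967616508,
  1000 => 2333315.3862491543,
  10000 => 916772.1008060966
}.freeze

RATES_2_7_1.each do |num_fibers, rate|
  puts format('fibers: %-6d ~%.0f ns per switch', num_fibers, ns_per_switch(rate))
end
```

By this conversion, a switch costs about 290 ns with 100 fibers and about 1090 ns with 10000 fibers, which is roughly 3.8 times slower.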

Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, theoretically at least switching between fibers should have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant of the implementation details of Ruby fibers, so at least for now I don't have any idea where this problem is coming from.

--
https://bugs.ruby-lang.org/
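
Since single runs like the ones in this thread give only eyeball statistics, one option is to repeat each measurement and report the median, timed with a monotonic clock. The following is a sketch under stated assumptions, not a proposed benchmark tool: `median` and `measure_resume_rate` are hypothetical names, the switch count is reduced to keep the run short, and GC is left enabled, unlike in the programs above.

```ruby
# Hypothetical helpers, not taken from the report above.

# Median of a non-empty array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Resumes num_fibers fibers round-robin until at least `switches` resumes
# have happened, and returns resumes per second. Time is taken from the
# monotonic clock so wall-clock adjustments cannot skew the result.
def measure_resume_rate(num_fibers, switches)
  fibers = Array.new(num_fibers) { Fiber.new { loop { Fiber.yield } } }
  count = 0
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while count < switches
    fibers.each do |f|
      f.resume
      count += 1
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  count / elapsed
end

[100, 1000].each do |num_fibers|
  rates = Array.new(5) { measure_resume_rate(num_fibers, 100_000) }
  puts "fibers: #{num_fibers} median: #{median(rates).round} " \
       "min: #{rates.min.round} max: #{rates.max.round}"
end
```

Printing the minimum and maximum next to the median gives a rough sense of run-to-run spread, which helps when judging whether a small difference between two builds is real.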


[ruby-core:114518] [Ruby master Bug#18036] Pthread fibers become invalid on fork - different from normal fibers.
by jeremyevans0 (Jeremy Evans) 25 Aug '23


Issue #18036 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Closed

I tested the example in https://bugs.ruby-lang.org/issues/18036#note-3 in my environment, and since Ruby 3.0, it doesn't fail (it does segfault on Ruby 2.7). I tested with Ruby master and the behavior is the same with both the amd64 coroutine and pthread coroutine, which is that `Fiber.yield` raises a FiberError in the forked process ("attempt to yield on a not resumed fiber"). Since the behavior doesn't appear to be unreliable, I'm going to close this.

----------------------------------------
Bug #18036: Pthread fibers become invalid on fork - different from normal fibers.
https://bugs.ruby-lang.org/issues/18036#change-104314

* Author: ioquatix (Samuel Williams)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Fork is notoriously hard to use correctly and in most cases we should be encouraging `Process.spawn`. However, it does have use cases, for example the pre-fork model of server design. So there are some valid usages at least.

We recently introduced non-native fiber based on pthread which is generally more compatible than copy coroutine w.r.t. the overall burden on the implementation. However, it has one weak point, which is that pthreads become invalid on fork, and thus fibers become invalid on fork. That means that the following program can become invalid:

```
Fiber.new do
  fork
end.resume
```

It will create two threads, the main thread and the thread for the fiber. When the child begins execution, it will be within the child pthread, but the parent pthread is no longer valid, i.e. it's gone.

I see a couple of options here (not mutually exclusive):

- Combining Fibers and fork is invalid. Fork only works from main fiber.
- Ignore the problem and expect users of fork to be aware that the program can potentially enter an invalid state - okay for `fork-exec` but not much else.
- Terminate all non-current fibers as we do for threads, and possibly fail if the current fiber exits for some reason.

Because pthread coroutine should be very uncommon, I don't think we should sacrifice the general good qualities of the fiber semantic model for some obscure case. Maybe it would be sufficient to have a warning (not printed by default unless running on pthread coroutines), that fork within a non-main fiber can have undefined results.

--
https://bugs.ruby-lang.org/
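
The first option listed above (fork only from the main fiber) could be approximated in application code. What follows is a minimal sketch, not something Ruby provides: `ForkGuard` and `ForkGuard::MAIN_FIBER` are hypothetical names, and the sketch assumes the file is loaded from the main fiber of the main thread, so that `Fiber.current` at load time is that root fiber.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies; harmless on newer ones

# Hypothetical guard, not a Ruby API.
module ForkGuard
  # Assumption: captured while running on the main fiber of the main thread.
  MAIN_FIBER = Fiber.current

  # Forks only when called from the captured main fiber; raises otherwise.
  def self.fork(&block)
    unless Fiber.current.equal?(MAIN_FIBER)
      raise 'fork was requested from a non-main fiber'
    end

    Process.fork(&block)
  end
end

# Called inside a fiber, the guard raises instead of forking.
message = Fiber.new do
  begin
    ForkGuard.fork { exit!(0) }
  rescue RuntimeError => e
    e.message
  end
end.resume

puts message
```

From the main fiber the guard simply delegates to `Process.fork`, so a call such as `Process.wait(ForkGuard.fork { exit!(0) })` behaves like a plain fork on platforms that support it. Threads other than the main thread have their own root fibers, which this sketch deliberately treats as non-main.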


[ruby-core:114516] [Ruby master Bug#17926] spec/ruby/core/file/atime_spec.rb: a random failing test on Travis ppc64le
by jeremyevans0 (Jeremy Evans) 24 Aug '23


Issue #17926 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Feedback

@jaruga We haven't been seeing this error on the ppc64le RubyCI server recently. Is this OK to close?

----------------------------------------
Bug #17926: spec/ruby/core/file/atime_spec.rb: a random failing test on Travis ppc64le
https://bugs.ruby-lang.org/issues/17926#change-104312

* Author: jaruga (Jun Aruga)
* Status: Feedback
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
I just observed the following test failing randomly on Travis ppc64le at the commit:fd65ef2a5aa8629676d1edb6410e4d4cf60b8045. When running this commit on my forked repo, the test passed.

https://travis-ci.com/github/ruby/ruby/jobs/509310540#L2255

```
$ $SETARCH make -s test-spec MSPECOPT=-ff
...
1)
File.atime returns the last access time for the named file with microseconds FAILED
Expected 0 == 123456
to be truthy but was false
/home/travis/build/ruby/ruby/spec/ruby/core/file/atime_spec.rb:25:in `block (3 levels) in <top (required)>'
/home/travis/build/ruby/ruby/spec/ruby/core/file/atime_spec.rb:3:in `<top (required)>'
```

--
https://bugs.ruby-lang.org/
