ruby-core

ruby-core@ml.ruby-lang.org

August 2023

  • 2 participants
  • 205 discussions
[ruby-core:114523] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

25 Aug '23
Issue #17263 has been updated by ioquatix (Samuel Williams).

File clipboard-202308251514-grqb1.png added
File clipboard-202308251514-r7g4l.png added

I ran some profiles to try and identify why it was so slow. What I found was that `vm_push_frame` becomes slow.

![](clipboard-202308251514-grqb1.png)

vs

![](clipboard-202308251514-r7g4l.png)

It's quite a big difference. I believe this is because of the virtual memory mapping. I wonder if we could experimentally disable the guard pages to confirm this.

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104319

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing highly-concurrent Ruby programs with fibers. In the course of my work I have come up against two problems using Ruby fibers:

1. Fiber context switching performance seems to degrade as the number of fibers is increased. This happens with both `Fiber#transfer` and `Fiber#resume/Fiber.yield`.
2. The number of concurrent fibers that can exist at any time seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated. With 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1).

Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions of Ruby, going from ~3.4M context switches per second for 100 fibers to fewer than 1M context switches per second for 10000 fibers. Running with 100000 fibers fails to complete.

Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, theoretically at least, switching between fibers should have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant of the implementation details of Ruby fibers, so at least for now I don't have any idea where this problem is coming from.

---Files--------------------------------
clipboard-202308251514-grqb1.png (81.3 KB)
clipboard-202308251514-r7g4l.png (81 KB)

-- 
https://bugs.ruby-lang.org/
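One hypothesis for the `can't set a guard page: Cannot allocate memory` limit in the report above, not confirmed anywhere in this message, is that the process runs into the Linux per-process cap on memory mappings (`vm.max_map_count`) rather than running out of RAM. The sketch below is only a back-of-the-envelope estimate: the helper names are made up for illustration, and the figure of two mappings per fiber (stack plus guard page) is an assumption, not something taken from the Ruby source.

```ruby
# Hypothetical helpers: estimate how many fibers fit under the kernel's
# per-process mapping cap. All default values are assumptions for illustration.
MAX_MAP_COUNT_PATH = "/proc/sys/vm/max_map_count"

def read_max_map_count(path = MAX_MAP_COUNT_PATH, fallback: 65_530)
  # Linux exposes the cap in this file; use a commonly seen default elsewhere.
  File.readable?(path) ? File.read(path).to_i : fallback
end

def estimated_fiber_limit(max_map_count, mappings_per_fiber: 2, reserved: 0)
  # Mappings already used by the process (heap, libraries) reduce the budget.
  (max_map_count - reserved) / mappings_per_fiber
end

limit = estimated_fiber_limit(read_max_map_count)
puts "estimated fiber limit: #{limit}"
```

With a cap of 65530 the estimate is 32765, which is the same order of magnitude as the 31744 fibers reported for Ruby 2.7.1. That is suggestive, not proof.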
[ruby-core:114522] [Ruby master Feature#19057] Hide implementation of `rb_io_t`.
by ioquatix (Samuel Williams) 25 Aug '23

25 Aug '23
Issue #19057 has been updated by ioquatix (Samuel Williams).

@naruse Here is the compatibility fix that will allow `kgio`, `unicorn` and so on to compile with no changes: https://github.com/ruby/ruby/pull/8286

Please feel free to merge that before doing the preview 2 release, if @normalperson has not released updated gems.

However, bear in mind that this only delays the inevitable. I'd still like to propose we remove this completely in 3.4: 3.3 carries the deprecations, and 3.4 removes it. Does that sound okay to you?

----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-104318

* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 3.3
----------------------------------------
In order to make improvements to the IO implementation like <https://bugs.ruby-lang.org/issues/18455>, we need to add new fields to `struct rb_io_t`.

By the way, ending types in `_t` is not recommended by POSIX, so I'm also trying to rename the internal implementation to drop `_t` where possible during this conversion.

Anyway, we should try to hide the implementation of `struct rb_io`. Ideally, we don't expose any of it, but the problem is backwards compatibility.

So, in order to retain backwards compatibility, we should expose some fields of `struct rb_io`. The most commonly used ones are `fd` and `mode`, but several others are also commonly used.

There are many fields which should not be exposed because they are implementation details.

## Current proposal

The current proposed change <https://github.com/ruby/ruby/pull/6511> creates two structs:

```c
// include/ruby/io.h
#ifndef RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
};
#else
struct rb_io;
#endif

// internal/io.h
#define RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
  // ... private fields ...
};
```

However, we are not 100% confident this is safe according to the C specification. My experience is not sufficiently wide to say this is safe in practice, but it does look okay to me, and @Eregon and @tenderlovemaking have both given some kind of approval.

That being said, maybe it's not safe.

There are two alternatives:

## Hide all details

We can make public `struct rb_io` completely invisible.

```c
// include/ruby/io.h
#define RB_IO_HIDDEN
struct rb_io;
int rb_ioptr_descriptor(struct rb_io *ioptr); // accessor for previously visible state.

// internal/io.h
struct rb_io {
  // ... all fields ...
};
```

This would only be forwards compatible, and code would need to feature detect like this:

```c
#ifdef RB_IO_HIDDEN
#define RB_IOPTR_DESCRIPTOR rb_ioptr_descriptor
#else
// Older headers have no accessor; `fd` is still a public field there.
#define RB_IOPTR_DESCRIPTOR(ioptr) ((ioptr)->fd)
#endif
```

## Nested public interface

Alternatively, we can nest the public fields into the private struct:

```c
// include/ruby/io.h
struct rb_io_public {
  int fd;
  // ... public fields ...
};

// internal/io.h
#define RB_IO_T
struct rb_io {
  struct rb_io_public public;
  // ... private fields ...
};
```

## Considerations

I personally think the "Hide all details" implementation is the best, but it's also the least compatible. This is also what we are ultimately aiming for; whether we decide to take an intermediate "compatibility step" is up to us.

I think "Nested public interface" is messy and introduces more complexity, but it might be slightly better defined than the "Current proposal", which might create undefined behaviour. That being said, all the tests are passing.

-- 
https://bugs.ruby-lang.org/
[ruby-core:114521] [Ruby master Feature#19057] Hide implementation of `rb_io_t`.
by naruse (Yui NARUSE) 25 Aug '23

25 Aug '23
Issue #19057 has been updated by naruse (Yui NARUSE).

I'll release Ruby 3.3.0 preview 2 soon. I'm concerned that those three projects don't support the preview yet.

A preview release is meant to let people test their applications, but without support from those foundational projects it's hard to achieve that goal.

If those projects are not fixed before the preview release, I'll revert the changes related to Feature #19057 for preview 2.

----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-104317

* Author: ioquatix (Samuel Williams)
* Status: Assigned
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 3.3
----------------------------------------

-- 
https://bugs.ruby-lang.org/
[ruby-core:114520] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

25 Aug '23
Issue #17263 has been updated by ioquatix (Samuel Williams).

It looks like roughly 3 page faults per fiber. If I run `x` fibers, I get `3x` page faults. It's proportional to the number of fibers, but I'm not sure how expensive this is.

The CPU time is also costly: for `x` fibers, I get roughly `50000x` kernel-side CPU cycles. So for sure there is some overhead there. It's hard to separate setup cost from runtime cost, though.

```
> sudo perf stat -e page-faults,cpu-cycles:u,cpu-cycles:k (which ruby) fiber.rb 100000
fibers: 100000 count: 1000000 rate: 2993069.28

 Performance counter stats for '/home/samuel/.rubies/ruby-3.2.1/bin/ruby fiber.rb 100000':

           323,536      page-faults
     2,249,302,328      cpu-cycles:u
     4,642,691,199      cpu-cycles:k

       1.302578116 seconds time elapsed

       0.439950000 seconds user
       0.838409000 seconds sys

> sudo perf stat -e page-faults,cpu-cycles:u,cpu-cycles:k (which ruby) fiber.rb 1000000
fibers: 1000000 count: 1000000 rate: 3029874.53

 Performance counter stats for '/home/samuel/.rubies/ruby-3.2.1/bin/ruby fiber.rb 1000000':

         3,210,454      page-faults
     5,704,584,641      cpu-cycles:u
    49,187,058,607      cpu-cycles:k

      10.457917495 seconds time elapsed

       1.122311000 seconds user
       9.121725000 seconds sys
```

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104316

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------

-- 
https://bugs.ruby-lang.org/
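The per-fiber figures quoted in the `perf stat` update for #17263 ("roughly 3 page faults" and "`50000x` kernel side CPU cycles") can be checked with a few lines of arithmetic. The sketch below only divides the counters copied from that output by the fiber count; the constant and method names are made up for illustration, and the counters cover the whole process, so setup cost is included.

```ruby
# Counter values copied from the two perf runs quoted in the update.
RUNS = [
  { fibers: 100_000,   page_faults: 323_536,   kernel_cycles: 4_642_691_199 },
  { fibers: 1_000_000, page_faults: 3_210_454, kernel_cycles: 49_187_058_607 },
].freeze

# Hypothetical helper: normalise whole-process counters by the fiber count.
def per_fiber(run)
  {
    page_faults: run[:page_faults].fdiv(run[:fibers]).round(2),
    kernel_cycles: run[:kernel_cycles] / run[:fibers], # integer division
  }
end

RUNS.each do |run|
  stats = per_fiber(run)
  puts "fibers: #{run[:fibers]} " \
       "page faults/fiber: #{stats[:page_faults]} " \
       "kernel cycles/fiber: #{stats[:kernel_cycles]}"
end
```

Both runs land at a little over 3 page faults and somewhat under 50000 kernel cycles per fiber, which matches the linear scaling described in the message.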
[ruby-core:114519] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
by ioquatix (Samuel Williams) 25 Aug '23

25 Aug '23
Issue #17263 has been updated by ioquatix (Samuel Williams).

The difference is negligible, but there did appear to be some improvement. We obviously need better benchmark tools, because this is purely eyeball statistics, but it's expected that less memory dependency between instructions should be better.

https://github.com/ruby/ruby/pull/8284

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-104315

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------

-- 
https://bugs.ruby-lang.org/
[ruby-core:114518] [Ruby master Bug#18036] Pthread fibers become invalid on fork - different from normal fibers.
by jeremyevans0 (Jeremy Evans) 25 Aug '23

25 Aug '23
Issue #18036 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Closed

I tested the example in https://bugs.ruby-lang.org/issues/18036#note-3 in my environment, and since Ruby 3.0, it doesn't fail (it does segfault on Ruby 2.7). I tested with Ruby master and the behavior is the same with both the amd64 coroutine and pthread coroutine, which is that `Fiber.yield` raises a FiberError in the forked process ("attempt to yield on a not resumed fiber"). Since the behavior doesn't appear to be unreliable, I'm going to close this.

----------------------------------------
Bug #18036: Pthread fibers become invalid on fork - different from normal fibers.
https://bugs.ruby-lang.org/issues/18036#change-104314

* Author: ioquatix (Samuel Williams)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Fork is notoriously hard to use correctly, and in most cases we should be encouraging `Process.spawn`. However, it does have use cases, for example the pre-fork model of server design. So there are some valid usages at least.

We recently introduced a non-native fiber implementation based on pthread, which is generally more compatible than the copy coroutine w.r.t. the overall burden on the implementation. However, it has one weak point, which is that pthreads become invalid on fork, and thus fibers become invalid on fork. That means that the following program can become invalid:

```
Fiber.new do
  fork
end.resume
```

It will create two threads, the main thread and the thread for the fiber. When the child begins execution, it will be within the child pthread, but the parent pthread is no longer valid, i.e. it's gone.

I see a couple of options here (not mutually exclusive):

- Combining Fibers and fork is invalid. Fork only works from main fiber.
- Ignore the problem and expect users of fork to be aware that the program can potentially enter an invalid state - okay for `fork-exec` but not much else.
- Terminate all non-current fibers as we do for threads, and possibly fail if the current fiber exits for some reason.

Because pthread coroutine should be very uncommon, I don't think we should sacrifice the general good qualities of the fiber semantic model for some obscure case. Maybe it would be sufficient to have a warning (not printed by default unless running on pthread coroutines) that fork within a non-main fiber can have undefined results.

-- 
https://bugs.ruby-lang.org/
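As a small illustration of the "okay for `fork-exec`" case among the options listed in the report above, the following sketch forks from inside a non-main fiber and has the child exit immediately without touching any other fiber. The helper name is made up here, and the sketch assumes a platform where `fork` is available. It says nothing about what happens when the child tries to resume or yield to fibers inherited from the parent, which is the case the report is about.

```ruby
# Hypothetical helper: fork inside a fiber, with a child that does no fiber
# switching and exits straight away (similar in spirit to fork-exec).
def fork_from_fiber
  Fiber.new do
    pid = fork do
      # Child: skip at_exit handlers and leave immediately.
      exit!(0)
    end
    # Parent: wait for the child and report whether it exited with status 0.
    _, status = Process.wait2(pid)
    status.success?
  end.resume
end

puts "child exited cleanly: #{fork_from_fiber}" if Process.respond_to?(:fork)
```

The child never calls `Fiber.yield` or resumes another fiber, so it stays clear of the invalid state described in the options above.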
[ruby-core:114516] [Ruby master Bug#17926] spec/ruby/core/file/atime_spec.rb: a random failing test on Travis ppc64le
by jeremyevans0 (Jeremy Evans) 24 Aug '23

24 Aug '23
Issue #17926 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Feedback

@jaruga We haven't been seeing this error on the ppc64le RubyCI server recently. Is this OK to close?

----------------------------------------
Bug #17926: spec/ruby/core/file/atime_spec.rb: a random failing test on Travis ppc64le
https://bugs.ruby-lang.org/issues/17926#change-104312

* Author: jaruga (Jun Aruga)
* Status: Feedback
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
I just observed the following test failing randomly on Travis ppc64le at the commit:fd65ef2a5aa8629676d1edb6410e4d4cf60b8045. When running this commit on my forked repo, the test passed.

https://travis-ci.com/github/ruby/ruby/jobs/509310540#L2255

```
$ $SETARCH make -s test-spec MSPECOPT=-ff
...
1)
File.atime returns the last access time for the named file with microseconds FAILED
Expected 0 == 123456
to be truthy but was false
/home/travis/build/ruby/ruby/spec/ruby/core/file/atime_spec.rb:25:in `block (3 levels) in <top (required)>'
/home/travis/build/ruby/ruby/spec/ruby/core/file/atime_spec.rb:3:in `<top (required)>'
```

-- 
https://bugs.ruby-lang.org/
[ruby-core:114515] [Ruby master Bug#17826] Ractor#take hangs if used in multiple Threads
by jeremyevans0 (Jeremy Evans) 24 Aug '23

24 Aug '23
Issue #17826 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Assigned to Closed

This appears fixed in Ruby 3.3.0-preview1, though it doesn't work in Ruby 3.2.2.

----------------------------------------
Bug #17826: Ractor#take hangs if used in multiple Threads
https://bugs.ruby-lang.org/issues/17826#change-104310

* Author: kukunin (Sergiy Kukunin)
* Status: Closed
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Hello there. I was playing with Ractors (the awesome technology and the big leap for Ruby) and encountered weird behavior.

I tried to schedule and run Ractors in multiple threads, and found out that Ractor#take hangs even if the ractor is finished. Here is code to reproduce:

``` ruby
Array.new(2) do |n|
  Thread.new do
    r = Ractor.new do
      sleep 0.001
    end
    r.take
    puts "thread #{n} finished"
  end
end.each(&:join)

puts 'done'
```

The output is just "thread 0 finished" and the process hangs forever. Sometimes the second thread exits first, and the first freezes.

Thank you for your time. Hopefully, it's a valid bug report, not me just misusing the feature =)

-- 
https://bugs.ruby-lang.org/
[ruby-core:114514] [Ruby master Bug#17799] Seg fault in rb_class_clear_method_cache
by jeremyevans0 (Jeremy Evans) 24 Aug '23

24 Aug '23
Issue #17799 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Feedback

@stanhu Have you had any similar failures with Ruby 3.2?

----------------------------------------
Bug #17799: Seg fault in rb_class_clear_method_cache
https://bugs.ruby-lang.org/issues/17799#change-104309

* Author: stanhu (Stan Hu)
* Status: Feedback
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Recently our CI tests have been intermittently failing with segmentation faults at random points, such as:

```
/builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3: [BUG] Segmentation fault at 0x0000000000000000
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0042 p:0003 s:0237 e:000236 TOP /builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3 [FINISH]
c:0041 p:---- s:0234 e:000233 CFUNC :require
c:0040 p:0012 s:0229 e:000228 BLOCK /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0039 p:0070 s:0226 e:000225 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/loaded_features_index.rb:
c:0038 p:0025 s:0214 e:000213 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0037 p:0055 s:0208 e:000207 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.r
c:0036 p:0006 s:0201 e:000200 BLOCK /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71 [FINISH]
c:0035 p:---- s:0197 e:000196 CFUNC :each
c:0034 p:0563 s:0193 e:000192 TOP /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71 [FINISH]
c:0033 p:---- s:0187 e:000186 CFUNC :require
c:0032 p:0007 s:0182 e:000181 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112
c:0031 p:0008 s:0173 e:000172 BLOCK /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574 [FINISH]
c:0030 p:---- s:0169 e:000168 CFUNC :each
c:0029 p:0042 s:0165 e:000164 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574
c:0028 p:0048 s:0159 e:000158 BLOCK /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:113 [FINISH]
c:0027 p:---- s:0155 e:000154 CFUNC :each
c:0026 p:0019 s:0151 e:000150 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112
c:0025 p:0005 s:0145 e:000144 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:22
c:0024 p:0035 s:0140 e:000139 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:132
c:0023 p:0007 s:0134 e:000133 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:99
c:0022 p:0007 s:0128 e:000127 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:86
c:0021 p:0065 s:0122 e:000121 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:71
c:0020 p:0020 s:0114 e:000113 METHOD /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:45
c:0019 p:0025 s:0109 e:000108 TOP /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/exe/rspec:4 [FINISH]
c:0018 p:---- s:0106 e:000105 CFUNC :load
c:0017 p:0112 s:0101 e:000100 TOP /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23 [FINISH]
c:0016 p:---- s:0096 e:000095 CFUNC :load
c:0015 p:0107 s:0091 e:000090 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63
c:0014 p:0071 s:0083 e:000082 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28
c:0013 p:0024 s:0078 e:000077 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:476
c:0012 p:0054 s:0073 e:000072 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27
c:0011 p:0040 s:0065 e:000064 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127
c:0010 p:0239 s:0058 e:000057 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399
c:0009 p:0008 s:0045 e:000044 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:30
c:0008 p:0066 s:0040 e:000039 METHOD /usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476
c:0007 p:0008 s:0033 e:000032 METHOD /usr/local/lib/ruby/2.7.0/bundler/cli.rb:24
c:0006 p:0109 s:0028 e:000027 BLOCK /usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46
c:0005 p:0002 s:0022 e:000021 METHOD /usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123
c:0004 p:0111 s:0017 E:001838 TOP /usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34 [FINISH]
c:0003 p:---- s:0013 e:000012 CFUNC :load
c:0002 p:0112 s:0008 E:002100 EVAL /usr/local/bin/bundle:23 [FINISH]
c:0001 p:0000 s:0003 E:001040 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
/usr/local/bin/bundle:23:in `<main>'
/usr/local/bin/bundle:23:in `load'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:34:in `<top (required)>'
/usr/local/lib/ruby/2.7.0/bundler/friendly_errors.rb:123:in `with_friendly_errors'
/usr/local/lib/ruby/gems/2.7.0/gems/bundler-2.1.4/libexec/bundle:46:in `block in <top (required)>'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:24:in `start'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:30:in `dispatch'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/lib/ruby/2.7.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/lib/ruby/2.7.0/bundler/cli.rb:476:in `exec'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:28:in `run'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `kernel_load'
/usr/local/lib/ruby/2.7.0/bundler/cli/exec.rb:63:in `load'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23:in `<top (required)>'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec:23:in `load'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/exe/rspec:4:in `<top (required)>'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:45:in `invoke'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:71:in `run'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:86:in `run'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:99:in `setup'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/runner.rb:132:in `configure'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:22:in `configure'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112:in `process_options_into'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:112:in `each'
/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration_options.rb:113:in `block in process_options_into' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `requires=' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `each' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:1574:in `block in requires=' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112:in `load_file_handling_errors' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/rspec-core-3.10.1/lib/rspec/core/configuration.rb:2112:in `require' /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `<top (required)>' /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `each' /builds/gitlab-org/security/gitlab/spec/spec_helper.rb:71:in `block in <top (required)>' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:31:in `require' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:22:in `require_with_bootsnap_lfi' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/loaded_features_index.rb:92:in `register' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `block in require_with_bootsnap_lfi' /builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `require' /builds/gitlab-org/security/gitlab/spec/support/shared_examples/requests/api/issues/merge_requests_count_shared_examples.rb:3:in `<top (required)>' -- Machine register context 
------------------------------------------------ RIP: 0x00007fba9179f8fb RBP: 0x00007ffdb2bc4dc0 RSP: 0x00007ffdb2bc3d40 RAX: 0x0000565411171e60 RBX: 0x0000000000000000 RCX: 0x0000000004bf1491 RDX: 0x00007ffdb2bc4dc0 RDI: 0x00005654110bc550 RSI: 0x00007fba9179f8c0 R8: 0x0000565406728098 R9: 0x00007fba91124170 R10: 0x0000565406726010 R11: 0x00007fba91124170 R12: 0x00007fba9179f8c0 R13: 0x0000000004bd5abc R14: 0x000056543d860c70 R15: 0x0000565435cff1e0 EFL: 0x0000000000010246 -- Other runtime information ----------------------------------------------- ``` We managed to generate a core file from this seg fault: ``` $ docker run -v /tmp/bugs:/bugs -it registry.gitlab.com/gitlab-org/gitlab-build-images:ruby-2.7.2.patched-golan… bash root@25a81975afab:/bugs# mkdir -p /builds/gitlab-org/security/gitlab/ root@25a81975afab:/bugs# cd /builds/gitlab-org/security/gitlab/ root@25a81975afab:/builds/gitlab-org/security/gitlab# unzip /bugs/cache.zip Archive: /bugs/cache.zip creating: vendor/gitaly-ruby/ creating: vendor/gitaly-ruby/ruby/ creating: vendor/gitaly-ruby/ruby/2.7.0/ creating: vendor/gitaly-ruby/ruby/2.7.0/bin/ inflating: vendor/gitaly-ruby/ruby/2.7.0/bin/codera <snip> root@25a81975afab:/bugs# gdb /usr/local/bin/ruby --core core.bundle.1618331218.363 GNU gdb (Debian 8.2.1-2+b3) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... 
Reading symbols from /usr/local/bin/ruby...done. warning: core file may not match specified executable file. [New LWP 363] [New LWP 533] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/builds/gitlab-org/security/gitlab/vendor/ruby/2.7.0/bin/rspec -Ispec -rspec_he'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7fba90f65740 (LWP 363))] (gdb) t a a bt Thread 2 (Thread 0x7fba87c62700 (LWP 533)): #0 0x00007fba91056916 in __GI_ppoll (fds=fds@entry=0x7fba87b616d8, nfds=nfds@entry=1, timeout=<optimized out>, timeout@entry=0x7fba87b616e0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x00007fba91771890 in rb_sigwait_sleep (th=th@entry=0x5654120da230, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x7fba87b61790) at hrtime.h:148 #2 0x00007fba91772599 in native_sleep (th=0x5654120da230, rel=0x7fba87b61790) at thread_pthread.c:2099 #3 0x00007fba91775e2f in sleep_hrtime (fl=2, rel=<optimized out>, th=0x5654120da230) at thread.c:1303 #4 rb_thread_wait_for (time=...) 
at thread.c:1351 #5 0x00007fba916e10e0 in rb_f_sleep (argc=1, argv=0x7fba87b61d58, _=<optimized out>) at process.c:4886 #6 0x00007fba917a4c39 in vm_call_cfunc_with_frame (empty_kw_splat=<optimized out>, cd=0x56540b8b7a80, calling=<optimized out>, reg_cfp=0x7fba87c61ca0, ec=0x5654120da410) at vm_insnhelper.c:2514 #7 vm_call_cfunc (ec=0x5654120da410, reg_cfp=0x7fba87c61ca0, calling=<optimized out>, cd=0x56540b8b7a80) at vm_insnhelper.c:2539 #8 0x00007fba917bd6bc in vm_call_method_each_type (ec=0x5654120da410, cfp=0x7fba87c61ca0, calling=0x7fba87b61a00, cd=0x56540b8b7a80) at vm_insnhelper.c:2925 #9 0x00007fba917bde55 in vm_call_method_each_type (cd=<optimized out>, calling=<optimized out>, cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:3026 #10 vm_call_method (ec=0x5654120da410, cfp=0x7fba87c61ca0, calling=<optimized out>, cd=<optimized out>) at vm_insnhelper.c:3053 #11 0x00007fba917b0072 in vm_sendish (block_handler=<optimized out>, method_explorer=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at vm_insnhelper.c:4023 #12 vm_exec_core (ec=0x7fba87b616d8, initial=1) at insns.def:801 #13 0x00007fba917b5b8c in rb_vm_exec (ec=0x5654120da410, mjit_enable_p=1) at vm.c:1920 #14 0x00007fba917b729c in invoke_iseq_block_from_c (me=0x0, is_lambda=<optimized out>, cref=0x0, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240, captured=<optimized out>, ec=0x5654120da410) at vm.c:1116 #15 invoke_block_from_c_proc (me=0x0, is_lambda=<optimized out>, passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240, proc=0x5654120da410, ec=0x5654120da410) at vm.c:1216 #16 vm_invoke_proc (passed_block_handler=0, kw_splat=<optimized out>, argv=<optimized out>, argc=1, self=94918931276240, proc=0x5654120da410, ec=0x5654120da410) at vm.c:1238 #17 rb_vm_invoke_proc (ec=0x5654120da410, proc=proc@entry=0x5654135f2920, argc=1, argv=<optimized out>, 
kw_splat=<optimized out>, passed_block_handler=passed_block_handler@entry=0) at vm.c:1259 #18 0x00007fba9177447d in thread_do_start (th=0x5654120da230) at thread.c:697 #19 0x00007fba917764ff in thread_start_func_2 (th=0x5654120da230, stack_start=<optimized out>) at thread.c:745 #20 0x00007fba91776a44 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:969 #21 0x00007fba912fefa3 in start_thread (arg=<optimized out>) at pthread_create.c:486 #22 0x00007fba910614cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 0x7fba90f65740 (LWP 363)): #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007fba90f8a535 in __GI_abort () at abort.c:79 #2 0x00007fba9157275b in die () at error.c:664 #3 rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x565406831a00, fmt=fmt@entry=0x7fba91808f8b "Segmentation fault at %p") at error.c:664 #4 0x00007fba917314db in sigsegv (sig=11, info=0x565406831b30, ctx=0x565406831a00) at signal.c:946 #5 <signal handler called> #6 rb_class_clear_method_cache (klass=0, arg=140439281334464) at vm.c:362 #7 0x00007fba9159b33d in rb_class_foreach_subclass (arg=8, f=<optimized out>, klass=<optimized out>) at class.c:122 #8 rb_class_detach_module_subclasses (klass=<optimized out>) at class.c:147 #9 0x0000000000000000 in ?? () (gdb) ``` This seg fault seems to have occurred `rb_class_clear_method_cache`, perhaps in https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/… ---Files-------------------------------- job.log (1.93 MB) -- https://bugs.ruby-lang.org/
[ruby-core:114513] [Ruby master Feature#17678] Ractors do not restart after fork
by jeremyevans0 (Jeremy Evans) 24 Aug '23

Issue #17678 has been updated by jeremyevans0 (Jeremy Evans).

Tracker changed from Bug to Feature
ruby -v deleted (ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux])
Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN)

As Ractors always use separate OS threads, and fork only runs the current thread in the forked process, I don't see a way for Ractors to continue where they left off after fork. I think auto-starting would likely be a bad idea, because auto-starting would not return them to the state they were at fork.

The addition of `Ractor#alive?` and/or `Ractor#status` makes sense to me. Even in non-forked processes such methods could be useful. Note that you can get what you want already, by calling `Ractor#inspect`, so these methods would only need to expose information that Ractor is already storing.

----------------------------------------
Feature #17678: Ractors do not restart after fork
https://bugs.ruby-lang.org/issues/17678#change-104304

* Author: ivoanjo (Ivo Anjo)
* Status: Assigned
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
Hello there! I'm working at Datadog on the `ddtrace` gem -- <https://github.com/DataDog/dd-trace-rb> and we're experimenting with using Ractors in our library but have run into a few issues.

### Background

When running a Ractor as a background process, the Ractor stops & does not restart when the application forks.

### How to reproduce (Ruby version & script)

`ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]`

```ruby
r2 = Ractor.new do
  loop { puts "[#{Process.pid}] Ractor"; sleep(1) }
end

sleep(1)
puts "[#{Process.pid}] Forking..."
fork do
  sleep(5)
  puts "[#{Process.pid}] End fork."
end

loop do
  sleep(1)
end
```

### Expectation and result

The application prints “Ractor” each second in the main process, but not in the fork.

Expected the Ractor (defined as `r2`) to run in the fork.

```
[29] Ractor
[29] Ractor
[29] Forking...
[29] Ractor
[29] Ractor
[29] Ractor
[29] Ractor
[29] Ractor
[32] End fork.
[29] Ractor
[29] Ractor
[29] Ractor
```

### Additional notes

Threads do not restart across forks either, so it might not be unreasonable to expect consistent behavior.

However, it’s possible to detect a dead Thread and recreate it after a fork (e.g. with `#alive?`, `#status`), but there’s no such mechanism for Ractors.

### Suggested solutions

1. Auto-restart Ractors after fork
2. Add additional methods to Ractors that allow users to check & manage the status of the Ractor, similar to Thread.

--
https://bugs.ruby-lang.org/
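
For comparison, the Thread-based detection mentioned in the additional notes can be sketched as follows. This is only an illustration and not code from `ddtrace` or from the issue: the `BackgroundWorker` class and its `ensure_running` method are hypothetical names. It relies on the fact that a thread created before `fork` is dead in the child process, so `Thread#alive?` returns `false` there.

```ruby
# Hypothetical helper (not part of any library): keeps a background thread
# and recreates it when the copy in the current process is no longer alive,
# which is the case in a forked child.
class BackgroundWorker
  def initialize(&job)
    @job = job
    @thread = Thread.new(&@job)
  end

  # True when the worker thread is running in *this* process.
  def alive?
    @thread.alive?
  end

  # Recreate the thread if it did not survive a fork (or died otherwise).
  def ensure_running
    return if @thread.alive?

    @thread = Thread.new(&@job)
  end
end
```

A caller would invoke `ensure_running` at the start of the forked block (or before each use of the worker). No equivalent check is possible for a Ractor with the API described in the issue, which is what the requested `Ractor#alive?` / `Ractor#status` would address.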