[ruby-core:117905] [Ruby master Bug#20493] Segfault on rb_io_getline_fast

Issue #20493 has been reported by josegomezr (Jose Gomez). ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493 * Author: josegomezr (Jose Gomez) * Status: Open * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). Assignee set to kjtsanaktsidis (KJ Tsanaktsidis) ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108324 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). Thank you for your bug report and excellent reproduction! I've had a look at this today - I haven't reached a conclusion yet, but I want to write down where I got so far. The crash is happening because: * One thread is calling `rb_io_getline_fast` * The VALUE variable `str` is saved in register `%r15` * `rb_io_getline_fast` calls a chain of functions that eventually yields the GVL and performs a syscall * None of these other functions spill `%r15` to the stack. It's present only in `%r15` for the duration of the blocking system call. * Another Ruby thread runs and performs a GC * As part of that, it marks `ec->machine` for all other threads. This includes a representation of that thread's register state in `jmpbuf`. * However, the value for `str` seems not to be in `jmpbuf` when it performs GC marking. In theory, we're supposed to capture Ruby VALUE pointers only present in machine registers by (ab)using `setjmp` to capture an opaque blob, which should include all of the values of the machine's registers at the point `setjmp` was called. But that seems not to be happening here. Your patch works around the problem by explicitly spilling `str` to the stack with RB_GC_GUARD, so that the other thread can see it. However, this _shouldn't_ be nescessary. I believe the C extension rules here are that an RB_GC_GUARD is only necessary when you dereference and use part of a VALUE but don't retain a reference to the whole thing - i.e. code like this needs a guard: ``` VALUE str = rb_str_new_cstr(...); some_c_api(RSTRING_PTR(str), RSTRING_LEN(str)); // actually does ((struct RString *)str)->as.heap.ptr and ((struct RString *)str)->as.heap.len something_which_might_gc(); some_other_api(RSTRING_PTR(str), RSTRING_LEN(str)); RB_GC_GUARD(str); // nescessary here! ``` Code like this _shouldn't_ need a guard, i think, and plenty such code exists without a guard ``` VALUE str = rb_str_new_cstr(...); some_c_api(RSTRING_PTR(str), RSTRING_LEN(str)); // actually does ((struct RString *)str)->as.heap.ptr and ((struct RString *)str)->as.heap.len something_which_might_gc(); some_other_api(RSTRING_PTR(str), RSTRING_LEN(str)); return str; // Not guard needed; we still use str after the GC, so the compiler will necessarily keep the value of it around on the stack or in a register ``` Of course, this whole scheme could collapse if the compiler decided to do something like increment `str` in-place and then decrement it later, but it seems to work well enough _in practice_ :/ So I'm going to keep investigating why the `str` is not being captured in `ec->machine.jmpbuf`. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108325 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). OK, I believe I've worked out what's wrong here. The BLOCKING_REGION macro is used to release the GVL, run some code, and re-acquire the GVL. It looks like this: ``` #define BLOCKING_REGION(th, exec, ubf, ubfarg, fail_if_interrupted) do { \ struct rb_blocking_region_buffer __region; \ if (blocking_region_begin(th, &__region, (ubf), (ubfarg), fail_if_interrupted) || \ /* always return true unless fail_if_interrupted */ \ !only_if_constant(fail_if_interrupted, TRUE)) { \ exec; \ blocking_region_end(th, &__region); \ }; \ } while(0) ``` `blocking_region_begin` calls `RB_VM_SAVE_MACHINE_CONTEXT`, which: * Updates the "end of machine stack" pointer in `ec->machine` to be the current stack pointer, * Abuses `setjmp` to save the register state in the `ec->machine` structure as well The idea being that while the thread is off doing things in C-land, another thread performing GC can mark the portion of the machine stack from before the `BLOCKING_REGION` call. The problem is that the disassembly for `blocking_region_begin` looks like this: ``` (rr) disassemble Dump of assembler code for function blocking_region_begin: 0x000074794fa36900 <+0>: push %r15 0x000074794fa36902 <+2>: push %r14 0x000074794fa36904 <+4>: mov %rdx,%r14 0x000074794fa36907 <+7>: push %r13 0x000074794fa36909 <+9>: push %r12 0x000074794fa3690b <+11>: mov %rsi,%r12 0x000074794fa3690e <+14>: push %rbp 0x000074794fa3690f <+15>: push %rbx ``` i.e. it pushes several of the callee-saved registers to the stack, and uses them for its own purposes - including `%r15`, which had the value of `str` from the `rb_io_getline_fast` frame below. When it calls `setjmp` to capture the register state, the value of `%r15` that's saved is the value used in _this_ function, not the old value. That would be OK, because it's spilled to the stack, and the "end of stack pointer" is also incremented, so it should be reachable by the GC that way. However, `blocking_region_begin` is a function, and it's not inlined, the machine stack pointer is restored at the function end. Then, the memory where `%r15` and hence `str` were spilled to gets re-used for the C code doing the without-the-GVL work. I.e. the sequence of operations is: * Call `blocking_region_begin`. Stack pointer is decremented (stack made bigger) * Callee-saved registers including the value of `str` are spilled to the stack. * %r15 is overwritten * Current $sp is set as EC's "end of the stack" value. * Registers are saved to EC's regs structure * `blocking_region_begin` returns. Stack pointer is incremented (stack made smaller). The memory containing the value of `str` now lies beyond the end of the stack. The real value of `str` is returned to %r15, because it's callee-saved. * More work happens. The stack is made bigger again by other functions, and junk is written to that stack slot * GC marking then can't see the value of `str` A "quick fix" for this is to inline the saving of machine stack state into the `BLOCKING_REGION` macro. That way, it's not in a separate function call, and so the stack frame containing the spilled registers is not invalidated until after the blocking completes. Longer term, though, I'd like to rethink how this works a bit. I think `BLOCKING_REGION` being a kind of "macro accepting a block" is a bad idea, because it mixes up the careful stack management that it has to do with the stack state of the caller. I think it should be converted to use a function pointer callback like `rb_thread_call_without_gvl` etc, and we should also consider using hand-written assembly to perform the stack management I think (like what we use in the fiber switching code). ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108326 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). I opened https://github.com/ruby/ruby/pull/10795 as a proposed fix for this. Let's see what other people (and CI) think :) ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108327 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). After applying that fix to the RPM spec file in https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3, I can't get the reproduction to crash any more. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108328 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by josegomezr (Jose Gomez). Thanks for this extremely detailed explanation! Just for my own understanding [I'm not as proficient in low level C, took me a while reading through all details], can I assert that: the fix I suggested would've only "fixed" the segfault on a reduced scope [what I understand are the C definition of the ruby `IO#read*` methods] but wouldn't solve it across the board because the root cause lies on how the context is saved/restored when yielding execution to C-land functions? ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108330 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). You're welcome, thank you for posting such a detailed and actionable report. You're exactly right; your patch deals with this issue in IO methods using `rb_io_getline_fast` but this is probably causing crashes in other places as well; _hopefully_ I've fixed that. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108332 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by kjtsanaktsidis (KJ Tsanaktsidis). Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: REQUIRED Closed by merging https://github.com/ruby/ruby/pull/10795 I opened two PR's to backport this to 3.2 & 3.3 (I think crashes like this should be backported?) * 3.2: https://github.com/ruby/ruby/pull/10797 * 3.3: https://github.com/ruby/ruby/pull/10798 ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108335 * Author: josegomezr (Jose Gomez) * Status: Open * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: REQUIRED ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by josegomezr (Jose Gomez). Awesome! I was about to ask what was the plan for having this change in ruby 3.2/3.3. I've submitted the patches to the projects in openSUSE with the contents of the Backport PR's to have our rubies fixed - https://build.opensuse.org/request/show/1175209 - https://build.opensuse.org/request/show/1175208 Is there an estimation for the next ruby release? ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108340 * Author: josegomezr (Jose Gomez) * Status: Closed * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: REQUIRED ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by k0kubun (Takashi Kokubun). Backport changed from 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: REQUIRED to 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: DONE ruby_3_3 7afc16aa48beb093b06eb978bc430f90dd771690. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108471 * Author: josegomezr (Jose Gomez) * Status: Closed * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: DONE ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by k0kubun (Takashi Kokubun).
Is there an estimation for the next ruby release?
Today. I just released Ruby 3.3.2. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108516 * Author: josegomezr (Jose Gomez) * Status: Closed * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: DONE ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by josegomezr (Jose Gomez). k0kubun (Takashi Kokubun) wrote in #note-12:
Today. I just released Ruby 3.3.2.
Awesome! I forgot to respond when I read it. I see it propagated to OBS too, thanks all for your work & efforts! Until the next bug! 😹 ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108610 * Author: josegomezr (Jose Gomez) * Status: Closed * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: DONE ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/

Issue #20493 has been updated by nagachika (Tomoyuki Chikanaga). Backport changed from 3.1: UNKNOWN, 3.2: REQUIRED, 3.3: DONE to 3.1: UNKNOWN, 3.2: DONE, 3.3: DONE ruby_3_2 commit:2f010f31f1887ad0f429709a2906fc5a4dae8e87. ---------------------------------------- Bug #20493: Segfault on rb_io_getline_fast https://bugs.ruby-lang.org/issues/20493#change-108832 * Author: josegomezr (Jose Gomez) * Status: Closed * Assignee: kjtsanaktsidis (KJ Tsanaktsidis) * ruby -v: 3.3.1 * Backport: 3.1: UNKNOWN, 3.2: DONE, 3.3: DONE ---------------------------------------- We've spotted a consistent segfault when running bundle install with `--jobs 4` When running: `bundle install -j 4` we'd get a Segfault at: ``` /usr/lib64/ruby/3.3.0/rubygems/ext/builder.rb:93: [BUG] Segmentation fault at 0x0000000000000014 ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux-gnu] ``` Full [log is available here][0]. I could not find a shorter reproducer besides using bundler with `--jobs 4` or `--jobs 8`. Here's a sample command to trigger the behavior (it creates the Gemfile and calls bundler) [1]. We installed all debug symbols and narrowed down the location of the segfault to `rb_io_getline_fast` in io.c At [line 4001][3] `str` is `T_NONE`, which makes further usage down the line in [`io_enc_str`][4] raise a null pointer dereference. With the notes from [extension.rdoc - Appendix E. RB_GC_GUARD to protect from premature GC][8] I've prepared a patched ruby 3.3.1 package that does not segfault. It's on [OBS Project home:josegomezr:branches:ruby/ruby3.3][6]. Adding a `RB_GC_GUARD` on `rb_io_getline_fast` @ `io.c:4004` just before the return ```diff --- ruby3.3.orig/ruby-3.3.1/io.c +++ ruby3.3/ruby-3.3.1/io.c @@ -4004,6 +4004,7 @@ rb_io_getline_fast(rb_io_t *fptr, rb_enc ENC_CODERANGE_SET(str, cr); fptr->lineno++; + RB_GC_GUARD(str); return str; } ``` Fixes the segfault in our tests. `bundle` finish the installation and the image is built. I've set up a project in OBS to provide reproduceables. - [ruby3.3.1 package][5]. - [ruby3.3.1 base image with enough dependencies to reproduce][7] with [the reproducer script][1]. And the corresponding container is exported in the `containers-patched` repository. Here I leave the docker images generated by OBS: - 3.3.1 [without patches, segfaults.] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers/base-ruby33:latest ``` - 3.3.1 [with patch, does not fail] ``` registry.opensuse.org/home/josegomezr/branches/ruby/containers/containers-patched/base-ruby33:latest ``` [0]: https://gist.github.com/josegomezr/441c271cc731b0ec57213cb98743a699 [1]: https://gist.github.com/josegomezr/e17129bf2df33f3bea60e84a616a8322 [2]: https://gist.github.com/josegomezr/6f81878c979af334efee59b8f2225e58 [3]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4001 [4]: https://github.com/ruby/ruby/blob/v3_3_1/io.c#L4003 [5]: https://build.opensuse.org/package/show/devel:languages:ruby/ruby3.3 [6]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby/ruby3.... [7]: https://build.opensuse.org/package/show/home:josegomezr:branches:ruby:contai... [8]: https://github.com/ruby/ruby/blob/master/doc/extension.rdoc#label-Appendix+E... -- https://bugs.ruby-lang.org/
participants (4)
-
josegomezr (Jose Gomez)
-
k0kubun (Takashi Kokubun)
-
kjtsanaktsidis (KJ Tsanaktsidis)
-
nagachika (Tomoyuki Chikanaga)