[ruby-core:116370] [Ruby master Bug#20203] `TestEnumerable` test failures with GCC 14

Issue #20203 has been reported by vo.x (Vit Ondruch).

----------------------------------------
Bug #20203: `TestEnumerable` test failures with GCC 14
https://bugs.ruby-lang.org/issues/20203

* Author: vo.x (Vit Ondruch)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
There is an ongoing mass rebuild in Fedora, the first in which GCC 14 is used, and we observe test failures in `TestEnumerable`. Here are a few examples:

~~~
[ 3000/26419] TestEnumerable#test_transient_heap_sort_bymalloc_consolidate(): unaligned fastbin chunk detected
~~~

~~~
[ 2455/26535] TestEnumerable#test_transient_heap_sort_bycorrupted size vs. prev_size in fastbins
~~~

~~~
[ 9716/26532] TestEnumerable#test_any_with_unused_blockdouble free or corruption (fasttop)
~~~

The full logs are accessible [here](https://koji.fedoraproject.org/koji/taskinfo?taskID=112176941). Please drill through `Descendants` and `build.log`.

--
https://bugs.ruby-lang.org/

Issue #20203 has been updated by vo.x (Vit Ondruch).

This is the backtrace I was able to get:

~~~
[72/83] TestEnumerable#test_inject_array_op_redefined[Detaching after vfork from child process 94]
 = 0.00 s
 10) Error:
TestEnumerable#test_inject_array_op_redefined:
Errno::ENOENT: No such file or directory - /usr/bin/ruby
    /builddir/build/BUILD/ruby-3.3.0/tool/lib/envutil.rb:161:in `spawn'
    /builddir/build/BUILD/ruby-3.3.0/tool/lib/envutil.rb:161:in `invoke_ruby'
malloc_consolidate(): unaligned fastbin chunk detected

Thread 1 "ruby" received signal SIGABRT, Aborted.
0x00007ffff7723184 in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.38.9000-33.fc40.x86_64 gmp-6.2.1-5.fc39.x86_64 libgcc-14.0.1-0.2.fc40.x86_64 libxcrypt-4.4.36-4.fc40.x86_64 zlib-ng-compat-2.1.6-1.fc40.x86_64
(gdb) bt
#0  0x00007ffff7723184 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff76cb65e in raise () from /lib64/libc.so.6
#2  0x00007ffff76b3902 in abort () from /lib64/libc.so.6
#3  0x00007ffff76b4767 in __libc_message_impl.cold () from /lib64/libc.so.6
#4  0x00007ffff772d1b5 in malloc_printerr () from /lib64/libc.so.6
#5  0x00007ffff772dd7c in malloc_consolidate () from /lib64/libc.so.6
#6  0x00007ffff772ef90 in _int_free_maybe_consolidate.part.0 () from /lib64/libc.so.6
#7  0x00007ffff772f5fa in _int_free () from /lib64/libc.so.6
#8  0x00007ffff7731e0e in free () from /lib64/libc.so.6
#9  0x00007ffff7ad52ec in objspace_xfree (old_size=11256, ptr=0x555555930250, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:12823
#10 objspace_xfree (old_size=<optimized out>, ptr=0x555555930250, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:12754
#11 ruby_sized_xfree (x=0x555555930250, size=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/gc.c:12927
#12 0x00007ffff7a91ec1 in cont_free (ptr=0x555555c9f3e0) at /builddir/build/BUILD/ruby-3.3.0/cont.c:1059
#13 0x00007ffff7accd01 in rb_data_free (obj=140736887130000, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:3500
#14 obj_free (objspace=0x55555555dc70, obj=140736887130000) at /builddir/build/BUILD/ruby-3.3.0/gc.c:3659
#15 0x00007ffff7cd9b10 in gc_sweep_plane (heap=0x55555555dce0, ctx=<optimized out>, bitset=<optimized out>, p=140736887130000, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:5680
#16 gc_sweep_page.constprop.0 (objspace=0x55555555dc70, heap=0x55555555dce0, ctx=0x7fffffffca40) at /builddir/build/BUILD/ruby-3.3.0/gc.c:5758
#17 0x00007ffff7acac51 in gc_sweep_step (objspace=objspace@entry=0x55555555dc70, size_pool=size_pool@entry=0x55555555dc90, heap=heap@entry=0x55555555dce0) at /builddir/build/BUILD/ruby-3.3.0/gc.c:6047
#18 0x00007ffff7acfa71 in gc_sweep (objspace=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/gc.c:6272
#19 0x00007ffff7adadce in gc_start (objspace=0x55555555dc70, reason=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/gc.c:9609
#20 0x00007ffff7ad35ab in heap_prepare (heap=0x55555555dce0, size_pool=0x55555555dc90, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2517
#21 heap_next_free_page (heap=0x55555555dce0, size_pool=0x55555555dc90, objspace=0x55555555dc70) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2725
#22 newobj_alloc (objspace=0x55555555dc70, cr=0x55555555e960, size_pool_idx=0, vm_locked=<optimized out>, vm_locked@entry=false) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2827
#23 0x00007ffff7ad3eb4 in newobj_of0 (alloc_size=<optimized out>, cr=<optimized out>, wb_protected=1, flags=<optimized out>, klass=140736918383680) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2930
#24 newobj_of (alloc_size=<optimized out>, wb_protected=1, v3=0, v2=0, v1=0, flags=<optimized out>, klass=140736918383680, cr=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2947
#25 rb_wb_protected_newobj_of (ec=<optimized out>, klass=140736918383680, flags=<optimized out>, size=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/gc.c:2962
#26 0x00007ffff7bdf0c1 in str_alloc_embed (capa=6, klass=140736918383680) at /builddir/build/BUILD/ruby-3.3.0/vm_core.h:1954
#27 str_new0 (klass=140736918383680, ptr=0x5555555fa945 "vt100", len=5, termlen=1) at /builddir/build/BUILD/ruby-3.3.0/string.c:871
#28 0x00007ffff7bdf9fe in rb_enc_str_new (ptr=<optimized out>, len=<optimized out>, enc=0x555555579650) at /builddir/build/BUILD/ruby-3.3.0/string.c:928
#29 0x00007ffff7ae4ae5 in env_enc_str_new (enc=<optimized out>, len=5, ptr=0x5555555fa945 "vt100") at /builddir/build/BUILD/ruby-3.3.0/hash.c:4810
#30 env_str_new (len=5, ptr=0x5555555fa945 "vt100") at /builddir/build/BUILD/ruby-3.3.0/hash.c:4819
#31 env_str_new2 (ptr=0x5555555fa945 "vt100") at /builddir/build/BUILD/ruby-3.3.0/hash.c:4826
#32 env_to_hash () at /builddir/build/BUILD/ruby-3.3.0/hash.c:6257
#33 0x00007ffff7ae4bf0 in env_to_h (_=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/hash.c:6317
#34 0x00007ffff7c3aa46 in vm_call_cfunc_with_frame_ (ec=0x55555555ec10, reg_cfp=0x7ffff75fca98, calling=<optimized out>, argc=0, argv=0x7ffff74fd4a8, stack_bottom=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:3490
#35 0x00007ffff7c4233b in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:5581
#36 vm_exec_core (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/redhat-linux-build/insns.def:834
#37 0x00007ffff7c5a680 in vm_exec_loop (result=<optimized out>, tag=0x7fffffffd0b0, state=<optimized out>, ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2513
#38 rb_vm_exec (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2492
#39 0x00007ffff7c47c67 in vm_yield_with_cref (is_lambda=0, cref=0x0, kw_splat=0, argv=0x7fffffffd1e8, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1634
#40 vm_yield (kw_splat=0, argv=0x7fffffffd1e8, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1642
#41 rb_yield_0 (argv=0x7fffffffd1e8, argc=1) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1366
#42 rb_yield (val=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1382
#43 0x00007ffff7a4220c in rb_ary_collect (ary=140736886959720) at /builddir/build/BUILD/ruby-3.3.0/array.c:3630
#44 0x00007ffff7c3aa46 in vm_call_cfunc_with_frame_ (ec=0x55555555ec10, reg_cfp=0x7ffff75fcbb0, calling=<optimized out>, argc=0, argv=0x7ffff74fd3a8, stack_bottom=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:3490
#45 0x00007ffff7c4527d in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:5581
#46 vm_exec_core (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/redhat-linux-build/insns.def:814
#47 0x00007ffff7c5a46d in rb_vm_exec (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2486
#48 0x00007ffff7c47c67 in vm_yield_with_cref (is_lambda=0, cref=0x0, kw_splat=0, argv=0x7fffffffd538, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1634
#49 vm_yield (kw_splat=0, argv=0x7fffffffd538, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1642
#50 rb_yield_0 (argv=0x7fffffffd538, argc=1) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1366
#51 rb_yield (val=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1382
#52 0x00007ffff7a41fc4 in rb_ary_each (ary=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/array.c:2538
#53 rb_ary_each (ary=140736887015160) at /builddir/build/BUILD/ruby-3.3.0/array.c:2532
#54 0x00007ffff7c3aa46 in vm_call_cfunc_with_frame_ (ec=0x55555555ec10, reg_cfp=0x7ffff75fcc90, calling=<optimized out>, argc=0, argv=0x7ffff74fd2c0, stack_bottom=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:3490
#55 0x00007ffff7c4527d in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:5581
#56 vm_exec_core (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/redhat-linux-build/insns.def:814
#57 0x00007ffff7c5a46d in rb_vm_exec (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2486
#58 0x00007ffff7c47c67 in vm_yield_with_cref (is_lambda=0, cref=0x0, kw_splat=0, argv=0x7fffffffd888, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1634
#59 vm_yield (kw_splat=0, argv=0x7fffffffd888, argc=1, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1642
#60 rb_yield_0 (argv=0x7fffffffd888, argc=1) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1366
#61 rb_yield (val=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_eval.c:1382
#62 0x00007ffff7a41fc4 in rb_ary_each (ary=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/array.c:2538
#63 rb_ary_each (ary=140736886986000) at /builddir/build/BUILD/ruby-3.3.0/array.c:2532
#64 0x00007ffff7c3aa46 in vm_call_cfunc_with_frame_ (ec=0x55555555ec10, reg_cfp=0x7ffff75fce18, calling=<optimized out>, argc=0, argv=0x7ffff74fd160, stack_bottom=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:3490
#65 0x00007ffff7c4527d in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm_insnhelper.c:5581
#66 vm_exec_core (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/redhat-linux-build/insns.def:814
#67 0x00007ffff7c5a680 in vm_exec_loop (result=<optimized out>, tag=0x7fffffffdaa0, state=<optimized out>, ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2513
#68 rb_vm_exec (ec=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/vm.c:2492
#69 0x00007ffff7c5b92e in rb_vm_invoke_proc (ec=<optimized out>, proc=<optimized out>, argc=<optimized out>, argv=<optimized out>, kw_splat=<optimized out>, passed_block_handler=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/vm.c:1728
#70 0x00007ffff7b75d61 in rb_proc_call_kw (self=<optimized out>, args=<optimized out>, kw_splat=0) at /builddir/build/BUILD/ruby-3.3.0/proc.c:978
#71 0x00007ffff7ab8e0a in exec_end_procs_chain (errp=<optimized out>, procs=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/eval_jump.c:105
#72 rb_ec_exec_end_proc (ec=ec@entry=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/eval_jump.c:120
#73 0x00007ffff7ab9d98 in rb_ec_teardown (ec=ec@entry=0x55555555ec10) at /builddir/build/BUILD/ruby-3.3.0/eval.c:159
#74 0x00007ffff7aba32c in rb_ec_cleanup (ec=ec@entry=0x55555555ec10, ex=RUBY_TAG_NONE) at /builddir/build/BUILD/ruby-3.3.0/eval.c:212
#75 0x00007ffff7aba99d in ruby_run_node (n=0x7fffdc315b30) at /builddir/build/BUILD/ruby-3.3.0/eval.c:328
#76 0x0000555555555195 in rb_main (argv=0x7fffffffe1c8, argc=7) at /builddir/build/BUILD/ruby-3.3.0/main.c:39
#77 main (argc=<optimized out>, argv=<optimized out>) at /builddir/build/BUILD/ruby-3.3.0/main.c:58
~~~

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106392

Issue #20203 has been updated by vo.x (Vit Ondruch).

I have also reported the issue against GCC in Fedora where the toolchain folks are already at it: https://bugzilla.redhat.com/show_bug.cgi?id=2259845

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106393

Issue #20203 has been updated by vo.x (Vit Ondruch).

So far, it was attributed to "issue within glibc qsort".

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106395

Issue #20203 has been updated by vo.x (Vit Ondruch).

vo.x (Vit Ondruch) wrote in #note-3:
> So far, it was attributed to "issue within glibc qsort".

By mistake. Back to GCC.

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106401

Issue #20203 has been updated by vo.x (Vit Ondruch).

There are more insights from the glibc developers, and it seems the issue is that "Ruby uses qsort_r in an undefined way". Let me quote @fweimer from [RH bugzilla](https://bugzilla.redhat.com/show_bug.cgi?id=2259845#c15):

~~~
In current rawhide glibc (glibc-2.38.9000-33.fc40.x86_64), a buffer allocated with malloc is used for the qsort scratch buffer. This is actually a glibc bug because the array is very short and we should use an on-stack buffer.

I need to confirm the details yet, but I think what happens is that the Ruby garbage collector runs during the sort_by callback. I suspect the collector writes to the array, which is quite undefined (“The comparison function shall not alter the contents of the array.” says the C standard). This causes problems subsequently when we copy back previous array contents from the scratch buffer. With a stack-based buffer, the collector pins objects, so the issue is not visible.

Sorry, this is all very speculative, but I don't want you to spend more time chasing this.

I can reproduce the crash in Fedora 38 (with upstream Ruby sources) if I increase the size of the array being sorted so that qsort_r uses a malloc-based buffer there as well:

diff --git a/test/ruby/test_enum.rb b/test/ruby/test_enum.rb
index f7c8f012d8..23e18cc590 100644
--- a/test/ruby/test_enum.rb
+++ b/test/ruby/test_enum.rb
@@ -871,7 +871,9 @@ class << o; self; end.class_eval do
           0
         end
       end
-      [o, o, o].sort_by {|x| x }
+      l = []
+      (1..100).each {|x| l += [o] }
+      l.sort_by {|x| x }
       c.call
     end

The whole thing is probably quite sensitive to allocation patterns etc., so I have no idea how reliable this is as a trigger for the bug.
~~~

and [followup](https://bugzilla.redhat.com/show_bug.cgi?id=2259845#c18)

~~~
With this instrumentation patch applied to glibc:

diff --git a/stdlib/qsort.c b/stdlib/qsort.c
index 7f5a00fb33..c5263d9f5f 100644
--- a/stdlib/qsort.c
+++ b/stdlib/qsort.c
@@ -25,6 +25,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdbool.h>
+#include <assert.h>

 /* Swap SIZE bytes between addresses A and B.  These helpers are provided
    along the generic one as an optimization.  */
@@ -338,9 +339,9 @@ indirect_msort_with_tmp (const struct msort_param *p, void *b, size_t n,
     }
 }

-void
-__qsort_r (void *const pbase, size_t total_elems, size_t size,
-           __compar_d_fn_t cmp, void *arg)
+static void
+__qsort_r_real (void *const pbase, size_t total_elems, size_t size,
+                __compar_d_fn_t cmp, void *arg)
 {
   if (total_elems <= 1)
     return;
@@ -396,6 +397,43 @@ __qsort_r (void *const pbase, size_t total_elems, size_t size,

   if (buf != tmp)
     free (buf);
 }
+
+struct qsort_r_data
+{
+  __compar_d_fn_t cmp;
+  void *arg;
+  void *array;
+  size_t size;
+  void *copy;
+};
+
+static int
+qsort_compare_wrapper (const void *a, const void *b, void *data1)
+{
+  struct qsort_r_data *data = data1;
+  memcpy (data->copy, data->array, data->size);
+  int ret = data->cmp (a, b, data->arg);
+  assert (memcmp (data->array, data->copy, data->size) == 0);
+  return ret;
+}
+
+void
+__qsort_r (void *pbase, size_t total_elems, size_t size,
+           __compar_d_fn_t cmp, void *arg)
+{
+  struct qsort_r_data data =
+    {
+      .cmp = cmp,
+      .arg = arg,
+      .array = pbase,
+      .size = total_elems * size,
+    };
+  data.copy = malloc (data.size);
+  assert (data.copy != NULL);
+  __qsort_r_real (pbase, total_elems, size, qsort_compare_wrapper, &data);
+  free (data.copy);
+}
+
 libc_hidden_def (__qsort_r)
 weak_alias (__qsort_r, qsort_r)

And using the Fedora rawhide glibc variant with the heap allocation and the unchanged Ruby test case, I get:

[54/83] TestEnumerable#test_callccFatal glibc error: qsort.c:416 (qsort_compare_wrapper): assertion failed: memcmp (data->array, data->copy, data->size) == 0

Thread 1 "ruby" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44        return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7c57423 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff7c0493e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7bec8ff in __GI_abort () at abort.c:79
#4  0x00007ffff7bed7d5 in __libc_message_impl (fmt=fmt@entry=0x7ffff7d6cba0 "Fatal glibc error: %s:%s (%s): assertion failed: %s\n") at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff7bfcaa9 in __libc_assert_fail (assertion=assertion@entry=0x7ffff7d6cd70 "memcmp (data->array, data->copy, data->size) == 0", file=file@entry=0x7ffff7d67d51 "qsort.c", line=line@entry=416, function=function@entry=0x7ffff7d71390 <__PRETTY_FUNCTION__.1> "qsort_compare_wrapper") at __libc_assert_fail.c:31
#6  0x00007ffff7c0873c in qsort_compare_wrapper (a=a@entry=0x7fffdc852fe0, b=b@entry=0x7fffdc852ff0, data1=data1@entry=0x7fffffffd520) at qsort.c:416
#7  0x00007ffff7c08923 in msort_with_tmp (p=p@entry=0x7fffffffd0a0, b=b@entry=0x7fffdc852fe0, n=n@entry=2) at qsort.c:276
#8  0x00007ffff7c08ced in msort_with_tmp (n=2, b=0x7fffdc852fe0, p=0x7fffffffd0a0) at qsort.c:202
#9  __qsort_r_real (pbase=pbase@entry=0x7fffdc852fe0, total_elems=total_elems@entry=2, size=size@entry=16, arg=arg@entry=0x7fffffffd520, cmp=0x7ffff7c086c0 <qsort_compare_wrapper>) at qsort.c:394
#10 0x00007ffff7c09140 in __GI___qsort_r (pbase=0x7fffdc852fe0, total_elems=2, size=size@entry=16, cmp=cmp@entry=0x5555559709a0 <sort_by_cmp>, arg=arg@entry=0x7fffdc852fd0) at qsort.c:433
#11 0x000055555596f3ad in enum_sort_by (obj=<optimized out>) at enum.c:1691

I think that's pretty good evidence that ruby uses qsort_r in an undefined way
~~~

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106424
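The instrumentation idea quoted above (snapshot the array before each comparison, then assert that it is unchanged afterwards) can be imitated at the Ruby level. The following is only an illustrative sketch: `sort_with_mutation_check` is a hypothetical helper, not part of Ruby or glibc, and it only notices changes that are visible from Ruby code. It cannot observe what the GC or `qsort_r` does to the underlying C array.

```ruby
# Hypothetical helper (not a Ruby or glibc API): sorts a copy of `array`
# and raises if the comparison block changes `array` while it runs.
def sort_with_mutation_check(array)
  array.sort do |a, b|
    snapshot = array.dup                 # copy of the element references
    result = yield(a, b)                 # run the caller's comparison
    unchanged = array.size == snapshot.size &&
                array.each_with_index.all? { |e, i| e.equal?(snapshot[i]) }
    raise "comparison block modified the array" unless unchanged
    result
  end
end

# A well-behaved comparison passes through untouched.
well_behaved = sort_with_mutation_check([3, 1, 2]) { |a, b| a <=> b }

# A comparison that writes to the array being sorted is reported.
data = [3, 1, 2]
mutation_detected =
  begin
    sort_with_mutation_check(data) { |a, b| data[0] = 99; a <=> b }
    false
  rescue RuntimeError
    true
  end
```

Like the glibc patch, the check costs a full copy of the array per comparison, so it is suitable for debugging only.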

Issue #20203 has been updated by fweimer (Florian Weimer).

If you want to keep using `qsort_r`, please use it to sort a temporary array of array indices, and not an array of GC-managed pointers.

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106428
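One possible reading of this suggestion, sketched in Ruby rather than in the interpreter's C code: compute the sort keys first, sort a temporary array of integer indices, then use the resulting permutation to build the output. The helper name `sort_by_via_indices` is hypothetical. In the C setting the point would be that the buffer handed to `qsort_r` holds only plain integers, so there is nothing in it for the GC to update.

```ruby
# Hypothetical helper: returns the elements of `items` ordered by the key
# the block computes, while only ever sorting integer indices.
def sort_by_via_indices(items)
  keys = items.map { |item| yield item }            # one key per element
  order = (0...items.size).sort { |i, j| keys[i] <=> keys[j] }
  order.map { |i| items[i] }                        # apply the permutation
end

words = %w[pear fig banana]
sorted = sort_by_via_indices(words) { |w| w.length }
```

The input array is never reordered in place; only the index array is.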

Issue #20203 has been updated by alanwu (Alan Wu).

The GC by default doesn't move objects, so it shouldn't write to the array. I guess it's a use-after-free and double free caused by `callcc` in the test jumping into the middle of `qsort_r`, like the glibc devs noticed: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=dfa3394a605c8f6f25e4f...

The `c.call` line in the test jumps back into the middle of `qsort_r` (with a saved stack) after it already returned, so if the buffer is malloc'd it'd be freed already the second time around.

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106456
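The sequence described here can be modelled in plain Ruby without involving `qsort_r` at all. In the sketch below, `with_scratch` is a hypothetical stand-in for a C routine that allocates a scratch buffer, calls back into user code, touches the buffer again, and frees it on return; the `:live` flag plays the role of the allocation. It assumes a Ruby build that still ships the `continuation` library (which prints an obsolescence warning when loaded).

```ruby
require 'continuation'

# Hypothetical stand-in for a C routine that allocates a scratch buffer,
# calls back into user code, touches the buffer again, then frees it.
def with_scratch(state)
  state[:live] = true                      # models malloc
  yield                                    # models the comparison callback
  state[:log] << (state[:live] ? :buffer_ok : :use_after_free)
  state[:live] = false                     # models free
end

state = { live: false, log: [] }
saved = nil

# Capture a continuation from inside the callback.
with_scratch(state) { callcc { |k| saved ||= k } }

# First pass: the log has one entry, so jump back into the callback even
# though with_scratch has already returned and "freed" its buffer.
# Second pass: the log has two entries, so execution falls through.
saved.call if state[:log].size < 2
```

On the second pass the code after `yield` runs without the "allocation" step having been repeated, which is the modelled use-after-free, and the "free" step then runs a second time.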

Issue #20203 has been updated by alanwu (Alan Wu).

So there are currently 2 issues with using `qsort_r` that I see.

1. As noticed in this issue, if the comparison function uses a coroutine/fiber to reenter the middle of `qsort_r`, that results in heap corruption. This happens on older glibc too. Valgrind shows this issue:

```ruby
require 'continuation'

c = nil
o = Object.new
class << o; self; end.class_eval do
  define_method(:<=>) do |x|
    callcc {|c2| c ||= c2 }
    0
  end
end
Array.new(1000, o).sort_by {|x| x }
c.call
```

```
==8321== Invalid read of size 8
==8321==    at 0x484DE5E: memmove (vg_replace_strmem.c:1410)
==8321==    by 0x4E42270: msort_with_tmp (msort.c:44)
==8321==    by 0x4E42270: msort_with_tmp.part.0 (msort.c:53)
<snip>
==8321==    by 0x4E427B5: qsort_r (msort.c:296)
==8321==    by 0x492BFEA: enum_sort_by (enum.c:1293)
==8321==  Address 0x9bb55a0 is 0 bytes inside a block of size 16,000 free'd
==8321==    at 0x484488F: free (vg_replace_malloc.c:985)
==8321==    by 0x4E427C2: qsort_r (msort.c:298)
<snip>
==8321==  Block was alloc'd at
==8321==    at 0x4841828: malloc (vg_replace_malloc.c:442)
==8321==    by 0x4E42641: qsort_r (msort.c:221)
==8321==    by 0x492BFEA: enum_sort_by (enum.c:1293)
```

2. With GC compaction, the GC can update references inside the comparison function. This is undefined behavior:

> The application shall ensure that the comparison function pointed to by _compar_ does not alter the contents of the array.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/qsort.html

In practice, I've seen this cause issues with `GC.auto_compact` in our app's CI, where sorting leaves moved objects in the array.

It seems that to fix these issues we need to stop using `qsort_r` and use our own `ruby_qsort` implementation that doesn't malloc.

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106565

Issue #20203 has been updated by ko1 (Koichi Sasada).

@alanwu do you have measurements with system qsort and ruby's qsort?

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-106751

Issue #20203 has been updated by mame (Yusuke Endoh).

This was discussed at the February dev meeting and @matz said "give it a try."

@alanwu Can you please discuss it with @ko1 and proceed with it?

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-107235

Issue #20203 has been updated by alanwu (Alan Wu).

File sort-benchmark-ubuntu.png added
File sort-benchmark-macos.png added

I ran some benchmarks comparing the builtin `ruby_qsort()` and `qsort_r()` on macOS with an M1 chip and on Ubuntu 22.04 (glibc 2.35) with a Xeon Platinum 8000 chip. The rubies are built off of commit:3f5f04afa7, and the `ruby_qsort()` one is built with `configure ac_cv_func_qsort_r=no` to use the builtin sort (verified with a debugger). Note that I used `benchmark-driver` in `time` mode, which picks the number of times to repeat based on the workload, so a larger input size doesn't necessarily run longer.

With shuffled inputs, the builtin sort is about 5% faster on macOS but 10 to 20% slower on Ubuntu. The builtin sort seems very good with ordered inputs and outperforms `qsort_r()` across the board. Considering it's faster on macOS, I think the builtin sort has acceptable performance.

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-107391

---Files--------------------------------
sort-benchmark-ubuntu.png (233 KB)
sort-benchmark-macos.png (232 KB)
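For readers who want a rough feel for the ordered versus shuffled comparison, here is a minimal timing sketch. It is not the script behind the attached graphs and does not use `benchmark-driver`; it relies on the standard `benchmark` library, and the helper `time_sort_by`, the input size and the repeat count are all arbitrary assumptions. Comparing `ruby_qsort()` against `qsort_r()` would still require running it under two differently configured Ruby builds, as described above.

```ruby
require 'benchmark'

# Hypothetical helper: wall-clock seconds for `repeats` runs of sort_by.
def time_sort_by(input, repeats: 5)
  Benchmark.realtime { repeats.times { input.sort_by { |x| x } } }
end

n = 10_000                                          # arbitrary input size
ordered  = (1..n).to_a
shuffled = ordered.shuffle(random: Random.new(42))  # fixed seed for repeatability

results = {
  ordered:  time_sort_by(ordered),
  shuffled: time_sort_by(shuffled)
}

results.each { |name, secs| puts format("%-8s %.4f s", name, secs) }
```

Single wall-clock measurements like these are noisy, so the numbers are only meaningful as a coarse comparison on one machine.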

Issue #20203 has been updated by vo.x (Vit Ondruch).

Was this ever resolved or is the undefined behavior still used?

----------------------------------------
https://bugs.ruby-lang.org/issues/20203#change-112805
participants (5)
- alanwu (Alan Wu)
- fweimer (Florian Weimer)
- ko1 (Koichi Sasada)
- mame (Yusuke Endoh)
- vo.x (Vit Ondruch)