Issue #19156 has been updated by mk (Matthias Käppler).
byroot (Jean Boussier) wrote in #note-2:
I suspect this is also a problem with MRI master since the code looks unchanged from 3.0.4.
Well, it might not be a bug in `objspace_dump.c`.
Here `coderange_scan` / `search_nonascii` end up reading invalid memory regions, which
suggests that you have a corrupted string in your heap. Somehow either that String memory
was freed, or it is pointing to an invalid region in the first place.
This might be a bug in Ruby, but I suspect it might be a C-extension creating an invalid
string. It will be very hard to track down without a repro or a core dump though.
Do you have a core dump? It would be interesting to inspect what this string looks like.
Yes, I just obtained one. This is a little tricky to debug, however. The object space dump
is streamed to disk as ndjson, but the last entry I see written to disk changes randomly,
so I suspect the heap slot that references the broken string crashes the VM before it even
starts writing the object address to the dump.
In other words, I have no idea what this string is or where it is located in memory.
I did look at the memory region of the last entry I can actually see in the dump:
```json
{"address":"0x7f584f4a0930", "type":"SYMBOL",
"class":"0x7f58791d0410", "frozen":true,
"bytesize":11, "value":"GitalyCheck"
```
Note the missing closing brace: `dump_all` crashed before finalizing this entry. I am
not sure whether this is a red herring though, since this is a symbol, not a string. And as
mentioned above, it does not always crash here. In a repeat run, the last entry was an
iseq instead.
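To make the "last entry" check repeatable across runs, a small script can report the last entry that parses as JSON and any truncated fragment after it. This is a minimal sketch: `last_entries` is a hypothetical helper name, and it assumes the dump has one JSON object per line. If the dump writer buffers its output (an assumption worth verifying in `objspace_dump.c`), the truncated entry is not necessarily the object being inspected when the crash happened.

```ruby
require "json"

# Hypothetical helper: scans an ndjson heap dump and returns
# [last_complete_entry, trailing_fragment]. The fragment is nil
# when the final line parsed cleanly.
def last_entries(path)
  last_complete = nil
  fragment = nil
  File.foreach(path) do |line|
    line = line.strip
    next if line.empty?
    begin
      last_complete = JSON.parse(line)
      fragment = nil
    rescue JSON::ParserError
      # A line that does not parse is treated as a cut-off entry.
      fragment = line
    end
  end
  [last_complete, fragment]
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  complete, fragment = last_entries(ARGV[0])
  puts "last complete entry: #{complete.inspect}"
  puts "truncated fragment:  #{fragment.inspect}"
end
```

Comparing the reported addresses over several crashing runs may show whether the truncated entries cluster in one region or are scattered.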
Anyway, I tried inspecting memory around this address to see if anything stands out, but
nothing does, at least not to me:
```
(gdb) dump memory /tmp/worker_mem 0x7f584f4a0930 0x7f584f4a1000
```
```
xxd -l 128 /tmp/worker_mem
00000000: 7408 8000 0000 0000 1004 1d79 587f 0000 t..........yX...
00000010: a6fa 64f8 dfb2 23c9 5809 4a4f 587f 0000 ..d...#.X.JOX...
00000020: 4a6d 2500 0000 0000 65c8 9220 0000 0000 Jm%.....e.. ....
00000030: 2828 1d79 587f 0000 4769 7461 6c79 4368 ((.yX...GitalyCh
00000040: 6563 6b00 0072 6200 0000 0000 0000 0000 eck..rb.........
00000050: 0800 0000 0000 0000 f881 1b79 587f 0000 ...........yX...
00000060: 0000 0000 0000 0000 0800 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0120 0000 0000 0000 ......... ......
```
I'm not sure that's helpful. Is there anything else I could do?
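One way to read the hex dump above is to decode it as little-endian 64-bit words and compare them with the JSON entry. The sketch below does that for the first 64 bytes. The constants `T_MASK`, `T_SYMBOL` and `FL_FREEZE` are written from memory of MRI's headers and should be double-checked against the 3.0.4 source. The reading of word 3 as a pointer to the following slot is likewise an assumption.

```ruby
# First 64 bytes of the xxd output, copied verbatim.
HEX = <<~DUMP
  7408 8000 0000 0000 1004 1d79 587f 0000
  a6fa 64f8 dfb2 23c9 5809 4a4f 587f 0000
  4a6d 2500 0000 0000 65c8 9220 0000 0000
  2828 1d79 587f 0000 4769 7461 6c79 4368
DUMP

BASE = 0x7f584f4a0930 # address of the SYMBOL entry in the dump

# Assumed values from MRI's headers; verify against the source.
T_MASK    = 0x1f
T_SYMBOL  = 0x14
FL_FREEZE = 1 << 11

bytes = [HEX.delete(" \n")].pack("H*")
words = bytes.unpack("Q<*") # little-endian unsigned 64-bit words

words.each_with_index do |w, i|
  puts format("0x%x: 0x%016x", BASE + i * 8, w)
end

flags, klass = words[0], words[1]
puts "type bits: 0x#{(flags & T_MASK).to_s(16)}"
puts "frozen:    #{(flags & FL_FREEZE) != 0}"
puts "klass:     0x#{klass.to_s(16)}"
```

Under those assumptions the first slot looks consistent with the dump entry: the second word equals the `class` address `0x7f58791d0410`, the type bits read as a symbol, and the frozen bit is set. Word 3 is `BASE + 0x28`, just before the bytes `GitalyCheck` appear. So this slot may well be intact, which would fit the guess that it is a red herring.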
----------------------------------------
Bug #19156: ObjectSpace.dump_all segfault during string inspection
https://bugs.ruby-lang.org/issues/19156#change-100313
* Author: mk (Matthias Käppler)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I am working on a feature that would allow our application to capture heap dumps during
shutdown for later inspection.
These heap dumps are captured via `ObjectSpace.dump_all(output: io)`. While walking the
object space, MRI occasionally segfaults while inspecting string objects in
`search_nonascii` of `string.c`:
```
/usr/local/lib/ruby/3.0.0/objspace.rb:87: [BUG] Segmentation fault at 0x00007efee4201000
ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
...
-- Control frame information -----------------------------------------------
c:0053 p:---- s:0312 e:000311 CFUNC :_dump_all
c:0052 p:0130 s:0305 e:000304 METHOD /usr/local/lib/ruby/3.0.0/objspace.rb:87
c:0051 p:0023 s:0295 e:000294 METHOD /home/git/gitlab/lib/gitlab/memory/reports/heap_dump.rb:26
...
-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.0(rb_print_backtrace+0x11) [0x7efee4ad0c5e] vm_dump.c:758
/usr/local/lib/libruby.so.3.0(rb_vm_bugreport) vm_dump.c:998
/usr/local/lib/libruby.so.3.0(rb_bug_for_fatal_signal+0xf8) [0x7efee48d0b08] error.c:787
/usr/local/lib/libruby.so.3.0(sigsegv+0x55) [0x7efee4a23db5] signal.c:963
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7efee4f12140] ../sysdeps/pthread/funlockfile.c:28
/usr/local/lib/libruby.so.3.0(search_nonascii+0x30) [0x7efee4a3ca60] string.c:552
/usr/local/lib/libruby.so.3.0(coderange_scan) string.c:585
/usr/local/lib/libruby.so.3.0(enc_coderange_scan+0x1b) [0x7efee4a3e28a] string.c:709
/usr/local/lib/libruby.so.3.0(rb_enc_str_coderange) string.c:727
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(is_broken_string+0x8) [0x7efeced9c304] ../../internal/string.h:116
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(dump_object) objspace_dump.c:388
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(heap_i+0x39) [0x7efeced9caa9] objspace_dump.c:521
/usr/local/lib/libruby.so.3.0(objspace_each_objects_without_setup+0xaf) [0x7efee48e878f] gc.c:3232
/usr/local/lib/libruby.so.3.0(objspace_each_objects_protected+0x14) [0x7efee48e87c4] gc.c:3242
/usr/local/lib/libruby.so.3.0(rb_ensure+0x12a) [0x7efee48d96aa] eval.c:1162
/usr/local/lib/libruby.so.3.0(objspace_each_objects+0x28) [0x7efee48fb458] gc.c:3310
/usr/local/lib/libruby.so.3.0(rb_objspace_each_objects) gc.c:3298
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(objspace_dump_all+0x88) [0x7efeced9b068] objspace_dump.c:616
...
```
Unfortunately, I couldn't get my hands on that memory region to see which strings are
causing this, since the crash doesn't always happen.
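For anyone trying to reproduce this, the capture step boils down to something like the sketch below. `capture_heap_dump` is a hypothetical name and not the actual code in `heap_dump.rb`. It only shows the `ObjectSpace.dump_all(output: io)` call described above writing to a file.

```ruby
require "objspace"
require "json"

# Hypothetical stand-in for the application's capture step:
# streams the whole object space to `path`, one JSON object per line.
def capture_heap_dump(path)
  File.open(path, "w") do |io|
    ObjectSpace.dump_all(output: io)
  end
  path
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  capture_heap_dump(ARGV[0])
  puts "wrote #{File.size(ARGV[0])} bytes to #{ARGV[0]}"
end
```

A plain script like this is unlikely to crash on its own. If the corrupted string comes from a C extension, the crash would presumably need the application's full set of gems loaded.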
I suspect this is also a problem with MRI master since the code looks unchanged from
3.0.4.
--
https://bugs.ruby-lang.org/