Issue #19156 has been updated by mk (Matthias Käppler).
@byroot I wonder if you could help me understand the underlying issue better. I found a
minimal, executable test case that reproduces this issue reliably:
```ruby
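require 'prometheus/client' # assumption: entry point of the prometheus-client-mmap gem
# Create a counter whose label value is 1 MiB long
Prometheus::Client::MmapedValue.new(:counter, :counter, 'ordered_counter',
                                    { label_1: 'x' * 1024**2 })
# This will crash
ObjectSpace.each_object(String, &:valid_encoding?)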
```
I am trying to turn this into an automated test to make sure we don't regress on this
again. The test does not crash if the metric label is relatively short, perhaps a few
characters. It _does_ crash once the label string grows beyond a certain size,
however.
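Since a segfault would take down the whole test process, one way to automate this is to run the crashing code in a forked child and inspect its exit status from the parent. The following is only a minimal sketch: `run_isolated` is a hypothetical helper name, it assumes a platform where `fork` is available, and the block passed in below does not include the mmap-backed value, so on its own it should exit cleanly.

```ruby
# Hypothetical helper: runs the block in a forked child process and
# returns the child's Process::Status so the parent can inspect it.
def run_isolated
  pid = fork do
    $stderr.reopen(File::NULL) # silence a possible [BUG] report in the child
    yield
    exit!(0)                   # skip at_exit hooks inherited from the parent
  end
  _, status = Process.wait2(pid)
  status
end

# In the real test, the mmap-backed value would be created inside the block
# before walking the object space.
status = run_isolated { ObjectSpace.each_object(String, &:valid_encoding?) }

if status.signaled?
  puts "child was terminated by signal #{status.termsig}"
else
  puts "child exited with status #{status.exitstatus}"
end
```

A test could then assert that `status.success?` holds once the underlying problem is fixed.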
Here is what I don't understand yet:
1. `prometheus-client-mmap` calls into `new_str0`, which, when the string is large enough,
will have MRI malloc the string buffer, correct? I had to make the string _much_ larger than
`sizeof(RVALUE)` for the crash to occur. Aren't strings malloc'ed as soon as they
no longer fit into a heap slot?
2. The library then overwrites the internal pointer to the malloc'ed
memory region and lets it point to the mapped file memory instead. But how come MRI
does not see this when accessing the string memory through a method like
`valid_encoding?`? Shouldn't that result in traversing the same pointer, now pointing
to the string data in the memory map? From MRI's perspective, why does it matter where the
actual string data resides?
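To check the first assumption from the Ruby side, one option is to look at what `ObjectSpace.dump` reports for a string: it includes `"embedded":true` when the bytes live inside the heap slot. This is only a sketch; `embedded?` is a hypothetical helper, and the exact size threshold depends on the Ruby version, so nothing here explains why the crash needs a much larger string.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper: true when ObjectSpace.dump marks the string as
# embedded, i.e. stored inside its heap slot rather than a separate buffer.
def embedded?(str)
  JSON.parse(ObjectSpace.dump(str))['embedded'] == true
end

short = 'x' * 8        # small enough to fit into a heap slot
long  = 'x' * 1024**2  # same size as the label in the test case above

puts "short string embedded: #{embedded?(short)}"
puts "long string embedded:  #{embedded?(long)}"
```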
I also don't think it's because of the `MmapedValue` object being GC'ed; I ran
this example with `GC.stress` enabled and even nulled out the parent object creating this
string, but it never crashed in response to that. In fact, I can completely disable GC in
this test case, and it will still crash, which leads me to think this is not related to
the mmap memory being freed before the Ruby string is, or vice versa.
----------------------------------------
Bug #19156: ObjectSpace.dump_all segfault during string inspection
https://bugs.ruby-lang.org/issues/19156#change-100432
* Author: mk (Matthias Käppler)
* Status: Third Party's Issue
* Priority: Normal
* ruby -v: ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I am working on a feature that would allow our application to capture heap dumps during
shutdown for later inspection.
These heap dumps are captured via `ObjectSpace.dump_all(output: io)`. While walking the
object space, MRI occasionally segfaults while inspecting string objects in
`search_nonascii` of `string.c`:
```
/usr/local/lib/ruby/3.0.0/objspace.rb:87: [BUG] Segmentation fault at 0x00007efee4201000
ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
...
-- Control frame information -----------------------------------------------
c:0053 p:---- s:0312 e:000311 CFUNC :_dump_all
c:0052 p:0130 s:0305 e:000304 METHOD /usr/local/lib/ruby/3.0.0/objspace.rb:87
c:0051 p:0023 s:0295 e:000294 METHOD /home/git/gitlab/lib/gitlab/memory/reports/heap_dump.rb:26
...
-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.0(rb_print_backtrace+0x11) [0x7efee4ad0c5e] vm_dump.c:758
/usr/local/lib/libruby.so.3.0(rb_vm_bugreport) vm_dump.c:998
/usr/local/lib/libruby.so.3.0(rb_bug_for_fatal_signal+0xf8) [0x7efee48d0b08] error.c:787
/usr/local/lib/libruby.so.3.0(sigsegv+0x55) [0x7efee4a23db5] signal.c:963
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7efee4f12140] ../sysdeps/pthread/funlockfile.c:28
/usr/local/lib/libruby.so.3.0(search_nonascii+0x30) [0x7efee4a3ca60] string.c:552
/usr/local/lib/libruby.so.3.0(coderange_scan) string.c:585
/usr/local/lib/libruby.so.3.0(enc_coderange_scan+0x1b) [0x7efee4a3e28a] string.c:709
/usr/local/lib/libruby.so.3.0(rb_enc_str_coderange) string.c:727
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(is_broken_string+0x8) [0x7efeced9c304] ../../internal/string.h:116
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(dump_object) objspace_dump.c:388
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(heap_i+0x39) [0x7efeced9caa9] objspace_dump.c:521
/usr/local/lib/libruby.so.3.0(objspace_each_objects_without_setup+0xaf) [0x7efee48e878f] gc.c:3232
/usr/local/lib/libruby.so.3.0(objspace_each_objects_protected+0x14) [0x7efee48e87c4] gc.c:3242
/usr/local/lib/libruby.so.3.0(rb_ensure+0x12a) [0x7efee48d96aa] eval.c:1162
/usr/local/lib/libruby.so.3.0(objspace_each_objects+0x28) [0x7efee48fb458] gc.c:3310
/usr/local/lib/libruby.so.3.0(rb_objspace_each_objects) gc.c:3298
/usr/local/lib/ruby/3.0.0/x86_64-linux/objspace.so(objspace_dump_all+0x88) [0x7efeced9b068] objspace_dump.c:616
...
```
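For context, the capture step that leads into this backtrace can be sketched roughly as follows. This is not the actual application code: `capture_heap_dump` and the temporary file path are hypothetical, and the only API used from the report is `ObjectSpace.dump_all(output: io)`, which writes one JSON record per line.

```ruby
require 'objspace'
require 'json'
require 'tmpdir'

# Hypothetical helper: writes a heap dump to `path` and returns the path.
def capture_heap_dump(path)
  File.open(path, 'w') do |io|
    ObjectSpace.dump_all(output: io) # one JSON object per line
  end
  path
end

dump_path = capture_heap_dump(File.join(Dir.mktmpdir, 'heap.json'))

# Each line can be parsed on its own for later inspection.
first_record = JSON.parse(File.foreach(dump_path).first)
puts "first record type: #{first_record['type']}"
```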
Unfortunately, I couldn't get my hands on that memory region to see which strings are
causing this, since the crash doesn't always happen.
I suspect this is also a problem with MRI master since the code looks unchanged from
3.0.4.
--
https://bugs.ruby-lang.org/