New subject: [ruby-core:112976] [Ruby master Feature#19541] Proposal: Generate frame unwinding info for YJIT code

19 Mar 2023

Issue #19541 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------
## What is being proposed?

Currently, when Ruby crashes with yjit-generated code on the stack, `rb_print_backtrace()` is
unable to show any frames underneath the yjit code. For example, if you send
SIGSEGV to a Ruby process running yjit, this is what you see:

```
/ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785
/ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093
/ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813
/ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919
linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc]
/ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn),
yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00]
/rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929
/ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225
/ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359
/ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106
/ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158
/ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854
/ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698
/ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676
/ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021
/ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924
/ruby/miniruby(do_call<yjit::core::branch_stub_hit::{closure_env#0}, *const
u8>+0x98) [0xaaaad035ba3c]
/rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492
[0xaaaad035c9b4]
```

(n.b. - I compiled Ruby with `-fasynchronous-unwind-tables -rdynamic -g` in cflags to make
sure gcc generates appropriate unwind info & keeps the symbol tables).

Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread
backtraces through yjit-generated code either.

My proposal is that YJIT generate sufficient unwinding and debug information on all
platforms to allow both `rb_print_backtrace()` and the platform's debugger
(gdb/lldb/WinDbg) to show:

* Full stack traces all the way back to `main`. That is, it should be possible to see
frames _underneath_ `[0xaaaad035c9b4]` from the backtrace above.
* Names for the dynamically generated yjit blocks (e.g. instead of `[0xaaaad035c9b4]`, we
should see something like `yjit$$name_of_ruby_method`, where `name_of_ruby_method` is the
`label` for the iseq this is JIT'd code for).
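
As a small illustration of the naming scheme suggested above, the symbol for a JIT'd block could be derived from the iseq label like this. This is only a sketch: the function name `yjit_symbol_name` is hypothetical and not part of yjit, and it assumes the label can be used verbatim without any escaping.

```rust
/// Build the symbol name a debugger would show for a yjit-generated frame.
/// `label` is assumed to be the `label` of the iseq the code was compiled from.
/// (Hypothetical helper for illustration; not an existing yjit function.)
pub fn yjit_symbol_name(label: &str) -> String {
    // `$` has no special meaning outside of `{}` in Rust format strings,
    // so this yields the literal prefix `yjit$$` followed by the label.
    format!("yjit$${}", label)
}
```

With this, a frame for a method whose iseq label is `name_of_ruby_method` would be reported as `yjit$$name_of_ruby_method` rather than as a bare address.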

## Motivation

I have a few motivations for wanting this. Firstly, I feel this functionality is
independently useful. When Ruby crashes, the more information we can get, the more likely
we are to find the root cause. Likewise, the same principle applies to debugging with gdb:
you can get a fuller understanding of what the process is doing if you see the whole
stack.

I have often found attaching gdb to the Ruby interpreter helps in understanding problems
in Ruby code or C extensions and is something I do relatively frequently; yjit breaking
that will definitely be inconvenient for me!

## Implementation

I have a draft implementation here on how I'd implement this: . It's currently
missing tests & platform support (it only works on Linux aarch64). Also, it implements
unwind info generation, so unwinding can work _through_ yjit code, but it does not
currently emit symbols to give _names_ to those yjit frames.

My PR contains a document which explains how the Linux interfaces for registering unwind
info for JIT'd code work, so I won't duplicate that information here.

The biggest implementation question I have is around the use of Rust crates. So far, I have
prototyped my implementation using the gimli & object crates, for generating DWARF
info and ELF binaries. However, the yjit build purposefully does not use cargo &
external crates for release builds. There are a few different ways we could go here:

* Don't use the gimli & object crates; instead, re-implement all debug info &
object file generation code in yjit.
* Don't use the crates; instead, link against C libraries to provide this functionality
& call them from Rust (perhaps some combination of libelf, libdw, libbfd, or llvm
might do what we need)
* Use cargo after all for the release build & download the crates at build-time
* Use cargo for the release build, but vendor everything, so the build doesn't need to
download anything
* Only make unwind info generation available in dev mode where cargo is used, and so mark
the gimli/object dependencies as optional in Cargo.toml.
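
To make the last option more concrete, here is a rough sketch of how the Rust side might be gated. It assumes a Cargo feature named `unwind_info` that enables the optional gimli/object dependencies; that feature name and both function names are hypothetical, and the body of the enabled variant is only a placeholder.

```rust
/// True when this build was compiled with the (hypothetical) `unwind_info`
/// Cargo feature, i.e. when the optional gimli/object crates are available.
pub fn unwind_info_supported() -> bool {
    cfg!(feature = "unwind_info")
}

/// Variant compiled only when the feature is on. A real implementation would
/// build the DWARF unwind info for the code range here and register it; this
/// placeholder just reports success.
#[cfg(feature = "unwind_info")]
pub fn register_unwind_info(code_start: usize, code_len: usize) -> bool {
    let _ = (code_start, code_len);
    true
}

/// Variant compiled for builds without the feature (e.g. release builds that
/// do not use cargo): a no-op that reports nothing was registered.
#[cfg(not(feature = "unwind_info"))]
pub fn register_unwind_info(_code_start: usize, _code_len: usize) -> bool {
    false
}
```

The point of this shape is that callers in the code generator would not need their own `cfg` checks; they call `register_unwind_info` unconditionally and release builds simply compile it down to nothing.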

We'd need to decide on one of these approaches for this proposal to work. I don't
really have a strong sense of the pros/cons of each.

(Side note - my PR actually depends on a _fork_ of gimli - I've been discussing adding
the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648).

## Benchmarks

I ran the yjit-bench suite on my branch and compared it to Ruby master:

* My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f
* Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf

This is a (very simple) comparison:

```
-------------- ------------ ------------ ---------------
bench          yjit (ms)    branch (ms)  branch/yjit (%)
activerecord   97.5         98.5         101.03%
hexapdf        2415.3       2458.2       101.78%
liquid-c       61.9         63.1         101.94%
liquid-render  135.3        135.0        99.78%
mail           104.6        105.5        100.86%
psych-load     1887.1       1922.0       101.85%
railsbench     1544.4       1556.0       100.75%
ruby-lsp       88.4         89.5         101.24%
sequel         147.5        151.1        102.44%
binarytrees    303          305.6        100.86%
chunky_png     1075.8       1079.4       100.33%
erubi          392.9        392.3        99.85%
erubi_rails    14.7         14.7         100.00%
etanni         792.3        791.4        99.89%
fannkuchredux  3815.9       3813.6       99.94%
lee            1030.2       1039.2       100.87%
nbody          49.2         49.3         100.20%
optcarrot      4142         4143.3       100.03%
ruby-json      2860.7       2874.0       100.46%
rubykon        7906.6       7904.2       99.97%
30k_ifelse     348.7        345.4        99.05%
30k_methods    828.6        831.8        100.39%
cfunc_itself   28.8         28.9         100.35%
fib            34.4         34.5         100.29%
getivar        115.5        109.7        94.98%
keyword_args   37.7         38.0         100.80%
respond_to     26           26.1         100.38%
setivar        33.8         33.5         99.11%
setivar_object 208.7        194.3        93.10%
str_concat     52.6         52.2         99.24%
throw          23.8         24.1         101.26%
-------------- ------------ ------------ ---------------
```

It seems like the performance impact of generating and registering the debug info is
marginal: no benchmark in the table above is more than about 2.5% slower on my branch.
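
For reference, the last column of the table is the branch time divided by the master (yjit) time, expressed as a percentage. A minimal sketch of that arithmetic follows; the function name is illustrative and not taken from yjit-bench.

```rust
/// Branch time relative to the baseline yjit time, as a percentage.
/// Values above 100 mean the branch was slower; below 100, faster.
/// (Illustrative helper; not part of yjit-bench.)
pub fn branch_over_yjit_percent(yjit_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / yjit_ms * 100.0
}
```

For the `activerecord` row, `branch_over_yjit_percent(97.5, 98.5)` rounds to 101.03, matching the table.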

-- 
https://bugs.ruby-lang.org/
