[ruby-core:112944] [Ruby master Feature#19541] Proposal: Generate frame unwinding info for YJIT code

Issue #19541 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
----------------------------------------

## What is being proposed?

Currently, when Ruby crashes with yjit-generated code on the stack, `rb_print_backtrace()` is unable to show any frames underneath the yjit code. For example, if you send SIGSEGV to a Ruby process running yjit, this is what you see:

```
/ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785
/ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093
/ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813
/ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919
linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc]
/ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn), yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929
/ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225
/ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359
/ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106
/ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158
/ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854
/ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698
/ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676
/ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021
/ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924
/ruby/miniruby(do_call<yjit::core::branch_stub_hit::{closure_env#0}, *const u8>+0x98) [0xaaaad035ba3c] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492
[0xaaaad035c9b4]
```

(N.B.: I compiled Ruby with `-fasynchronous-unwind-tables -rdynamic -g` in cflags to make sure gcc generates appropriate unwind info and keeps the symbol tables.)

Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread backtraces through yjit-generated code either.

My proposal is that YJIT generate sufficient unwinding and debug information on all platforms to allow both `rb_print_backtrace()` and the platform's debugger (gdb/lldb/WinDbg) to show:

* Full stack traces all the way back to `main`. That is, it should be possible to see frames _underneath_ `[0xaaaad035c9b4]` in the backtrace above.
* Names for the dynamically generated yjit blocks (e.g. instead of `[0xaaaad035c9b4]`, we should see something like `yjit$$name_of_ruby_method`, where `name_of_ruby_method` is the `label` of the iseq that the code was JIT-compiled from).

## Motivation

I have a few motivations for wanting this. Firstly, I feel this functionality is independently useful. When Ruby crashes, the more information we can get, the more likely we are to find the root cause. The same principle applies to debugging with gdb: you get a fuller understanding of what the process is doing if you can see the whole stack. I frequently attach gdb to the Ruby interpreter to understand problems in Ruby code or C extensions, so yjit breaking that will definitely be inconvenient for me!

## Implementation

I have a draft implementation here: https://github.com/ruby/ruby/pull/7567. It's currently missing tests and platform support (it only works on Linux aarch64). Also, it only implements unwind info generation, so unwinding can work _through_ yjit code, but it does not yet emit symbols to give _names_ to those yjit frames.
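To make the second goal (naming yjit frames) concrete, here is a minimal sketch of the lookup a bug reporter could perform to turn a raw address such as `[0xaaaad035c9b4]` into a `yjit$$label+offset` name. The `JitSymbolTable` type and its methods are hypothetical, not part of YJIT or the draft PR, and the sketch assumes every compiled block is recorded as a non-overlapping `(start, len, label)` range.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry mapping JIT'd code ranges to iseq labels.
/// Keyed by start address so that a PC lookup is a single range query.
pub struct JitSymbolTable {
    ranges: BTreeMap<usize, (usize, String)>, // start -> (len, label)
}

impl JitSymbolTable {
    pub fn new() -> Self {
        JitSymbolTable { ranges: BTreeMap::new() }
    }

    /// Record a block of generated code. Assumes ranges never overlap.
    pub fn register(&mut self, start: usize, len: usize, label: &str) {
        self.ranges.insert(start, (len, label.to_string()));
    }

    /// Forget a block, e.g. when its code is invalidated or freed.
    pub fn unregister(&mut self, start: usize) {
        self.ranges.remove(&start);
    }

    /// Symbolize `pc` as "yjit$$label+0xoff", or None if no block covers it.
    pub fn symbolize(&self, pc: usize) -> Option<String> {
        // Find the last block that starts at or before `pc`...
        let (&start, (len, label)) = self.ranges.range(..=pc).next_back()?;
        // ...and check that `pc` actually falls inside that block.
        if pc - start < *len {
            Some(format!("yjit$${}+{:#x}", label, pc - start))
        } else {
            None
        }
    }
}
```

A `BTreeMap` keeps both registration and lookup logarithmic. A real implementation would also need the lookup to be safe to call from a signal handler (this sketch allocates a `String`), which it does not attempt.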
My PR contains a document which explains how the Linux interfaces for registering unwind info for JIT'd code work, so I won't duplicate that information here.

The biggest implementation question I have is around the use of Rust crates. I prototyped my implementation using the gimli and object crates, for generating DWARF info and ELF binaries respectively. However, the yjit build purposefully does not use cargo or external crates for release builds. There are a few different ways we could go here:

* Don't use the gimli & object crates; instead, re-implement all debug info & object file generation code in yjit.
* Don't use the crates; instead, link against C libraries to provide this functionality and call them from Rust (perhaps some combination of libelf, libdw, libbfd, or LLVM might do what we need).
* Use cargo after all for the release build and download the crates at build time.
* Use cargo for the release build, but vendor everything, so the build doesn't need to download anything.
* Only make unwind info generation available in dev mode, where cargo is used, and mark the gimli/object dependencies as optional in Cargo.toml.

We'd need to decide on one of these approaches for this proposal to work. I don't have a strong sense of the pros and cons of each.

(Side note: my PR actually depends on a _fork_ of gimli. I've been discussing adding the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648.)
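For readers unfamiliar with the GDB JIT compilation interface (the "GDB interface" whose cost is discussed later in the thread), the registration protocol itself is small: the JIT links an in-memory object file, here the ELF image carrying the DWARF unwind info, into a doubly linked list hanging off a global `__jit_debug_descriptor`, sets `action_flag`, and calls `__jit_debug_register_code`, on which the debugger keeps a breakpoint. The sketch below mirrors the `jit_code_entry`/`jit_descriptor` layouts from the GDB manual in Rust. It is not the PR's code: the Rust names are illustrative, and to stay self-contained it takes the descriptor by reference and uses an ordinary `notify_debugger` function, whereas a real implementation must export the descriptor and the notify function unmangled under exactly the GDB names.

```rust
use std::ptr;

// Action codes and struct layouts follow the GDB manual's JIT interface.
pub const JIT_NOACTION: u32 = 0;
pub const JIT_REGISTER_FN: u32 = 1;
pub const JIT_UNREGISTER_FN: u32 = 2;

#[repr(C)]
pub struct JitCodeEntry {
    pub next_entry: *mut JitCodeEntry,
    pub prev_entry: *mut JitCodeEntry,
    pub symfile_addr: *const u8, // in-memory object file (e.g. ELF + DWARF)
    pub symfile_size: u64,
}

#[repr(C)]
pub struct JitDescriptor {
    pub version: u32, // GDB expects 1, set statically
    pub action_flag: u32,
    pub relevant_entry: *mut JitCodeEntry,
    pub first_entry: *mut JitCodeEntry,
}

impl JitDescriptor {
    pub const fn new() -> Self {
        JitDescriptor {
            version: 1,
            action_flag: JIT_NOACTION,
            relevant_entry: ptr::null_mut(),
            first_entry: ptr::null_mut(),
        }
    }
}

/// Stand-in for `__jit_debug_register_code`: an empty, never-inlined
/// function the debugger breakpoints to learn that the list changed.
#[inline(never)]
fn notify_debugger(_d: &JitDescriptor) {}

/// Prepend `symfile` to the list and notify the debugger. The returned
/// entry (and `symfile`) must stay valid until it is unregistered.
pub fn register_symfile(d: &mut JitDescriptor, symfile: &'static [u8]) -> *mut JitCodeEntry {
    let entry = Box::into_raw(Box::new(JitCodeEntry {
        next_entry: d.first_entry,
        prev_entry: ptr::null_mut(),
        symfile_addr: symfile.as_ptr(),
        symfile_size: symfile.len() as u64,
    }));
    if !d.first_entry.is_null() {
        unsafe { (*d.first_entry).prev_entry = entry };
    }
    d.first_entry = entry;
    d.relevant_entry = entry;
    d.action_flag = JIT_REGISTER_FN;
    notify_debugger(d);
    entry
}

/// Unlink `entry`, notify the debugger while it is still valid, then free it.
/// Safety: `entry` must come from `register_symfile` on this descriptor.
pub unsafe fn unregister_symfile(d: &mut JitDescriptor, entry: *mut JitCodeEntry) {
    unsafe {
        let prev = (*entry).prev_entry;
        let next = (*entry).next_entry;
        if prev.is_null() {
            d.first_entry = next;
        } else {
            (*prev).next_entry = next;
        }
        if !next.is_null() {
            (*next).prev_entry = prev;
        }
        d.relevant_entry = entry;
        d.action_flag = JIT_UNREGISTER_FN;
        notify_debugger(d);
        drop(Box::from_raw(entry));
    }
}
```

Note that the debugger re-reads the object file on every registration, which is part of why per-block registration through this interface can be slow and memory-hungry.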
## Benchmarks

I ran the yjit-bench suite on my branch and compared it to Ruby master:

* My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f
* Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf

Here is a (very simple) comparison:

```
-------------- ------------ ------------ ---------------
bench          yjit (ms)    branch (ms)  branch/yjit (%)
activerecord   97.5         98.5         101.03%
hexapdf        2415.3       2458.2       101.78%
liquid-c       61.9         63.1         101.94%
liquid-render  135.3        135.0        99.78%
mail           104.6        105.5        100.86%
psych-load     1887.1       1922.0       101.85%
railsbench     1544.4       1556.0       100.75%
ruby-lsp       88.4         89.5         101.24%
sequel         147.5        151.1        102.44%
binarytrees    303          305.6        100.86%
chunky_png     1075.8       1079.4       100.33%
erubi          392.9        392.3        99.85%
erubi_rails    14.7         14.7         100.00%
etanni         792.3        791.4        99.89%
fannkuchredux  3815.9       3813.6       99.94%
lee            1030.2       1039.2       100.87%
nbody          49.2         49.3         100.20%
optcarrot      4142         4143.3       100.03%
ruby-json      2860.7       2874.0       100.46%
rubykon        7906.6       7904.2       99.97%
30k_ifelse     348.7        345.4        99.05%
30k_methods    828.6        831.8        100.39%
cfunc_itself   28.8         28.9         100.35%
fib            34.4         34.5         100.29%
getivar        115.5        109.7        94.98%
keyword_args   37.7         38.0         100.80%
respond_to     26           26.1         100.38%
setivar        33.8         33.5         99.11%
setivar_object 208.7        194.3        93.10%
str_concat     52.6         52.2         99.24%
throw          23.8         24.1         101.26%
-------------- ------------ ------------ ---------------
```

It seems the performance impact of generating and registering the debug info is marginal.

--
https://bugs.ruby-lang.org/
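The `branch/yjit (%)` column in the benchmark table above is the branch time divided by the master yjit time, times 100. As an illustration of how that column can be recomputed and summarized, the sketch below also gives a geometric mean of the per-benchmark ratios, a common way to aggregate relative timings. The helper names are hypothetical and not part of yjit-bench.

```rust
/// Branch time as a percentage of master yjit time (100.0 = no change).
pub fn ratio_pct(yjit_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / yjit_ms * 100.0
}

/// Geometric mean of branch/yjit ratios, returned as a percentage.
/// Averaging in log space keeps one outlier from dominating the summary.
/// Returns NaN for an empty slice.
pub fn geomean_pct(rows: &[(f64, f64)]) -> f64 {
    let sum_ln: f64 = rows.iter().map(|&(yjit, branch)| (branch / yjit).ln()).sum();
    (sum_ln / rows.len() as f64).exp() * 100.0
}
```

For example, `ratio_pct(115.5, 109.7)` reproduces the `getivar` row (about 94.98%).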

Issue #19541 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

A thought crossed my mind: I wonder if this should actually be implemented in the C parts of Ruby, rather than in Rust, so it can be shared with RJIT? Or is debug object generation something each JIT should do for itself?

https://bugs.ruby-lang.org/issues/19541#change-102504

Issue #19541 has been updated by hsbt (Hiroshi SHIBATA).

Status changed from Open to Assigned
Assignee set to yjit

https://bugs.ruby-lang.org/issues/19541#change-102505

Issue #19541 has been updated by k0kubun (Takashi Kokubun).
> I wonder if this should actually be implemented in the C parts of ruby, rather than in rust.
RJIT's goal is to help YJIT. We shouldn't consider writing something in C instead of Rust just for RJIT. We should choose what's the best for YJIT, and RJIT could separately maintain it as needed.

https://bugs.ruby-lang.org/issues/19541#change-102506

Issue #19541 has been updated by alanwu (Alan Wu).

Thank you for looking at this. You clearly put in a lot of effort. However, this proposal conflates too many concerns: while the goals are related, the solutions to each one have different constraints. I suggest sending smaller proposals in the future. I'll respond to just the unwinding concern here, because that has an implemented proof of concept.

At first blush, for solving a debugging concern, the added complexity from depending on the massive `gimli` crate feels bad. Also, the need to generate ELF objects in memory is antithetical to YJIT's goal of keeping memory consumption low.

For the goal of providing unwindability in release builds, generating DWARF and ELF objects in memory is more complex than it needs to be. DWARF unwind info is very expressive, far more powerful than what we need to unwind through YJIT-generated frames. The unused complexity shows up as extra memory consumption. The Linux kernel has [its own unwinding format][1] partly because DWARF is more complex than what it needs. What YJIT needs is even simpler than what Linux needs.

Since you mentioned WinDbg: unwindability is technically an ABI requirement on Windows. The interface there doesn't require pre-registration for each piece of code; it simply calls back when unwinding needs to happen. That interface, combined with a designed prologue, should allow unwinding through generated frames without *any* extra metadata. This is ideal memory-consumption-wise.

For platforms YJIT already supports, we might have no choice but to register code beforehand. Registering using the GDB interface seems less than ideal, though. It requires generating ELF objects in memory, which is bad for memory consumption, and it's also [known not to have the best speed][2].
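Unwinding through generated code "without any extra metadata" relies on a frame-pointer chain: if each frame's prologue saves the caller's frame pointer and the return address as a frame record (on AArch64, the saved x29 at `[fp]` and x30 at `[fp + 8]`; on x86-64, the pushed rbp at `[rbp]` with the return address at `[rbp + 8]`), an unwinder only has to follow pointers. The sketch below simulates that walk over a fake stack held in a `HashMap` instead of real memory so that it runs safely; the function name and addresses are illustrative only.

```rust
use std::collections::HashMap;

/// Walk a frame-pointer chain and collect each frame's return address.
/// `mem` simulates 8-byte stack slots; every frame record is laid out as
/// [fp] = caller's frame pointer, [fp + 8] = return address.
pub fn walk_frame_pointers(mem: &HashMap<u64, u64>, mut fp: u64, max_frames: usize) -> Vec<u64> {
    let mut return_addrs = Vec::new();
    while fp != 0 && return_addrs.len() < max_frames {
        let (next_fp, ret) = match (mem.get(&fp), mem.get(&fp.wrapping_add(8))) {
            (Some(&n), Some(&r)) => (n, r),
            _ => break, // unreadable slot: stop rather than crash
        };
        return_addrs.push(ret);
        // The stack grows down, so callers' records sit at higher addresses.
        // A chain that stops moving up is either finished (fp == 0) or corrupt.
        if next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    return_addrs
}
```

Each return address can then be symbolized as usual, or as a yjit block name if it falls inside generated code.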
For cases where Ruby already links with `libunwind` (some Linux distros and BSDs), we can register with [its dynamic interface][3], or use it to teach `addr2line.c` how to unwind through YJIT frames without needing to generate extra metadata.

Note that on A64 macOS, because Apple [mandates][4] frame pointer unwinding, LLDB already unwinds through YJIT frames just fine. We generate the same code on A64 Linux with GNU userspace, but the same guarantee doesn't exist there.

In summary, I do agree that we should try to give fuller backtraces in the bug reporter and help debuggers, but if the proposal is "let's use a bunch of memory and take on a few big dependencies to do it", then the answer is no. That competes with and undermines too many other goals.

[1]: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html
[2]: https://github.com/JuliaLang/julia/issues/14846
[3]: https://www.nongnu.org/libunwind/man/libunwind-dynamic(3).html
[4]: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple...

https://bugs.ruby-lang.org/issues/19541#change-102507
345.4 99.05% 30k_methods 828.6 831.8 100.39% cfunc_itself 28.8 28.9 100.35% fib 34.4 34.5 100.29% getivar 115.5 109.7 94.98% keyword_args 37.7 38.0 100.80% respond_to 26 26.1 100.38% setivar 33.8 33.5 99.11% setivar_object 208.7 194.3 93.10% str_concat 52.6 52.2 99.24% throw 23.8 24.1 101.26% -------------- ------------ ------------ --------------- ``` It seems like the performance impact of generating and registering the debug info is marginal. -- https://bugs.ruby-lang.org/
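Alan's remark that LLDB on A64 macOS already unwinds through YJIT frames relies on the AArch64 frame-record convention. Under that convention, `x29` points at a 16-byte record holding the caller's `x29` followed by the return address. The following is a minimal Rust sketch of walking that chain over a simulated stack. The `Memory` trait, `walk_frame_records`, and the fake stack are hypothetical illustrations, not YJIT or libunwind APIs. A real in-process walker would read raw memory and would have to validate every pointer before dereferencing it.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over "read 8 bytes at this address", so the walk
/// can be exercised against a fake stack instead of raw pointers.
trait Memory {
    fn read_u64(&self, addr: u64) -> Option<u64>;
}

impl Memory for HashMap<u64, u64> {
    fn read_u64(&self, addr: u64) -> Option<u64> {
        self.get(&addr).copied()
    }
}

/// Follow the AArch64 frame-record chain starting at `fp` (the value of x29).
/// Each record holds [fp] = caller's x29 and [fp + 8] = return address.
/// Returns return addresses, innermost first.
fn walk_frame_records(mem: &impl Memory, mut fp: u64, max_frames: usize) -> Vec<u64> {
    let mut pcs = Vec::new();
    while fp != 0 && pcs.len() < max_frames {
        // A frame record is two 8-byte words; a misaligned fp means a corrupt chain.
        if fp % 8 != 0 {
            break;
        }
        let (Some(next_fp), Some(ret_addr)) = (mem.read_u64(fp), mem.read_u64(fp + 8)) else {
            break;
        };
        pcs.push(ret_addr);
        // The stack grows down, so each caller's record must sit at a higher address.
        if next_fp != 0 && next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    pcs
}

fn main() {
    // Fake stack: YJIT frame -> interpreter frame -> outermost frame.
    let mut stack: HashMap<u64, u64> = HashMap::new();
    stack.insert(0x1000, 0x1020); // frame 0: caller's x29
    stack.insert(0x1008, 0xAAAA_0004); // frame 0: return address
    stack.insert(0x1020, 0x1040);
    stack.insert(0x1028, 0xBBBB_0008);
    stack.insert(0x1040, 0); // a zero fp terminates the chain
    stack.insert(0x1048, 0xCCCC_000C);
    for pc in walk_frame_records(&stack, 0x1000, 64) {
        println!("{pc:#x}");
    }
}
```

The alignment and increasing-address checks are cheap guards against following a corrupted chain. That matters in a crash handler, where the stack itself may be damaged.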

Issue #19541 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

Thanks, Alan, for your feedback and for clarifying YJIT's goals for me. First off, let me confirm I'm on the same page as you about a couple of things.

I totally agree the unwind-info registration APIs in GNU land are _awful_. Windows does this much better with `RtlInstallFunctionTableCallback`, which covers both in-process and out-of-process unwinding lazily. Alas, that's not what we have available on GNU/Linux.

I agree with your premise that YJIT does not muck around with the stack in very creative ways, so very little information is actually needed to unwind through YJIT frames. My approach in my POC was to make Ruby use the most obvious and well-exercised platform APIs for registering unwind info, which to me seemed to be `__register_frame` and `__jit_debug_register_code`. That is, I went through the rigmarole of DWARF CFI & ELF generation to try to be a good platform citizen. I also agree, however, that this is pretty heavyweight. The ELF file generation is especially costly, because the file has to be regenerated periodically whenever it runs out of "free" space to jam more debuginfo into.

Finally, I also acknowledge that adding Rust dependencies increases compile times & is a huge pain for downstream distributors etc., and you've gone to quite some effort to _not_ do that. I assume these are the main issues with adding the gimli/object dependencies per se?

The general "vibe" I get from your feedback is that we don't want to introduce huge implementation complexity just to make YJIT use the "standard" unwinding mechanisms; rather, we should implement the simplest thing that works for YJIT, and then tailor _that_ to platform interfaces.

One final thing to clarify, though:
> For cases where Ruby already links with libunwind (some Linux distros and BSDs), we can register with its dynamic interface
If you're referring to `UNW_INFO_FORMAT_DYNAMIC` info, that's actually totally unimplemented in libunwind for anything except Itanium (which... I assume is not a target YJIT wants to support xD). `UNW_INFO_FORMAT_TABLE` works AFAICT, but requires generating DWARF CFI info (which is something we'd like to avoid).

---

OK, so what can we do that satisfies the following constraints?

1. Lets us unwind stacks containing YJIT frames in both GDB and the crash reporter
2. Does not require us to construct complex in-memory structures which are really designed for on-disk use (i.e. no ELF files)
3. Does not require us to use DWARF CFI (which is far too complex for the simple stacks that YJIT lays out)
4. Has very little runtime CPU cost to construct and register
5. Has very little runtime memory cost to keep around

I think I have a rough idea of something that might fit the bill.

Firstly, let's have YJIT generate a "compact unwind info format" of our own. I definitely need to experiment with the implementation before being too specific here, but roughly...

* There would actually be two separate tables: one for the inline code block, and one for the outlined code block.
* Each table would be sorted by IP.
* Each table would only ever be _appended_ to when code is generated, because (normally) the IP of generated code in each code block only increases. This hopefully means a minimum of gratuitous memcpy-ing of data (except when the table needs to grow).
* We need to do something about Code GC, which violates the "IP only increases" invariant. Since Code GC frees only whole pages, perhaps the unwind info could be per-page, with the pages stored in a hash table. That would make finding the right per-page block of unwind info O(1), both when appending to it during code generation and when looking up the unwind info for a given IP.
* For each block, the unwind info would store:
  * Start/end IP of the block
  * Whether or not this block has a frame_setup prologue
  * Whether or not this block has a frame_teardown epilogue
  * Whether or not this block is split into the next inline/outline page as well
  * ~A pointer to the iseq structure~ (this can come later; it'd be needed for naming the block, but it also introduces some fun GC mark/compaction issues).

If we're allowed to rely on the frame pointer being set up [1], and on the shape of our prologues/epilogues, I think that's all the information needed to do frame unwinding.

[1] This would mean adding frame pointer setup to x86_64 code generation. The register isn't used for any other purpose in YJIT's generated code, so I doubt it'll have a big performance impact.

Now, how do we connect that to GDB & the crash backtracer? Let's treat those separately...

For GDB, there are actually _three_ JIT code registration mechanisms (that I could count):

1. The one using `__jit_debug_register_code` (which I used in my POC): https://sourceware.org/gdb/onlinedocs/gdb/JIT-Interface.html
2. One that lets you load a .so file in GDB to help it understand your JIT stacks: https://sourceware.org/gdb/onlinedocs/gdb/Writing-JIT-Debug-Info-Readers.htm...
3. One based on the Python interface: https://sourceware.org/gdb/onlinedocs/gdb/Unwinding-Frames-in-Python.html

We already ship GDB helpers with Ruby (in `.gdbinit`). It's hopefully possible to write some Python which can unwind YJIT stacks using the custom unwind info, and also distribute that inside the Ruby source tree. Perhaps it's even possible to distribute it inline in `.gdbinit`; I can experiment with the specifics of this.

For the crash backtracer, I _think_ libunwind can be bent into shape for our purposes.

* We can add a configure flag `--with-libunwind` or such to compile Ruby against libunwind if present, even when that would not normally be the case on a given platform.
* If libunwind is present, instead of using `backtrace(3)` to collect the stack all at once, use `unw_init_local` to begin unwinding, and unwind frame by frame with `unw_step`.
* If we encounter an IP we recognise as belonging to YJIT, do _NOT_ call `unw_step` to unwind that frame.
* Instead, perform the unwinding logic ourselves using the YJIT unwind info, and then construct a `unw_context_t` for the previous frame by hand (it looks like the necessary struct definitions are present in the `libunwind-${arch}.h` header files).
* Start unwinding again from this custom context struct by calling `unw_init_local`; if we've done it right, this _should_ start unwinding from the frame below.

Essentially, the tradeoff here is that we can make unwind info generation much simpler, at the expense of making unwinding itself more complex (because we can't just rely on the platform's DWARF unwinder). That seems like a reasonable tradeoff to me.

Does this sound like a fruitful path to go down? I should have a few weeks, more or less full time, to work on this coming up (I'm taking a sabbatical from work to do open source stuff!). So I'd really like to know whether something along these lines would be useful, more in line with YJIT's goals, and something that would be considered for merging.

Thanks again for your time, I appreciate it.

---

Footnote:
> it's (GDB's jit interface) also known to be not have the best speed.
I think this concern only applies while GDB is actually _attached_; I don't think the speed of running the program under a debugger should be a primary concern of this unwinding work. This is moot anyway, though, because the ELF generation is a huge pain, as you point out.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-102510

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Assigned
* Priority: Normal
* Assignee: yjit
----------------------------------------
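The per-page "compact unwind info format" proposed above could look roughly like the following Rust sketch. It is illustrative only. `BlockUnwindInfo` and `UnwindTable` are hypothetical names, and the 16 KiB `PAGE_SIZE` is an assumption rather than YJIT's actual page size. The sketch also models a single table, whereas the proposal calls for separate inline and outlined tables. A block split across a page boundary is assumed to get one entry per page, with `continues_on_next_page` set on the first.

```rust
use std::collections::HashMap;

/// Assumed size of a YJIT code page (illustrative value, not taken from YJIT).
const PAGE_SIZE: u64 = 16 * 1024;

/// One entry per generated block, mirroring the fields listed in the proposal.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BlockUnwindInfo {
    start_ip: u64,
    end_ip: u64,                  // exclusive
    has_frame_setup: bool,        // block contains a frame_setup prologue
    has_frame_teardown: bool,     // block contains a frame_teardown epilogue
    continues_on_next_page: bool, // block is split into the next page
}

/// Hypothetical table: page base address -> entries for that page, sorted by IP.
#[derive(Default)]
struct UnwindTable {
    pages: HashMap<u64, Vec<BlockUnwindInfo>>,
}

impl UnwindTable {
    fn page_of(ip: u64) -> u64 {
        ip & !(PAGE_SIZE - 1)
    }

    /// Append-only insert: within a page, code is emitted at increasing
    /// addresses, so a new entry always goes at the end of that page's vector.
    fn append(&mut self, info: BlockUnwindInfo) {
        let entries = self.pages.entry(Self::page_of(info.start_ip)).or_default();
        if let Some(last) = entries.last() {
            assert!(info.start_ip >= last.end_ip, "IPs must only increase within a page");
        }
        entries.push(info);
    }

    /// O(1) hash probe to find the page, then a binary search within it.
    fn lookup(&self, ip: u64) -> Option<&BlockUnwindInfo> {
        let entries = self.pages.get(&Self::page_of(ip))?;
        // First entry whose (exclusive) end lies past `ip`.
        let idx = entries.partition_point(|e| e.end_ip <= ip);
        entries.get(idx).filter(|e| e.start_ip <= ip)
    }

    /// Code GC frees whole pages, so dropping that page's entries is enough.
    fn free_page(&mut self, page_base: u64) {
        self.pages.remove(&page_base);
    }
}

fn main() {
    let mut table = UnwindTable::default();
    let entry_block = BlockUnwindInfo {
        start_ip: 0x1_0000,
        end_ip: 0x1_0040,
        has_frame_setup: true,
        has_frame_teardown: false,
        continues_on_next_page: false,
    };
    let exit_block = BlockUnwindInfo {
        start_ip: 0x1_0040,
        end_ip: 0x1_0100,
        has_frame_setup: false,
        has_frame_teardown: true,
        continues_on_next_page: false,
    };
    table.append(entry_block);
    table.append(exit_block);
    println!("{:?}", table.lookup(0x1_0050));
    table.free_page(0x1_0000);
    println!("{:?}", table.lookup(0x1_0050));
}
```

Because entries within a page arrive in IP order and never overlap, sorting by `start_ip` also sorts by `end_ip`. That is what makes `partition_point` a valid binary search here.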

Issue #19541 has been updated by alanwu (Alan Wu).

I would explore solutions that involve generating no extra metadata because that's ideal, and may help the Windows port in the future. For example, if we rely on frame pointer unwinding, it'd be incorrect when the PC is in sections of the prologue/epilogue, but would cover most crashes. We could read around the PC to figure out how to unwind from those sections for full robustness later. For this setup, we do need to change codegen for x64 to set up `RBP`, as you mentioned. I don't expect noticeable perf loss either and will benchmark it. Setting up the frame pointer opens up different options when profiling with Linux `perf`, so it seems to be worth it regardless.

The GDB Python unwinding interface does seem enticing. Thanks for bringing it up. Maybe though, on x64 GDB already uses the frame pointer as a fallback so will work without extra help? Seems like A64 will need the script, though.

The plan with manually unwinding with `libunwind` sounds worth trying. It does seem like the library is not really designed to directly support this; sorry I didn't check that before my previous reply. If it doesn't work, maybe a solution that registers a single global entry of DWARF unwind that assumes frame pointer validity which covers the entire code region could work? That should make complexity and memory consumption more palatable.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-102516

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Assigned
* Priority: Normal
* Assignee: yjit
----------------------------------------
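The hybrid approach discussed in this thread can be sketched as a simple dispatch loop. The platform unwinder handles C frames, and YJIT frames are stepped directly via the frame pointer, with the whole YJIT code region treated as one range. `Regs`, `NativeUnwinder`, `backtrace`, and `FakeNative` are hypothetical names. `NativeUnwinder` stands in for something like a libunwind cursor driven by `unw_step`, but it does not model libunwind's real API. The frame-pointer step assumes the AArch64-style record layout and, as Alan notes, would give wrong results if the PC is inside a prologue or epilogue.

```rust
use std::collections::HashMap;
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Regs {
    pc: u64,
    sp: u64,
    fp: u64,
}

/// Hypothetical stand-in for the platform unwinder (not libunwind's API).
trait NativeUnwinder {
    fn step(&mut self, regs: Regs) -> Option<Regs>;
}

/// Frame-pointer step, assuming a valid frame record at `fp`:
/// [fp] = caller's fp, [fp + 8] = return address, caller's sp = fp + 16.
fn fp_step(read: &impl Fn(u64) -> Option<u64>, regs: Regs) -> Option<Regs> {
    let caller_fp = read(regs.fp)?;
    let ret_addr = read(regs.fp + 8)?;
    Some(Regs { pc: ret_addr, sp: regs.fp + 16, fp: caller_fp })
}

/// Collect PCs, innermost first. Frames whose PC lies in YJIT's code region
/// are stepped with the frame pointer; everything else goes to `native`.
fn backtrace(
    start: Regs,
    yjit_code: &Range<u64>,
    native: &mut impl NativeUnwinder,
    read: &impl Fn(u64) -> Option<u64>,
    max_frames: usize,
) -> Vec<u64> {
    let mut pcs = vec![start.pc];
    let mut regs = start;
    while pcs.len() < max_frames {
        let next = if yjit_code.contains(&regs.pc) {
            fp_step(read, regs)
        } else {
            native.step(regs)
        };
        match next {
            Some(caller) if caller.pc != 0 => {
                pcs.push(caller.pc);
                regs = caller;
            }
            _ => break,
        }
    }
    pcs
}

/// Fake native unwinder for demonstration: a fixed PC -> caller map.
struct FakeNative(HashMap<u64, Regs>);

impl NativeUnwinder for FakeNative {
    fn step(&mut self, regs: Regs) -> Option<Regs> {
        self.0.get(&regs.pc).copied()
    }
}

fn main() {
    let yjit_code = 0x9000_0000..0x9100_0000;
    // C frame (signal handler) -> YJIT frame -> C frame (interpreter).
    let mut native = FakeNative(HashMap::from([(
        0x5000,
        Regs { pc: 0x9000_0010, sp: 0x1ff0, fp: 0x2000 },
    )]));
    let stack: HashMap<u64, u64> = HashMap::from([(0x2000, 0x2040), (0x2008, 0x6000)]);
    let read = |addr: u64| stack.get(&addr).copied();
    let start = Regs { pc: 0x5000, sp: 0x1f00, fp: 0x1f10 };
    for pc in backtrace(start, &yjit_code, &mut native, &read, 64) {
        println!("{pc:#x}");
    }
}
```

Keeping the YJIT check to a single range comparison means no per-block metadata is needed for the common case, in line with the "no extra metadata" direction suggested here.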

Issue #19541 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
> For example, if we rely on frame pointer unwinding, it'd be incorrect when the PC is in sections of the prologue/epilogue, but would cover most crashes
Yeah, I agree this will work pretty much all of the time. I _want_ it to work in the prologue/epilogue too, but I guess that's more for completeness' sake than for any real utility, so it may not be worth generating metadata for this.
> We could read around the PC to figure out how to unwind from those sections for full robustness later.
Oh interesting - I guess if we can rely on YJIT _not_ generating opcodes like `push %rbp; mov %rbp, %rsp` and `stp x29, x30, [sp,#-0x10]!; mov x29, sp` anywhere else _except_ the prologue, then yeah the unwinder (both the in-process one for crash reporting, and the out-of-process one in GDB's python interface) can nose around the PC and work out if it's inside the prologue/epilogue or not. It seems I might be able to spike this out by writing a GDB python unwinder entirely outside the Ruby tree (for aarch64; need to add the frame pointers for x86_64 first before it'd work there). Maybe the way to go is for me to write that, share it around, and once it's mostly working, _then_ port its logic into the Ruby crash reporter as well. This does leave the question open of how to get some kind of sensible name for the yjit frames that isn't just a random address. I suppose if we're going with an approach of "smart unwinders that understand how YJIT lays out code", maybe I can get the unwinder to figure something out based on the CFP pointer. It's in a callee-saved register, and most unwinding schemes generally make it possible to recover these (I think it might be required for C++ exception unwinding to work). Otherwise perhaps we can spill it to the stack as well - I'll play around. ---------------------------------------- Feature #19541: Proposal: Generate frame unwinding info for YJIT code https://bugs.ruby-lang.org/issues/19541#change-102517 * Author: kjtsanaktsidis (KJ Tsanaktsidis) * Status: Assigned * Priority: Normal * Assignee: yjit ---------------------------------------- ## What is being propsed? Currently, when Ruby crashes with yjit generated code on the stack, `rb_print_backtrace()` is unable to actually show any frames underneath the yjit code. 
For example, if you send SIGSEGV to a Ruby process running yjit, this is what you see: ``` /ruby/miniruby(rb_print_backtrace+0xc) [0xaaaad0276884] /ruby/vm_dump.c:785 /ruby/miniruby(rb_vm_bugreport) /ruby/vm_dump.c:1093 /ruby/miniruby(rb_bug_for_fatal_signal+0xd0) [0xaaaad0075580] /ruby/error.c:813 /ruby/miniruby(sigsegv+0x5c) [0xaaaad01bedac] /ruby/signal.c:919 linux-vdso.so.1(__kernel_rt_sigreturn+0x0) [0xffff91a3e8bc] /ruby/miniruby(map<(usize, yjit::backend::ir::Insn), (usize, yjit::backend::ir::Insn), yjit::backend::ir::{impl#17}::next_mapped::{closure_env#0}>+0x8c) [0xaaaad03b8b00] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/option.rs:929 /ruby/miniruby(next_mapped+0x3c) [0xaaaad0291dc0] src/backend/ir.rs:1225 /ruby/miniruby(arm64_split+0x114) [0xaaaad0287744] src/backend/arm64/mod.rs:359 /ruby/miniruby(compile_with_regs+0x80) [0xaaaad028bf84] src/backend/arm64/mod.rs:1106 /ruby/miniruby(compile+0xc4) [0xaaaad0291ae0] src/backend/ir.rs:1158 /ruby/miniruby(gen_single_block+0xe44) [0xaaaad02b1f88] src/codegen.rs:854 /ruby/miniruby(gen_block_series_body+0x9c) [0xaaaad03b0250] src/core.rs:1698 /ruby/miniruby(gen_block_series+0x50) [0xaaaad03b0100] src/core.rs:1676 /ruby/miniruby(branch_stub_hit_body+0x80c) [0xaaaad03b1f68] src/core.rs:2021 /ruby/miniruby({closure#0}+0x28) [0xaaaad02eb86c] src/core.rs:1924 /ruby/miniruby(do_call<yjit::core::branch_stub_hit::{closure_env#0}, *const u8>+0x98) [0xaaaad035ba3c] /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492 [0xaaaad035c9b4] ``` (n.b. - I compiled Ruby with `-fasynchronous-unwind-tables –rdynamic –g` in cflags to make sure gcc generates appropriate unwind info & keeps the symbol tables). Likewise, if you attach gdb to a Ruby process with yjit enabled, gdb can't show thread backtraces through yjit-generated code either. 
My proposal is that YJIT generate sufficient unwinding and debug information on all platforms to allow both `rb_print_backtrace()` and the platform's debugger (gdb/lldb/WinDbg) to show: * Full stack traces all the way back to `main`. That is, it should be possible to see frames _underneath_ `[0xaaaad035c9b4]` from the backtrace above. * Names for the dynamically generated yjit blocks (e.g. instead of `[0xaaaad035c9b4]`, we should see something like `yjit$$name_of_ruby_method`, where `name_of_ruby_method` is the `label` for the iseq this is JIT'd code for). ## Motivation I have a few motivations for wanting this. Firstly, I feel this functionality is independently useful. When Ruby crashes, the more information we can get, the more likely we are to find the root cause. Likewise, the same principle applies to debugging with gdb - you can get a fuller understanding of what the process is doing if you see the whole stack. I have often found attaching gdb to the Ruby interpreter helps in understanding problems in Ruby code or C extensions and is something I do relatively frequently; yjit breaking that will definitely be inconvenient for me! ## Implementation I have a draft implementation here on how I'd implement this: https://github.com/ruby/ruby/pull/7567. It's currently missing tests & platform support (it only works on Linux aarch64). Also, it implements unwind info generation, so unwinding can work _through_ yjit code, but it does not currently emit symbols to give _names_ to those yjit frames. My PR contains a document which explains how the Linux interfaces for registering unwind info for JIT'd code work, so I won't duplicate that information here. The biggest implementation question I had is around the use of Rust crates. Currently, I prototyped my implementation using the gimli & object crates, for generating DWARF info and ELF binaries. However, the yjit build does purposefully does not use cargo & external crates for release builds. 
There are a few different ways we could go here:

* Don't use the gimli and object crates; instead, re-implement all the debug info and object file generation code in yjit.
* Don't use the crates; instead, link against C libraries that provide this functionality and call them from Rust (perhaps some combination of libelf, libdw, libbfd, or LLVM might do what we need).
* Use cargo after all for the release build and download the crates at build time.
* Use cargo for the release build, but vendor everything, so the build doesn't need to download anything.
* Only make unwind info generation available in dev mode, where cargo is used, and mark the gimli/object dependencies as optional in Cargo.toml.

We'd need to decide on one of these approaches for this proposal to work. I don't really have a strong sense of the pros and cons of each. (Side note: my PR actually depends on a _fork_ of gimli. I've been discussing adding the needed interfaces upstream here: https://github.com/gimli-rs/gimli/issues/648.)

## Benchmarks

I ran the yjit-bench suite on my branch and compared it to Ruby master:

* My branch: https://gist.github.com/KJTsanaktsidis/5741a9f64e5cd75cdf5fedd846091a4f
* Ruby master: https://gist.github.com/KJTsanaktsidis/592d3ebcf98f6745dfa3efbd30a25acf

This is a (very simple) comparison:

```
--------------  ---------  -----------  ---------------
bench           yjit (ms)  branch (ms)  branch/yjit (%)
--------------  ---------  -----------  ---------------
activerecord         97.5         98.5          101.03%
hexapdf            2415.3       2458.2          101.78%
liquid-c             61.9         63.1          101.94%
liquid-render       135.3        135.0           99.78%
mail                104.6        105.5          100.86%
psych-load         1887.1       1922.0          101.85%
railsbench         1544.4       1556.0          100.75%
ruby-lsp             88.4         89.5          101.24%
sequel              147.5        151.1          102.44%
binarytrees           303        305.6          100.86%
chunky_png         1075.8       1079.4          100.33%
erubi               392.9        392.3           99.85%
erubi_rails          14.7         14.7          100.00%
etanni              792.3        791.4           99.89%
fannkuchredux      3815.9       3813.6           99.94%
lee                1030.2       1039.2          100.87%
nbody                49.2         49.3          100.20%
optcarrot            4142       4143.3          100.03%
ruby-json          2860.7       2874.0          100.46%
rubykon            7906.6       7904.2           99.97%
30k_ifelse          348.7        345.4           99.05%
30k_methods         828.6        831.8          100.39%
cfunc_itself         28.8         28.9          100.35%
fib                  34.4         34.5          100.29%
getivar             115.5        109.7           94.98%
keyword_args         37.7         38.0          100.80%
respond_to             26         26.1          100.38%
setivar              33.8         33.5           99.11%
setivar_object      208.7        194.3           93.10%
str_concat           52.6         52.2           99.24%
throw                23.8         24.1          101.26%
--------------  ---------  -----------  ---------------
```

It seems like the performance impact of generating and registering the debug info is marginal. No benchmark runs more than about 2.5% slower on the branch than on master.

-- 
https://bugs.ruby-lang.org/
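The branch/yjit column is branch time divided by master (yjit) time, so values above 100% mean the branch was slower. The small Rust sketch below recomputes that column. It also adds a geometric mean over the ratios, which is one common way to summarize a benchmark suite. The geometric mean is not part of the original comparison, and no summary figure is claimed here.

```rust
/// branch/yjit ratio in percent: above 100% means the branch was slower.
pub fn ratio_pct(yjit_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / yjit_ms * 100.0
}

/// Geometric mean of ratios given in percent.
/// Returns NaN for an empty slice (0/0 in the mean of logs).
pub fn geomean_pct(ratios_pct: &[f64]) -> f64 {
    let n = ratios_pct.len() as f64;
    let log_sum: f64 = ratios_pct.iter().map(|r| (r / 100.0).ln()).sum();
    (log_sum / n).exp() * 100.0
}
```

For example, `format!("{:.2}%", ratio_pct(97.5, 98.5))` reproduces the `activerecord` row's `101.03%`.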

Issue #19541 has been updated by Eregon (Benoit Daloze).

I think supporting this could also help improve profiling with YJIT enabled: https://github.com/tmm1/stackprof/pull/180#issuecomment-1556139533

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-103231

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Assigned
* Priority: Normal
* Assignee: yjit

Issue #19541 has been updated by k0kubun (Takashi Kokubun).

Status changed from Assigned to Feedback

I added `--yjit-perf` on Ruby master: https://github.com/ruby/ruby/pull/8697.

It does not unwind Ruby frames in a single YJIT frame like DWARF would be able to do. But I think it's similar to how C functions are profiled with `--call-graph fp`, and it's a fair choice under the trade-off: `--call-graph dwarf` can unwind inlined functions but is slower than `--call-graph fp`. Given your PoC needed +1282 lines while our PR was +133 lines, this seems to have better maintainability while still being practical.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-105002

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Feedback
* Priority: Normal
* Assignee: yjit
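`--call-graph fp`, and a debugger's frame-pointer fallback, work by following the chain of saved frame pointers. That is why JIT code that maintains frame pointers can be walked without any DWARF call frame information. The sketch below is a minimal illustration, not perf's or GDB's actual unwinder. It assumes the usual frame-record layout produced by x86-64 `push rbp; mov rbp, rsp` prologues and by AArch64 x29/x30 frame records: the caller's saved frame pointer is at `[fp]` and the return address is at `[fp + 8]`. Target memory is simulated with a `HashMap`.

```rust
use std::collections::HashMap;

/// Follow a frame-pointer chain, returning `pc` followed by each return address.
/// `mem` simulates target memory (address -> 8-byte word); a real unwinder
/// would read the stopped process instead.
pub fn walk_frame_pointers(
    mem: &HashMap<u64, u64>,
    pc: u64,
    fp: u64,
    max_frames: usize,
) -> Vec<u64> {
    let mut pcs = vec![pc];
    let mut fp = fp;
    while pcs.len() < max_frames && fp != 0 {
        // Frame record layout: [fp] = caller's saved fp, [fp + 8] = return address.
        let Some(ra_slot) = fp.checked_add(8) else { break };
        let (Some(&caller_fp), Some(&ret_addr)) = (mem.get(&fp), mem.get(&ra_slot)) else {
            break; // unreadable memory: stop rather than guess
        };
        pcs.push(ret_addr);
        // Stacks grow downward, so a caller's record must sit at a higher
        // address; anything else means the chain has ended or is corrupt.
        if caller_fp <= fp {
            break;
        }
        fp = caller_fp;
    }
    pcs
}
```

The walk only sees frames that set up a frame record. If an intermediate function was built with `-fomit-frame-pointer`, its caller is silently skipped. If that function reuses the frame-pointer register for other values, the chain breaks.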

Issue #19541 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

That's awesome and I'm super looking forward to trying to profile our apps with `perf` once I finally get YJIT enabled! Assuming most of the hot code turns out to be JIT'd, this should definitely give a fairly good picture of the Ruby stack.

One question - the perf map file is going to grow without bound for a long-running process, right? I guess there's no real way around this based on how the file is specified though... I'll have a look at that problem when I actually run into it anyway.

This also doesn't address debugger support, but maybe for that I might go poke at GDB (ick) and see if it can a) be made to try frame pointer unwinding unconditionally, and b) use the perf map file as a source of symbols.

Anyway thanks so much for working on this, it's going to be really useful I think.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-105003

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Feedback
* Priority: Normal
* Assignee: yjit
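Each JIT'd block adds one line to `/tmp/perf-<pid>.map`, which is why the file only grows. perf's JIT interface documentation describes each line as `START SIZE symbolname`, with START and SIZE in hex and no `0x` prefix. The sketch below shows how such a line could be produced. It assumes nothing about how `--yjit-perf` actually writes its file, and the `yjit$$...` names are illustrative only.

```rust
use std::io::{self, Write};

/// Path perf reads for a process's JIT symbol map.
pub fn perf_map_path(pid: u32) -> String {
    format!("/tmp/perf-{}.map", pid)
}

/// Append one `START SIZE symbolname` line (START and SIZE in hex, no `0x`).
pub fn write_perf_map_entry<W: Write>(
    out: &mut W,
    start: u64,
    size: u64,
    name: &str,
) -> io::Result<()> {
    writeln!(out, "{:x} {:x} {}", start, size, name)
}
```

In a real process, `out` would be the map file opened for appending. For example: `std::fs::OpenOptions::new().create(true).append(true).open(perf_map_path(std::process::id()))`.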

Issue #19541 has been updated by k0kubun (Takashi Kokubun).
> One question - the perf map file is going to grow without bound for a long-running process, right? I guess there's no real way around this based on how the file is specified though... I'll have a look at that problem when I actually run into it anyway.
We plan to add an option to disable Code GC (compilation stops when it reaches the code size limit) in Ruby 3.3, and using that option should fix that problem. Given that perf reads map files after execution, it's inherently incompatible with Code GC. We might want to let `--yjit-perf` automatically enable that option too.
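Here is why an after-the-fact map and Code GC don't mix. Once Code GC frees a region and yjit reuses those addresses, the append-only map ends up with more than one entry covering the same address. The format has no timestamp field, so the map alone cannot say which entry was live when a sample was taken. The sketch below parses map lines in the same `START SIZE symbolname` format and lists every entry covering an address. The sample entries are made up. With Code GC disabled, addresses are never reused, so each address should match at most one entry.

```rust
/// One line of a perf map file: `START SIZE symbolname`, START/SIZE in hex.
#[derive(Debug, PartialEq)]
pub struct MapEntry {
    pub start: u64,
    pub size: u64,
    pub name: String,
}

/// Parse map text, skipping malformed lines. Names may contain spaces
/// (e.g. `block in <main>`), so only the first two fields are split off.
pub fn parse_perf_map(text: &str) -> Vec<MapEntry> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.splitn(3, ' ');
            let start = u64::from_str_radix(fields.next()?, 16).ok()?;
            let size = u64::from_str_radix(fields.next()?, 16).ok()?;
            let name = fields.next()?.to_string();
            Some(MapEntry { start, size, name })
        })
        .collect()
}

/// Every symbol whose range covers `pc`. More than one result means the
/// address range was reused and the map is ambiguous for that address.
pub fn symbols_at(entries: &[MapEntry], pc: u64) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| e.start <= pc && pc - e.start < e.size)
        .map(|e| e.name.as_str())
        .collect()
}
```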
> This also doesn't address debugger support, but maybe for that I might go poke at GDB (ick) and see if it can a) be made to try frame pointer unwinding unconditionally, and b) use the perf map file as a source of symbols.
Does GDB not use frame pointer unwinding at all, even for addresses that have no debug information? I was hoping `--yjit-perf=fp` (which enables only frame pointers) could sometimes be used to help GDB unwind frames.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-105014

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Feedback
* Priority: Normal
* Assignee: yjit

Issue #19541 has been updated by k0kubun (Takashi Kokubun).
b) use the perf map file as a source of symbols.
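For context on option b): Linux `perf` resolves addresses in JIT'd code through a plain-text map file, `/tmp/perf-<pid>.map`. Each line has the form `START SIZE symbolname`, with START and SIZE written in hex without a `0x` prefix. The sketch below writes such lines and resolves an address back to a name. The function names are hypothetical. The `yjit$$...` names follow the naming suggested in the proposal. When ranges overlap, the sketch prefers the most recently written entry; that is an assumption about reused code memory, not documented perf behavior. The sketch does not show how gdb or `rb_print_backtrace()` would consume such a file.

```rust
/// One parsed line of a perf map file.
#[derive(Debug, Clone, PartialEq)]
pub struct PerfMapEntry {
    pub start: u64,
    pub size: u64,
    pub name: String,
}

/// Format one perf map line: "START SIZE symbolname", hex without `0x`.
pub fn format_entry(start: u64, size: u64, name: &str) -> String {
    format!("{:x} {:x} {}", start, size, name)
}

/// Parse perf map contents, skipping malformed lines. The symbol name is
/// the rest of the line, so it may itself contain spaces.
pub fn parse_perf_map(contents: &str) -> Vec<PerfMapEntry> {
    let mut entries = Vec::new();
    for line in contents.lines() {
        let mut parts = line.splitn(3, ' ');
        let (start, size, name) = match (parts.next(), parts.next(), parts.next()) {
            (Some(s), Some(z), Some(n)) => (s, z, n),
            _ => continue,
        };
        let start = match u64::from_str_radix(start, 16) {
            Ok(v) => v,
            Err(_) => continue,
        };
        let size = match u64::from_str_radix(size, 16) {
            Ok(v) => v,
            Err(_) => continue,
        };
        entries.push(PerfMapEntry { start, size, name: name.to_string() });
    }
    entries
}

/// Find the symbol whose range [start, start + size) contains `addr`,
/// searching from the most recently written entry backwards.
pub fn symbolize(entries: &[PerfMapEntry], addr: u64) -> Option<&str> {
    entries
        .iter()
        .rev()
        .find(|e| addr >= e.start && addr - e.start < e.size)
        .map(|e| e.name.as_str())
}
```

With entries like these, the bare `[0xaaaad035c9b4]` frame from the crash report could be printed with a name instead, provided the map was written when the code was generated.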
So this is a post I found with quick googling: https://stackoverflow.com/questions/42739893/force-gdb-to-use-frame-pointer-... They say you can at least define a routine in `~/.gdbinit` that unwinds frames using frame pointers, which seems to help. It also says:
> ...with other types of debug info such as `.debug_info`. Apparently this triggers gdb to stop using frame-pointer (rbp) based unwinding for any functions from that object.
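The frame-pointer unwinding mentioned in the quote works without any DWARF CFI. At function entry, each frame saves the caller's frame pointer right next to the return address, so an unwinder can follow that chain. This only works if every frame on the stack, JIT'd frames included, maintains the frame pointer. The sketch below walks such a chain over a simulated stack. It assumes the x86-64 `rbp` layout: `[rbp]` holds the caller's saved `rbp` and `[rbp + 8]` the return address. AArch64 frame records (`x29`/`x30`) use the same two-word shape. `FakeStack` and `walk_frame_pointers` are hypothetical names, and this is not the gdbinit routine from the linked post.

```rust
/// A simulated stack: `words[i]` is the 8-byte value stored at `base + 8 * i`.
pub struct FakeStack {
    pub base: u64,
    pub words: Vec<u64>,
}

impl FakeStack {
    /// Read one aligned 8-byte word, or `None` if `addr` is outside the stack.
    pub fn read(&self, addr: u64) -> Option<u64> {
        if addr < self.base || (addr - self.base) % 8 != 0 {
            return None;
        }
        self.words.get(((addr - self.base) / 8) as usize).copied()
    }
}

/// Follow the saved-frame-pointer chain from `start_fp`, collecting the
/// return address stored in each frame, until fp is 0 or the chain breaks.
pub fn walk_frame_pointers(stack: &FakeStack, start_fp: u64, max_frames: usize) -> Vec<u64> {
    let mut return_addrs = Vec::new();
    let mut fp = start_fp;
    while fp != 0 && return_addrs.len() < max_frames {
        // [fp + 8] = return address into the caller.
        let ret = match fp.checked_add(8).and_then(|a| stack.read(a)) {
            Some(v) => v,
            None => break,
        };
        // [fp] = the caller's saved frame pointer.
        let caller_fp = match stack.read(fp) {
            Some(v) => v,
            None => break,
        };
        return_addrs.push(ret);
        // Stacks grow down, so callers live at higher addresses; stop on a
        // non-increasing fp rather than loop on a corrupt chain.
        if caller_fp != 0 && caller_fp <= fp {
            break;
        }
        fp = caller_fp;
    }
    return_addrs
}
```

If JIT'd code skipped the frame-pointer prologue, this walk would silently jump from the JIT frame to the wrong caller. That is one plausible reason a debugger might "sometimes" get past JIT frames and sometimes not.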
JIT code does not consist of functions from an object with debug info, so maybe GDB chooses to use frame pointers for JIT frames? Also, it's kind of hard for me to test GDB's behavior, since it only "sometimes" unwinds a backtrace beyond YJIT frames successfully. Right now, I'm not sure when it succeeds and when it fails.

----------------------------------------
Feature #19541: Proposal: Generate frame unwinding info for YJIT code
https://bugs.ruby-lang.org/issues/19541#change-105015

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Feedback
* Priority: Normal
* Assignee: yjit
----------------------------------------