New subject: [ruby-core:115357] [Ruby master Bug#20001] Make Ruby work properly with ASAN enabled

12 Nov 2023

      Issue #20001 has been reported by kjtsanaktsidis (KJ Tsanaktsidis).

----------------------------------------
Bug #20001: Make Ruby work properly with ASAN enabled
https://bugs.ruby-lang.org/issues/20001

* Author: kjtsanaktsidis (KJ Tsanaktsidis)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
This ticket covers some work I want to do to get the ASAN infrastructure working in Ruby working again. I don't know if it ever worked well, but if it did, it appears to have bitrotted. Here are a few of its current problems:

### Stack size calculations are wrong

Ruby takes the address of a local variable as part of the process of working out the top/bottom of the native stack in `native_thread_init_stack`. Because ASAN can end up putting some local variables on a "fake stack", this calculation can wind up producing the wrong result and setting `th->ec->machine.stack_start` incorrectly. This then leads to `stack_check` thinking that the machine stack has overflowed all the time, and thus, leading to programs like the following to fail:

```
ASAN_OPTIONS=use_sigaltstack=0:detect_leaks=0 ./miniruby -e 'Thread.new { puts "hi" }.join'
#<Thread:0x00007fb5d79f3f28 -e:1 run> terminated with exception (report_on_exception is true):
SystemStackError
-e: stack level too deep (SystemStackError)
```

Another consequence of stack size detection not working properly is that the machine stack is not properly marked during GC, so things on the stack which should be considered live get prematurely collected.

ASAN provides the `__asan_addr_is_in_fake_stack` function which can be used to get the address of a local variable on the real stack; I think Ruby's various stack-detecting macros could then make use of this to make it work.

### VALUEs in fake stacks are not marked

Another consequece of ASAN storing local variables in fake stacks is that we don't see them when doing the machine stack mark. Again, the `__asan_addr_is_in_fake_stack` function can help us here. ASAN leaves a pointer to the fake stack on the real stack in every frame. When marking the machine stack, we can check each word to see if it's a pointer to a fake stack frame, and then use `__asan_addr_is_in_fake_stack` to get the extents of the fake frame and scan that too.

This seems to be e.g. [how V8 does it](https://github.com/v8/v8/blob/1c7d3a94fc051e0fd69584b930faab448026941e/src/h...)

### Doesn't work with GCC

Our ASAN implementation doesn't work with GCC, even though GCC supports ASAN. This is because we use the `__has_feature(address_sanitizer)` macro in sanitizers.h, which is a clang-ism. The equivalent GCCism is `__SANITIZE_ADDRESS__` and we should check that too.

## Plan of attack

At the moment, I can't even run a full build of ruby to run, because miniruby crashees during the build process. My plan of attack here is to:

* Address those known problems I've already identified above
* Get `make` to actually work with asan
* Try running the test suite through ASAN, and fix any issues that turns up
* I'm thinking we should add an `--enable-asan` or `--enable-address-sanitizer` or some such to our configure script, to make it easy to build Ruby with ASAN without having to poke around with individual CFLAGS/LDFLAGS
* Eventually, it would be great to actually run the tests under ASAN in CI

This is probably a medium term body of work, but I'll try and tackle it in bits.

Also: @HParker and @peterzhu2118 - I know you folks have been working on getting Valgrind to work better with Ruby, for leak detection. I _think_ I see my efforts here as complementary to yours, rather than duplicative. The ASAN infrastructure for poisoning/unpoisoning stuff in the GC already exists and is _close_ to working properly, and it really did help me solve a bug yesterday (https://bugs.ruby-lang.org/issues/19994), so it seems useful and should be made to work. Your work on freeing memory on shutdown (https://bugs.ruby-lang.org/issues/19993) should actually help ASAN usefully detect leaks as well. I think ASAN might be better for eventually running CI checks against the full Ruby test suite, since allegedly it's faster. However, if you think solving these issues with ASAN is a waste of time and Valgrind can catch the same bugs already, please chime in!

-- 
https://bugs.ruby-lang.org/

[ruby-core:115346] [Ruby master Bug#20001] Make Ruby work properly with ASAN enabled

kjtsanaktsidis (KJ Tsanaktsidis)

kjtsanaktsidis (KJ Tsanaktsidis)

HParker (Adam Hess)

kjtsanaktsidis (KJ Tsanaktsidis)

tags

participants (2)