
Issue #19965 has been updated by mame (Yusuke Endoh).

Actually, I saw the same problem in CI on Red Hat on s390x.

https://rubyci.s3.amazonaws.com/rhel_zlinux/ruby-master/log/20231025T093302Z...

```
-- C level backtrace information -------------------------------------------
unknown address_size:0
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_print_backtrace+0x10) [0x2aa22b5eb06] vm_dump.c:812
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_vm_bugreport) vm_dump.c:1143
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_bug_for_fatal_signal+0xc2) [0x2aa22c62da2] error.c:1065
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigill+0x0) [0x2aa22a9f000] signal.c:920
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigsegv) (null):0
[0x3fef1782718]
/lib64/libpthread.so.0(pthread_setaffinity_np+0x44) [0x3ff8031103c]
/home/chkbuild/build/20231025T093302Z/ruby/.ext/s390x-linux/socket.so(rb_getnameinfo+0x290) [0x3ff567a3340]
```

I thought it might be specific to glibc on s390x, so I stopped using `pthread_setaffinity_np` on s390x only. But if it appears in other environments as well (especially x86_64), I'll have to do something.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105179

* Author: mame (Yusuke Endoh)
* Status: Assigned
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------
## Problem

Currently, Ruby name resolution is not interruptible.

```
$ cat /etc/resolv.conf
nameserver 198.51.100.1

$ ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

If you set an unresponsive IP address as the nameserver, you cannot stop `Addrinfo.getaddrinfo` by pressing Ctrl+C. Note that `Timeout.timeout` does not work either. This is because there is no way to cancel a running `getaddrinfo(3)` call.

## Proposal

I wrote a patch that runs `getaddrinfo(3)` in a separate pthread.
https://github.com/ruby/ruby/pull/8695

Whenever name resolution is needed, the patch creates a worker pthread and executes `getaddrinfo(3)` in it. The caller thread waits for the worker to complete. When an interrupt occurs, the caller thread stops waiting and abandons the worker pthread. The detached worker pthread exits once `getaddrinfo(3)` completes (or the name resolution times out).

## Evaluation

With this patch applied, name resolution is now interruptible.

```
$ ./local/bin/ruby -rsocket -e 'pp Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
	from -e:1:in `<main>'
```

The drawback is that name resolution becomes slower.

```
require "socket"

10000.times { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }
# Before patch: 2.3 sec.
# After patch:  3.0 sec.
```

However, I think name resolution typically accounts for only a small fraction of an application's runtime. For example, the difference is small for the performance of `URI.open`.

```
require "open-uri"

100.times { URI.open("https://www.ruby-lang.org").read }
# Before patch: 3.36 sec.
# After patch:  3.40 sec.
```

## Alternative approaches

I proposed using c-ares to resolve this issue (#19430). However, one concern raised was that c-ares does not respect the platform's own name resolution mechanism.

## Room for improvement

* Currently, this patch works only when pthread is available.
* It might be possible to forcibly stop the worker threads by using `pthread_cancel`. However, `pthread_cancel` combined with `getaddrinfo(3)` still seems immature; glibc appears to have had a bug in this area until recently:
  * https://bugzilla.redhat.com/show_bug.cgi?id=1405071
  * https://sourceware.org/bugzilla/show_bug.cgi?id=20975
* It would be more efficient to pool worker pthreads instead of creating a new one each time.

-- 
https://bugs.ruby-lang.org/
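For illustration, the worker-thread scheme described in the Proposal section might look roughly like the following in C. This is a minimal sketch under stated assumptions, not the code from the pull request. The names `interruptible_getaddrinfo`, `gai_job`, and `gai_worker` are hypothetical. The `interrupted` callback is a hypothetical stand-in for Ruby's interrupt check, and the 100 ms polling interval is an arbitrary choice. The caller and the detached worker share a reference-counted job record, and whichever side finishes last frees it. This lets the caller walk away on an interrupt without leaking memory or leaving the worker with a dangling pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical job record shared by the caller and the worker thread. */
struct gai_job {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int refcount;            /* 2 while both caller and worker hold it */
    int done;                /* set by the worker after getaddrinfo returns */
    int err;                 /* getaddrinfo(3) return code */
    char *node, *service;    /* private copies: the caller may return early */
    int has_hints;
    struct addrinfo hints;
    struct addrinfo *result; /* owned by the job until handed to the caller */
};

/* Drop one reference. Must be called with job->lock held; unlocks it.
 * The last side to release frees everything, including an unclaimed result. */
static void job_release(struct gai_job *job) {
    int last = (--job->refcount == 0);
    pthread_mutex_unlock(&job->lock);
    if (!last) return;
    if (job->result) freeaddrinfo(job->result);
    free(job->node);
    free(job->service);
    pthread_cond_destroy(&job->cond);
    pthread_mutex_destroy(&job->lock);
    free(job);
}

static void *gai_worker(void *arg) {
    struct gai_job *job = arg;
    struct addrinfo *res = NULL;
    /* The blocking call runs here, off the caller's thread. */
    int err = getaddrinfo(job->node, job->service,
                          job->has_hints ? &job->hints : NULL, &res);
    pthread_mutex_lock(&job->lock);
    job->err = err;
    job->result = (err == 0) ? res : NULL;
    job->done = 1;
    pthread_cond_signal(&job->cond);
    job_release(job);        /* frees the job if the caller already left */
    return NULL;
}

/* Like getaddrinfo(3), but polls interrupted() every 100 ms while waiting.
 * On interrupt it abandons the detached worker and returns EAI_AGAIN
 * (an arbitrary error code chosen for this sketch). */
int interruptible_getaddrinfo(const char *node, const char *service,
                              const struct addrinfo *hints,
                              struct addrinfo **res, int (*interrupted)(void)) {
    struct gai_job *job = calloc(1, sizeof *job);
    if (!job) return EAI_MEMORY;
    job->node = node ? strdup(node) : NULL;
    job->service = service ? strdup(service) : NULL;
    if ((node && !job->node) || (service && !job->service)) {
        free(job->node); free(job->service); free(job);
        return EAI_MEMORY;
    }
    if (hints) { job->hints = *hints; job->has_hints = 1; }
    job->refcount = 2;
    pthread_mutex_init(&job->lock, NULL);
    pthread_cond_init(&job->cond, NULL);

    pthread_t th;
    if (pthread_create(&th, NULL, gai_worker, job) != 0) {
        pthread_mutex_lock(&job->lock);
        job->refcount = 1;   /* no worker exists to release its share */
        job_release(job);
        return EAI_AGAIN;
    }
    pthread_detach(th);      /* the worker's resources are reclaimed on exit */

    pthread_mutex_lock(&job->lock);
    while (!job->done) {
        if (interrupted && interrupted()) {
            job_release(job); /* walk away; the worker frees the job later */
            return EAI_AGAIN;
        }
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000 * 1000;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        pthread_cond_timedwait(&job->cond, &job->lock, &deadline);
    }
    int err = job->err;
    *res = job->result;
    job->result = NULL;      /* ownership moves to the caller */
    job_release(job);
    return err;
}
```

With `AI_NUMERICHOST` in the hints, no DNS query is made, so resolving `"127.0.0.1"` returns almost immediately. With an unresponsive nameserver, an `interrupted` callback that returns nonzero makes the call return within roughly one polling interval. The abandoned worker keeps running in the background until `getaddrinfo(3)` gives up, which mirrors the behavior described above.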