[ruby-core:115104] [Ruby master Feature#19965] Make the name resolution interruptible

Issue #19965 has been reported by mame (Yusuke Endoh).

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965

* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
## Problem

Currently, Ruby name resolution is not interruptible.

```
$ cat /etc/resolv.conf
nameserver 198.51.100.1

$ ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

If you set a non-responsive IP as the nameserver, you cannot stop `Addrinfo.getaddrinfo` by pressing Ctrl+C. Note that `Timeout.timeout` does not work either.

This is because there is no way to cancel `getaddrinfo(3)`.

## Proposal

I wrote a patch to make `getaddrinfo(3)` work in a separate pthread.

https://github.com/ruby/ruby/pull/8695

Whenever it needs name resolution, it creates a worker pthread and executes `getaddrinfo(3)` in it.
The caller thread waits for the worker to complete.
When an interrupt occurs, the caller thread stops waiting and leaves the worker pthread behind.
The detached worker pthread will exit after `getaddrinfo(3)` completes (or name resolution times out).

## Evaluation

With this patch applied, name resolution is interruptible.

```
$ ./local/bin/ruby -rsocket -e 'pp Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
        from -e:1:in `<main>'
```

As a drawback, name resolution performance is degraded.

```
10000.times { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }
# Before patch: 2.3 sec.
# After patch:  3.0 sec.
```

However, I think name resolution typically takes only a small fraction of an application's runtime. For example, the difference is small for the performance of `URI.open`.

```
100.times { URI.open("https://www.ruby-lang.org").read }
# Before patch: 3.36 sec.
# After patch:  3.40 sec.
```

## Alternative approaches

I proposed using c-ares to resolve this issue (#19430). However, there was an opinion that it would be a problem that c-ares does not respect the platform's own name resolution mechanism.

## Room for improvement

* Currently, this patch works only when pthread is available.
* It might be possible to forcibly stop the worker threads by using `pthread_cancel`. However, `pthread_cancel` with `getaddrinfo(3)` still seems premature; there seems to have been a bug in glibc until recently:
  https://bugzilla.redhat.com/show_bug.cgi?id=1405071
  https://sourceware.org/bugzilla/show_bug.cgi?id=20975
* It would be more efficient to pool worker pthreads instead of creating them each time.

--
https://bugs.ruby-lang.org/
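
The worker scheme in the Proposal can be illustrated at the Ruby level. The sketch below is only an analogy and not the C patch from the pull request: `resolve_in_worker`, its `wait_limit:` and `resolver:` parameters, and the choice to return `nil` when the caller gives up are all hypothetical names and decisions made for this illustration. It runs the blocking call in a worker thread, waits with `Thread#join` (an interruptible wait), and simply stops waiting when the limit expires, leaving the worker to finish by itself.

```ruby
require "socket"

# Hypothetical helper for illustration; not part of Ruby's socket library.
# Runs a blocking resolver in a worker thread so the caller can stop waiting.
def resolve_in_worker(host, port, wait_limit: nil,
                      resolver: Addrinfo.method(:getaddrinfo))
  worker = Thread.new do
    # Errors are re-raised in the caller by Thread#value, so silence the
    # default "thread terminated with exception" report.
    Thread.current.report_on_exception = false
    resolver.call(host, port)
  end

  # Thread#join returns nil when the limit expires. The caller then gives up
  # and leaves the worker running, much like the detached pthread in the
  # proposal. The worker is never killed here.
  return nil unless worker.join(wait_limit)

  # The worker finished: return its result, or re-raise its exception.
  worker.value
end
```

For example, `resolve_in_worker("www.ruby-lang.org", 80, wait_limit: 2)` would wait at most about two seconds for an answer. The `resolver:` parameter exists only so that the waiting logic can be exercised without a network.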

Issue #19965 has been updated by byroot (Jean Boussier).

@mame we just ran into a crash on our `ruby-head` nightly CI that seems related:

```
/app/components/platform/essentials/lib/http_host_restriction.rb:50: [BUG] Segmentation fault at 0x00007ff23f795910
ruby 3.3.0dev (2023-11-06T03:01:06Z shopify a763d085e4) +YJIT [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0139 p:---- s:0684 e:000683 CFUNC  :ip_address
c:0138 p:0006 s:0680 e:000679 METHOD /app/components/platform/essentials/lib/http_host_restriction.rb:50
c:0137 p:0024 s:0672 e:000671 METHOD /app/components/platform/essentials/lib/http_host_restriction.rb:86

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x557302ccae51] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_dump.c:812
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_dump.c:1143
/usr/local/ruby/bin/ruby(rb_bug_for_fatal_signal+0xfc) [0x557302e77a7c] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/error.c:1065
/usr/local/ruby/bin/ruby(sigsegv+0x4d) [0x557302c1763d] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7ff42d333520) [0x7ff42d333520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_setaffinity_np+0x4) [0x7ff42d38c524]
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(rb_getnameinfo+0xf2) [0x7ff40f1c8f92] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:711
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(rb_getnameinfo) (null):0
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(addrinfo_getnameinfo+0x88) [0x7ff40f1c94e8] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:2372
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(addrinfo_ip_address+0x59) [0x7ff40f1c95f9] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:2430
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame_+0x117) [0x557302ca77e7] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_insnhelper.c:3503
/usr/local/ruby/bin/ruby(vm_sendish+0x9e) [0x557302cbafc7] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_insnhelper.c:5581
/usr/local/ruby/bin/ruby(vm_exec_core) /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/insns.def:822
/usr/local/ruby/bin/ruby(rb_vm_exec+0x18e) [0x557302cab87e] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm.c:2472
```

Let me know if I can provide more information.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105173

* Author: mame (Yusuke Endoh)
* Status: Closed
* Priority: Normal
----------------------------------------

Issue #19965 has been updated by mame (Yusuke Endoh).

Status changed from Closed to Assigned
Assignee set to mame (Yusuke Endoh)

@byroot Thanks! Maybe I'm misunderstanding the usage of `pthread_setaffinity_np`. I'll check it out. If I can't figure it out, I'll stop using `pthread_setaffinity_np`.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105175

* Author: mame (Yusuke Endoh)
* Status: Assigned
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------

Issue #19965 has been updated by byroot (Jean Boussier).

Important note: in our environment we fork a lot, so it's possible that the cause is that the thread is dead.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105176

* Author: mame (Yusuke Endoh)
* Status: Assigned
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------
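
The fork hypothesis above rests on a general property: only the thread that calls `fork` continues to run in the child process. The sketch below illustrates that property with plain Ruby threads. It is not a reproduction of the reported crash and says nothing about its actual cause; the variable names are chosen for this illustration only.

```ruby
# Illustration only: a thread started before fork is not running in the child.
worker = Thread.new { sleep }   # stands in for a long-lived helper thread

reader, writer = IO.pipe
pid = fork do
  reader.close
  # In the child, only the thread that called fork survives.
  writer.write(worker.alive? ? "alive" : "dead")
  writer.close
  exit!(0)
end
writer.close

child_view  = reader.read                       # what the child observed
parent_view = worker.alive? ? "alive" : "dead"  # what the parent observes
Process.wait(pid)
reader.close
worker.kill

puts "parent: #{parent_view}, child: #{child_view}"
```

Any code that keeps a handle to a thread across `fork` therefore has to assume that the handle may refer to a thread that no longer exists in the child.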

Issue #19965 has been updated by mame (Yusuke Endoh).

Actually, I saw the same problem with CI on RedHat on s390x.

https://rubyci.s3.amazonaws.com/rhel_zlinux/ruby-master/log/20231025T093302Z...

```
-- C level backtrace information -------------------------------------------
unknown address_size:0/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_print_backtrace+0x10) [0x2aa22b5eb06] vm_dump.c:812
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_vm_bugreport) vm_dump.c:1143
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_bug_for_fatal_signal+0xc2) [0x2aa22c62da2] error.c:1065
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigill+0x0) [0x2aa22a9f000] signal.c:920
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigsegv) (null):0
[0x3fef1782718]
/lib64/libpthread.so.0(pthread_setaffinity_np+0x44) [0x3ff8031103c]
/home/chkbuild/build/20231025T093302Z/ruby/.ext/s390x-linux/socket.so(rb_getnameinfo+0x290) [0x3ff567a3340]
```

I thought it might be specific to glibc on s390x, so I stopped using `pthread_setaffinity_np` only on s390x. But if it also appears in other environments (especially x86_64), I'll have to do something.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105179

* Author: mame (Yusuke Endoh)
* Status: Assigned
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------

Issue #19965 has been updated by mame (Yusuke Endoh).

I guess https://github.com/ruby/ruby/pull/8852 will solve the issue.

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-105184

* Author: mame (Yusuke Endoh)
* Status: Assigned
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------

Issue #19965 has been updated by mwaldvogel (Michael Waldvogel).

I recently updated one of my Linux systems (Gentoo) to glibc 2.38 (that was the only change). Since the update, the error below occurs most of the time. Among other things, this breaks rubygems for me. I reinstalled Ruby 3.2.2 with rvm and didn't encounter the issue there. However, the issue remained even after reinstalling Ruby 3.3.0, and it also occurs with Ruby master. Since this goes back to getaddrinfo (which works without any issues outside of Ruby), and since this seems to be the only bigger change to the stdlib socket, I'm assuming the problem was introduced with this feature.

```
3.3.0 :001 > require 'socket'
 => true
3.3.0 :002 > Socket.getaddrinfo('rubygems.org', 443)
(irb):2:in `getaddrinfo': getaddrinfo: Temporary failure in name resolution (Socket::ResolutionError)
        from (irb):2:in `<main>'
        from <internal:kernel>:187:in `loop'
        from /usr/local/rvm/rubies/ruby-3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.11.0/exe/irb:9:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `load'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `<main>'
3.3.0 :003 > Socket.getaddrinfo('rubygems.org', 443)
(irb):3:in `getaddrinfo': getaddrinfo: Temporary failure in name resolution (Socket::ResolutionError)
        from (irb):3:in `<main>'
        from <internal:kernel>:187:in `loop'
        from /usr/local/rvm/rubies/ruby-3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.11.0/exe/irb:9:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `load'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `<main>'
3.3.0 :004 > Socket.getaddrinfo('rubygems.org', 443)
 => [["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 3, 0]]
3.3.0 :005 >
```

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-106133

* Author: mame (Yusuke Endoh)
* Status: Closed
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------
participants (3)
- byroot (Jean Boussier)
- mame (Yusuke Endoh)
- mwaldvogel (Michael Waldvogel)