
Issue #19965 has been updated by mwaldvogel (Michael Waldvogel).

I've recently updated one of my linux systems (Gentoo) to glibc 2.38 (that was the only change). After the update most of the time the below error happens. Among other things this breaks rubygems for me. I've reinstalled ruby 3.2.2 with rvm and didn't encounter the issue. The issue however remained even after reinstalling ruby 3.3.0 and even with ruby master.

Since this goes back to getaddrinfo (which is working without any issues outside of ruby) and as this seems to be the only bigger change to stdlib socket, I'm assuming the problem was introduced with this feature.

```
3.3.0 :001 > require 'socket'
=> true
3.3.0 :002 > Socket.getaddrinfo('rubygems.org', 443)
(irb):2:in `getaddrinfo': getaddrinfo: Temporary failure in name resolution (Socket::ResolutionError)
        from (irb):2:in `<main>'
        from <internal:kernel>:187:in `loop'
        from /usr/local/rvm/rubies/ruby-3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.11.0/exe/irb:9:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `load'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `<main>'
3.3.0 :003 > Socket.getaddrinfo('rubygems.org', 443)
(irb):3:in `getaddrinfo': getaddrinfo: Temporary failure in name resolution (Socket::ResolutionError)
        from (irb):3:in `<main>'
        from <internal:kernel>:187:in `loop'
        from /usr/local/rvm/rubies/ruby-3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.11.0/exe/irb:9:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `load'
        from /usr/local/rvm/rubies/ruby-3.3.0/bin/irb:25:in `<main>'
3.3.0 :004 > Socket.getaddrinfo('rubygems.org', 443)
=> [["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.193.227", "151.101.193.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.65.227", "151.101.65.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.129.227", "151.101.129.227", 2, 3, 0],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 1, 6],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 2, 17],
 ["AF_INET", 443, "151.101.1.227", "151.101.1.227", 2, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:400::483", "2a04:4e42:400::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:600::483", "2a04:4e42:600::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42:200::483", "2a04:4e42:200::483", 10, 3, 0],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 1, 6],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 2, 17],
 ["AF_INET6", 443, "2a04:4e42::483", "2a04:4e42::483", 10, 3, 0]]
3.3.0 :005 >
```

----------------------------------------
Feature #19965: Make the name resolution interruptible
https://bugs.ruby-lang.org/issues/19965#change-106133

* Author: mame (Yusuke Endoh)
* Status: Closed
* Priority: Normal
* Assignee: mame (Yusuke Endoh)
----------------------------------------
## Problem

Currently, Ruby name resolution is not interruptible.

```
$ cat /etc/resolv.conf
nameserver 198.51.100.1

$ ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

If you set a non-responsive IP as the nameserver, you cannot stop `Addrinfo.getaddrinfo` by pressing Ctrl+C. Note that `Timeout.timeout` does not work either.

This is because there is no way to cancel `getaddrinfo(3)`.

## Proposal

I wrote a patch to make `getaddrinfo(3)` work in a separate pthread.

https://github.com/ruby/ruby/pull/8695

Whenever it needs name resolution, it creates a worker pthread, and executes `getaddrinfo(3)` in it.
The caller thread waits for the worker to complete.
When an interrupt occurs, the caller thread stops waiting and abandons the worker pthread.
The detached worker pthread will exit after `getaddrinfo(3)` completes (or name resolution times out).

## Evaluation

By applying this patch, name resolution is now interruptible.

```
$ ./local/bin/ruby -rsocket -e 'pp Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
        from -e:1:in `<main>'
```

As a drawback, name resolution performance will be degraded.

```
10000.times { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }
# Before patch: 2.3 sec.
# After patch:  3.0 sec.
```

However, I think that name resolution typically takes only a small fraction of an application's total runtime. For example, the difference is small for the performance of `URI.open`.

```
100.times { URI.open("https://www.ruby-lang.org").read }
# Before patch: 3.36 sec.
# After patch:  3.40 sec.
```

## Alternative approaches

I proposed using c-ares to resolve this issue (#19430). However, there was a concern that c-ares does not respect the platform's own name resolution mechanisms.

## Room for improvement

* Currently, this patch works only when pthread is available.
* It might be possible to forcibly stop the worker threads by using `pthread_cancel`. However, `pthread_cancel` with `getaddrinfo(3)` still seems premature; glibc appears to have had a bug in this area until recently:
  https://bugzilla.redhat.com/show_bug.cgi?id=1405071
  https://sourceware.org/bugzilla/show_bug.cgi?id=20975
* It would be more efficient to pool worker pthreads instead of creating them each time.

--
https://bugs.ruby-lang.org/
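
The control flow described under "Proposal" (a worker performs the blocking lookup, the caller only waits and can be interrupted) can be sketched at the Ruby level as follows. This is only an illustration of the pattern, not the patch itself: the actual change is written in C and uses pthreads. The helper name `resolve_in_worker` and the variable `mailbox` are hypothetical and do not exist in Ruby's socket library.

```ruby
require "socket"

# Hypothetical helper (not part of Ruby's socket library): run the blocking
# lookup in a worker thread and let the caller wait on a queue instead.
def resolve_in_worker(host, port)
  mailbox = Queue.new

  # The worker performs the blocking call and hands back either the result
  # or the exception it raised.
  Thread.new do
    mailbox << Addrinfo.getaddrinfo(host, port)
  rescue StandardError => e
    mailbox << e
  end

  # The caller only waits on the queue. If an Interrupt (Ctrl+C) is raised
  # in the caller while it waits, it propagates out of #pop and the worker
  # is left to finish on its own.
  outcome = mailbox.pop
  raise outcome if outcome.is_a?(Exception)

  outcome
end

# A numeric address is used here so the example needs no DNS server.
addrs = resolve_in_worker("127.0.0.1", 80)
puts addrs.map(&:inspect_sockaddr).uniq
```

As in the proposal, the sketch creates a new worker for every lookup, which is the overhead that the "Room for improvement" list suggests reducing with a pool of workers.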