[ruby-core:112326] [Ruby master Feature#19430] Contribution wanted: DNS lookup by c-ares library

Issue #19430 has been reported by mame (Yusuke Endoh).

----------------------------------------
Feature #19430: Contribution wanted: DNS lookup by c-ares library
https://bugs.ruby-lang.org/issues/19430

* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
## Problem

At present, Ruby uses `getaddrinfo(3)` to resolve names. Because this function is synchronous, we cannot interrupt the thread performing name resolution until the DNS server returns a response.

We can see this behavior by setting blackhole.webpagetest.org (72.66.115.13) as a DNS server, which swallows all packets, and resolving any name:

```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```

As we see, Ctrl+C does not stop ruby.

The current workaround that users can take is to do name resolution in a Ruby thread.

```ruby
Thread.new { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }.value
```

The thread that calls this code is interruptible. (Note that the newly created thread itself will be stuck until the DNS lookup times out.)

## Proposal

We can solve this problem by using c-ares, which is an asynchronous name resolver, as a backend of `Addrinfo.getaddrinfo`, etc. (@sorah told me about this library, thanks!)

https://c-ares.org/

I have created a PoC patch.

https://github.com/mame/ruby/commit/547806146993bbc25984011d423dcc0f913b211c

By applying this patch, we can interrupt `Addrinfo.getaddrinfo` by Ctrl+C.

```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
        from -e:1:in `<main>'
```

## Discussion

### About c-ares

According to the c-ares site, some major tools including libcurl, Wireshark, and Apache Arrow are already using c-ares. Among language interpreters, node.js seems to be using c-ares.

I am honestly not sure about the compatibility of c-ares with `getaddrinfo(3)`. I guess there is no major incompatibility because I have not experienced any name resolution problem with curl. @akr (who is the author and maintainer of Ruby's socket library) suggested checking whether OS-specific name resolution, e.g., WINS on Windows, NIS on Solaris, etc., is supported. He also said that it may be acceptable even if they are not supported.

Whether to bundle the c-ares source code with ruby would require further discussion. If this proposal is accepted, then c-ares will become a de facto essential dependency for practical use, like gmp, in my opinion. Incidentally, node.js bundles c-ares: https://github.com/nodejs/node/tree/main/deps/cares

### Alternative approaches

Recent glibc provides `getaddrinfo_a(3)`, which performs asynchronous name resolution. However, this function has a fatal problem of being incompatible with `fork(2)`, which is heavily used in the Ruby ecosystem. In fact, the attempt to use `getaddrinfo_a(3)` (#17134) was reverted because it failed Rails tests. (#17220)

Another alternative is to have a worker pthread inside Ruby that calls `getaddrinfo(3)`. Instead of calling `getaddrinfo(3)` directly, `Addrinfo.getaddrinfo` would ask the worker to resolve a name and wait for a response. This method should be able to implement cancellation. (Simply put, this means reimplementing `getaddrinfo_a(3)` on our own, taking `fork(2)` into account.)

This has two advantages: not adding dependencies on external libraries and not having compatibility issues with `getaddrinfo(3)`. However, it is considerably more difficult to implement and maintain. An internal pthread may have a non-trivial impact on execution efficiency and memory usage. Also, we may need to implement a mechanism to dynamically change the number of workers depending on the load.

It would be ideal if we could try and evaluate both approaches. But my current impression is that using c-ares is the quickest and best compromise.

## Contribution wanted

I have made it up to the PoC, but don't have much time to complete this. @naruse suggested that I create a ticket asking for contributions. Is anyone interested in this?

* This patch changes `rsock_getaddrinfo` to accept a timeout argument. There are several places where Qnil is passed as a timeout (where I added `// TODO` in the PoC). We need to consider what timeout we should pass.
* The patch covers only `getaddrinfo`, but we also need to handle `getnameinfo` (and something else if any). There may be some issues I'm not aware of.
* I have not yet tested this PoC seriously. It would be great if we could evaluate it with some real apps.

Also, it would be great to hear from someone who knows more about c-ares.

-- https://bugs.ruby-lang.org/
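As an illustration of the thread-based workaround described in the issue, here is a minimal sketch that also stops waiting after a deadline. `interruptible_getaddrinfo`, `LookupTimeout`, and the `resolver:` parameter are hypothetical names introduced only for this example; they are not part of Ruby's socket library or of the PoC patch.

```ruby
require "socket"

# Hypothetical error class for this sketch only.
class LookupTimeout < StandardError; end

# Hypothetical helper: run a blocking lookup in a separate Ruby thread and
# stop waiting after `timeout` seconds. `resolver` is injectable so the
# helper can be exercised without touching the network.
def interruptible_getaddrinfo(host, service, timeout: 5,
                              resolver: ->(h, s) { Addrinfo.getaddrinfo(h, s) })
  worker = Thread.new do
    # Errors are re-raised in the caller by Thread#join, so silence the
    # default report printed when a thread dies with an exception.
    Thread.current.report_on_exception = false
    resolver.call(host, service)
  end

  # Thread#join returns nil if the thread is still running at the deadline.
  if worker.join(timeout)
    worker.value
  else
    # The worker may stay stuck inside getaddrinfo(3) until the system
    # resolver gives up; this sketch only stops waiting for it.
    raise LookupTimeout, "lookup of #{host} did not finish in #{timeout}s"
  end
end
```

The calling thread stays interruptible because it waits in `Thread#join`, not in `getaddrinfo(3)`. As the issue notes, the worker thread itself is not cancelled, which is exactly the limitation the c-ares proposal aims to remove.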

Issue #19430 has been updated by alanwu (Alan Wu).

Here is a potential consideration for macOS. Go has its own custom DNS resolver and can also use the system resolver. It used to switch between the two depending on build configuration. Roughly speaking, when cross-compiling and targeting macOS from Linux, it used its own custom resolver. This caused issues for some VPN users since they rely on configurations that only the system resolver understands. Here is a post that discusses the issue in more detail: https://danp.net/posts/macos-dns-change-in-go-1-20

In any case, it's very valuable to have a resolver that's well-behaved with respect to interrupts across all platforms. Don't let this detail discourage you if you're looking to contribute.

----------------------------------------
https://bugs.ruby-lang.org/issues/19430#change-101776

Issue #19430 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).

I totally agree that this problem is worth solving, but moving away from platform-native libc-based DNS lookups will definitely have an impact on our use of Ruby (at Zendesk). I'd like to share a bit of information about our use-case.

We have a development environment on MacOS based on Docker for Mac. We have a dnsmasq container running inside the Docker for Mac VM, and we configure the MacOS DNS resolver (via editing files in `/etc/resolver`) to forward queries for `.docker` (and some other domains, like `.consul` and `.zd-dev`) to the dnsmasq container (which is listening on a port forwarded to the host by Docker). This makes it possible to connect to things like `mysql.docker` etc from a console on macOS, including from Ruby scripts.

As is, if c-ares is used for DNS resolution, then this kind of domain-specific DNS routing will stop working. c-ares will simply forward all queries to the DNS servers returned by `res_getservers` from `resolv.h`, and they won't know how to handle `.docker` etc. I'd be very curious to hear from other Docker for Mac users if they do something similar, or if our setup at Zendesk is just crazy.

I can think of two ways to fix this problem:

* Implement support for reading the DNS configuration out of the system configuration framework in c-ares, and using that information to implement per-domain DNS dispatch.
* Implement some kind of external DNS resolver which just satisfies all queries by calling `gethostbyname(3)`, and point c-ares at that. This is essentially how systemd-resolved handles this problem; apps _can_ talk to resolved directly over its DBus API (or by using the `nss_resolve` NSS module). However, it also exposes a stub DNS resolver at `127.0.0.53:53` and publishes that in `/etc/resolv.conf`; apps which do their own DNS by looking for servers in `/etc/resolv.conf` will thus be sending their queries to resolved and get the same answers as everybody else. (A few Q's about the stub resolver - would it need to be a system or user service? Is it something the Ruby interpreter could fork off itself?)

Anyway - I'm sharing this not to try and suggest we shouldn't do a switch to c-ares, but rather to see how we can keep some use-cases working while we do it :)

----------------------------------------
https://bugs.ruby-lang.org/issues/19430#change-101799

Issue #19430 has been updated by mame (Yusuke Endoh).

alanwu (Alan Wu) wrote in #note-1:
> Here is a potential consideration for macOS. *snip*

kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-2:
> We have a development environment on MacOS based on Docker for Mac. *snip*
Thanks for your comments. From a little research, it looks like c-ares supports the /etc/resolver setting. The library uses libresolv to identify the nameserver on macOS and iOS. (Sorry if I'm wrong, I haven't tried it myself.)

https://github.com/c-ares/c-ares/issues/330
https://github.com/c-ares/c-ares/blob/38b30bc922c21faa156939bde15ea35332c30e...

If my research is correct, we don't have to worry about this case. Considering that curl is so widely used, I believe that the common problems with the most commonly used operating systems have already been taken care of.

----------------------------------------
https://bugs.ruby-lang.org/issues/19430#change-101811

Issue #19430 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
> Thanks for your comments. As far as a little research, it looks like c-ares supports /etc/resolver setting. The library uses libresolv to identify the nameserver on macOS and iOS. (Sorry if I'm lying, I haven't tried it myself.)
Not quite, unfortunately. It does link against libresolv and call `res_getservers` to look up DNS servers, but that only returns the servers for the _first_ resolver. For example, if I run `scutil --dns` on my macbook, I get this:

```
% scutil --dns
DNS configuration

resolver #1
  nameserver[0] : 1.1.1.1
  nameserver[1] : 1.0.0.1
  flags    : Request A records
  reach    : 0x00000002 (Reachable)

resolver #2
  domain   : local
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 300000

resolver #3
  domain   : 254.169.in-addr.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 300200

resolver #4
  domain   : 8.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 300400

resolver #5
  domain   : 9.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 300600

resolver #6
  domain   : a.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 300800

resolver #7
  domain   : b.e.f.ip6.arpa
  options  : mdns
  timeout  : 5
  flags    : Request A records
  reach    : 0x00000000 (Not Reachable)
  order    : 301000

resolver #8
  domain   : getacmeapp-dev.com
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

resolver #9
  domain   : zd-dev.com
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

resolver #10
  domain   : docker
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

resolver #11
  domain   : ob-dev.com
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

resolver #12
  domain   : consul
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

resolver #13
  domain   : bime-development.com
  nameserver[0] : 127.0.0.1
  port     : 1054
  flags    : Request A records, Request AAAA records
  reach    : 0x00030002 (Reachable,Local Address,Directly Reachable Address)

DNS configuration (for scoped queries)

resolver #1
  nameserver[0] : 1.1.1.1
  nameserver[1] : 1.0.0.1
  if_index : 15 (en0)
  flags    : Scoped, Request A records
  reach    : 0x00000002 (Reachable)
```

If I run `adig mysql.docker` (adig is c-ares's CLI dig tool) under lldb, and put a breakpoint [here](https://github.com/c-ares/c-ares/blob/38b30bc922c21faa156939bde15ea35332c30e...), I can see the `res` variable only contains `1.1.1.1` and `1.0.0.1` - i.e. the _first_ resolver from `scutil --dns`. It doesn't contain any information about the other resolvers. Running `adig mysql.docker` therefore sends the query to Cloudflare and of course it doesn't work.
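To make the missing behaviour concrete, here is a small sketch of what per-domain dispatch could look like, modelled loosely on the `scutil --dns` output above. `Resolver` and `pick_resolver` are hypothetical names for this example; nothing here is c-ares or macOS API, and the rule that the most specific matching domain wins is an assumption about how such routing would be expected to behave.

```ruby
# Hypothetical model of one "resolver #N" entry from `scutil --dns`.
# A nil domain marks the default resolver.
Resolver = Struct.new(:domain, :nameservers, :port, keyword_init: true)

# Pick the resolver whose domain is the longest label-wise suffix of
# `hostname`; fall back to the default (domain-less) resolver.
def pick_resolver(hostname, resolvers)
  labels = hostname.downcase.chomp(".").split(".")

  candidates = resolvers.select do |r|
    next false if r.domain.nil?
    suffix = r.domain.downcase.split(".")
    # Compare whole labels so "notdocker" does not match the "docker" domain.
    labels.last(suffix.size) == suffix
  end

  candidates.max_by { |r| r.domain.split(".").size } ||
    resolvers.find { |r| r.domain.nil? }
end

resolvers = [
  Resolver.new(domain: nil, nameservers: ["1.1.1.1", "1.0.0.1"], port: 53),
  Resolver.new(domain: "docker", nameservers: ["127.0.0.1"], port: 1054),
  Resolver.new(domain: "zd-dev.com", nameservers: ["127.0.0.1"], port: 1054),
]

chosen = pick_resolver("mysql.docker", resolvers)
puts "#{chosen.nameservers.first}:#{chosen.port}"
```

With only the first resolver available, as in the `res` variable observed under lldb, every name would go to `1.1.1.1` or `1.0.0.1` regardless of its domain.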
> Considering that curl is so widely used, I believe that the common problems with the most commonly used operating systems have already been taken care of.
Actually it seems whether or not curl uses c-ares depends on how it's configured. If it's configured with `--enable-ares --disable-threaded-resolver`, c-ares is used, and I can't make it resolve `.docker` etc domains. If it's configured with `--disable-ares --enable-threaded-resolver`, c-ares is not used and `.docker` resolution works. It seems that system curl on macOS and also homebrew curl both use the threaded resolver and not c-ares, which is why this works for us today.
The c-ares issue you linked (https://github.com/c-ares/c-ares/issues/330) is actually a good find: after the issue was closed, the original reporter said routing requests for particular domains to particular servers still didn't work. The maintainers then said:
> The issue is that the configuration is actively trying to route different domains to different name servers. That's not a standardized thing. It appears to be a very mac-specific thing. It is not something c-ares has any concept of. You typically reach out to your configured nameservers to look up the entirety of your domains, not route different top levels to different servers.
>
> We'd probably accept patches to add such support for MacOS, but it is definitely not on any development roadmap.
So it sounds like perhaps the way to go here for us at Zendesk is to contribute support for this into c-ares itself, hopefully before c-ares makes its way into Ruby :)

-------------------------

Postscript: Some notes about mdns

Another issue that came to mind while looking at the output of `scutil --dns` is mdns, which is commonly used to handle `.local` domains on e.g. home LANs. C-ares has no support for it, so on MacOS it would not be able to resolve such hostnames. This would also be the case on Linux systems using the [mdns nss module](https://github.com/lathiat/nss-mdns) to handle mdns directly from `getaddrinfo(3)`. However, I think the kind of Linux systems that use mdns these days would be more likely to be using `systemd-resolved`, where c-ares will work (because its DNS queries will be sent to the resolved stub resolver at `127.0.0.53`, which will itself do the mdns query).

I guess it might be possible to implement mdns inside c-ares too, maybe by linking against libavahi, but I haven't really looked into this.

----------------------------------------
https://bugs.ruby-lang.org/issues/19430#change-101814

"mame (Yusuke Endoh) via ruby-core" <ruby-core@ml.ruby-lang.org> wrote:
> Feature #19430: Contribution wanted: DNS lookup by c-ares library https://bugs.ruby-lang.org/issues/19430

> ### Alternative approaches

I prefer to avoid overhead from using threads for DNS lookup, so I dislike getaddrinfo_a || (getaddrinfo + pthread), too.

> It would be ideal if we could try and evaluate both approaches. But my current impression is that using c-ares is the quickest and best compromise.

What about resolv.rb? I don't believe CPU/memory use in DNS lookup is a bottleneck for most users (network latency is); and prefer motivating improvements in Ruby VM + socket ext instead of linking against another C library.

Issue #19430 has been updated by mame (Yusuke Endoh). kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-4:
> Actually it seems whether or not curl uses c-ares depends on how it's configured. If it's configured with `--enable-ares --disable-threaded-resolver`, c-ares is used, and I can't make it resolve `.docker` etc domains. If it's configured with `--disable-ares --enable-threaded-resolver`, c-ares is not used and `.docker` resolution works.

Hmmm, I see, thank you for your info. The name `--enable-threaded-resolver` sounds like exactly what I was suggesting as an alternative approach.

> > We'd probably accept patches to add such support for MacOS, but it is definitely not on any development roadmap.
>
> So it sounds like perhaps the way to go here for us at Zendesk is to contribute support for this into c-ares itself, hopefully before c-ares makes its way into Ruby :)

By all means :-) In any way, the current resolver by getaddrinfo(3) should remain even if c-ares resolver is introduced. I think the problem of not being able to interrupt is only a problem in a production environment, so selecting the existing resolver is sufficient for macOS.

Issue #19430 has been updated by mame (Yusuke Endoh). I copy and paste the email sent by Eric Wong. --- "mame (Yusuke Endoh) via ruby-core" <ruby-core@ml.ruby-lang.org> wrote:

> Feature #19430: Contribution wanted: DNS lookup by c-ares library https://bugs.ruby-lang.org/issues/19430

> ### Alternative approaches

I prefer to avoid overhead from using threads for DNS lookup, so I dislike getaddrinfo_a || (getaddrinfo + pthread), too.

> It would be ideal if we could try and evaluate both approaches. But my current impression is that using c-ares is the quickest and best compromise.

What about resolv.rb? I don't believe CPU/memory use in DNS lookup is a bottleneck for most users (network latency is); and prefer motivating improvements in Ruby VM + socket ext instead of linking against another C library.

Issue #19430 has been updated by mame (Yusuke Endoh). Eric, thank you for your comment.
> ### Alternative approaches
> I prefer to avoid overhead from using threads for DNS lookup, so I dislike getaddrinfo_a || (getaddrinfo + pthread), too.

I agree with you. I don't want to implement it in terms of maintenance either.

> What about resolv.rb?

Before making this proposal, I talked about c-ares with @akr, the author of resolv.rb. While resolv.rb is still a workaround for this problem, akr-san did not specifically recommend it. I did not explicitly ask about the problem with resolv.rb, but from our previous conversation, it appeared that akr-san thought resolv.rb was overkill and not something he wanted to maintain very well. In fact resolv-replace.rb has the problem of not monkey-patching the recently introduced `Socket.tcp` (which is currently recommended by akr-san). I'll discuss this with him next time.

Issue #19430 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
> I think the problem of not being able to interrupt is only a problem in a production environment

I think this could simplify things around Fibers as well. Naively, if a fiber makes a call to `getaddrinfo(3)`, it would block the whole thread and stop any other fibers from running while the DNS lookup is happening. It seems a hook was implemented in the fiber scheduler interface to allow the fiber scheduler to intercept the call and do something non-blocking instead - https://bugs.ruby-lang.org/issues/17370. If Ruby's built-in DNS resolver was asynchronous, we could get rid of this hook and avoid fiber schedulers needing to bring their own async dns resolution support.
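
For readers unfamiliar with the hook mentioned above: the fiber scheduler interface documents an `address_resolve(hostname)` method that is expected to return an array of IP address strings (or nil). The following is only a rough sketch of what a scheduler might plug in there, assuming it delegates to resolv.rb; the module name `ResolvAddressResolve` is hypothetical, and a real scheduler would also need the rest of the scheduler interface so that the sockets used by resolv.rb become non-blocking.

```ruby
require "resolv"

# Hypothetical mixin for a fiber scheduler class. Only the hook name and its
# contract (hostname in, array of address strings out) come from the
# Fiber::Scheduler interface; everything else is an assumption.
module ResolvAddressResolve
  def address_resolve(hostname)
    # resolv.rb does its I/O with Ruby sockets, so under a scheduler these
    # waits can be handed to the scheduler's own I/O hooks instead of
    # blocking the whole thread inside getaddrinfo(3).
    Resolv.getaddresses(hostname)
  end
end
```

This is the "bring your own async DNS" burden the comment refers to: if the built-in resolver were asynchronous, schedulers would not need code like this.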

Issue #19430 has been updated by akr (Akira Tanaka). mame (Yusuke Endoh) wrote in #note-7:
> > What about resolv.rb?
>
> Before making this proposal, I talked about c-ares with @akr, the author of resolv.rb. While resolv.rb is still a workaround for this problem, akr-san did not specifically recommend it. I did not explicitly ask about the problem with resolv.rb, but from our previous conversation, it appeared that akr-san thought resolv.rb was overkill and not something he wanted to maintain very well. In fact resolv-replace.rb has the problem of not monkey-patching the recently introduced `Socket.tcp` (which is currently recommended by akr-san). I'll discuss this with him next time.

I hadn't thought of resolv.rb at the meeting. I guess that it is because I have less interest in resolv.rb now. My motivation for resolv.rb is solving whole-process blocking at Ruby 1.8. It is solved by native threading at Ruby 1.9 and releasing GVL around getaddrinfo invocation. However, I agree that resolv.rb can be another choice for an interruptible resolver. resolv.rb supports only /etc/hosts and DNS. Thus, it has OS-specific name resolution problems same as c-ares. Also, I'm not sure that performance is good enough. It would be nice that resolv.rb provides getaddrinfo-compatible function and Ruby provides cleaner way to choose getaddrinfo implementation. It makes resolv-replace.rb much simpler. However, we need an asynchronous DNS query API to implement Happy Eye Ball without threading. getaddrinfo function is not enough for that. It seems the scope of this issue doesn't contain it but we need it someday, I guess.
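
As a rough idea of what a "getaddrinfo-compatible function" on top of resolv.rb could look like, here is a small sketch. It is an assumption-laden illustration, not an existing API: `resolv_getaddrinfo` and its `timeouts:` parameter are hypothetical, it only covers TCP, and it inherits the limitation noted above that resolv.rb consults only /etc/hosts and DNS.

```ruby
require "resolv"
require "socket"

# Hypothetical helper: resolve `host` with resolv.rb (pure Ruby, hence
# interruptible) and wrap the results as Addrinfo objects for TCP.
def resolv_getaddrinfo(host, port, timeouts: [1, 2])
  dns = Resolv::DNS.new
  dns.timeouts = timeouts # per-try timeouts in seconds for DNS queries
  resolver = Resolv.new([Resolv::Hosts.new, dns])
  # The addresses are numeric strings, so Addrinfo.tcp does not need DNS here.
  resolver.getaddresses(host).map { |ip| Addrinfo.tcp(ip, port) }
ensure
  dns&.close
end
```

Because the lookup runs as ordinary Ruby code, Ctrl+C or `Timeout.timeout` around the call can interrupt it, which is the property this thread is after.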

Issue #19430 has been updated by mame (Yusuke Endoh). @akr, thank you for your comment. But sorry, I failed to get your point. As you said, the current resolv.rb has the similar pros/cons as my proposal. (Pro: the name resolution will be interruptible. Con: no OS-specific resolution support.) I don't think simplifying resolv-replace.rb will solve the problem, so what is the purpose? Do you plan to support OS-specific resolution in resolv.rb? If so, it will be great.

Issue #19430 has been updated by akr (Akira Tanaka).

mame (Yusuke Endoh) wrote in #note-10:

> @akr, thank you for your comment. But sorry, I failed to get your point.
>
> As you said, the current resolv.rb has the similar pros/cons as my proposal. (Pro: the name resolution will be interruptible. Con: no OS-specific resolution support.)

The benefit of resolv.rb is that it doesn't need a new external library, as pointed out by Eric Wong.

> I don't think simplifying resolv-replace.rb will solve the problem, so what is the purpose?

It is a different issue. I guessed that if Ruby supports both the OS's getaddrinfo and c-ares, some switching mechanism will be introduced, and that it may simplify resolv-replace.rb.

> Do you plan to support OS-specific resolution in resolv.rb? If so, it will be great.

No. I have no such plan.

https://bugs.ruby-lang.org/issues/19430#change-101876
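
The trade-off discussed in this exchange (resolv.rb is interruptible and needs no external library, but does not go through the OS resolver) can be illustrated with a minimal sketch. The helper name `resolve_with_resolv`, the host names, and the temporary hosts file are assumptions made for illustration; only `Resolv`, `Resolv::Hosts`, and `Timeout` come from Ruby's standard library. The resolver here is restricted to a hosts-style file so that the example runs offline.

```ruby
require "resolv"
require "tempfile"
require "timeout"

# Hypothetical helper (not part of resolv.rb): resolve `name` with the
# pure-Ruby resolver and give up after `timeout` seconds. Because resolv.rb
# is written in Ruby, the lookup can be interrupted by Timeout or Ctrl+C.
def resolve_with_resolv(name, resolver: Resolv::DefaultResolver, timeout: 5)
  Timeout.timeout(timeout) { resolver.getaddresses(name) }
end

# Offline demonstration: a resolver backed only by a hosts-style file.
# The address and host name below are placeholders.
hosts = Tempfile.new("hosts")
hosts.write("192.0.2.10 example.test\n")
hosts.flush

file_only = Resolv.new([Resolv::Hosts.new(hosts.path)])
p resolve_with_resolv("example.test", resolver: file_only)
p resolve_with_resolv("missing.test", resolver: file_only)
```

With the default resolver, resolv.rb consults only the hosts file and DNS, which is the "no OS-specific resolution support" drawback mentioned above.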

Issue #19430 has been updated by kzys (Kazuyoshi Kato).

Hello. I'm interested in taking a look, but I'm new to Ruby development.

Is it acceptable to make the PoC mergeable first and work on macOS and per-platform resolvers later? I'm not really sure that changing c-ares for macOS (as quoted below) is more straightforward than the pthread + getaddrinfo approach. The pthread approach takes more resources (namely threads), but works as a nice fallback for other OSes.

https://github.com/c-ares/c-ares/issues/330#issuecomment-655238093

> We'd probably accept patches to add such support for MacOS, but it is definitely not on any development roadmap.
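
The pthread + getaddrinfo idea can be approximated at the Ruby level, which may help when comparing the two approaches. This is only a sketch under stated assumptions: it uses a Ruby `Thread` rather than a pthread created inside the C extension, and the helper name `getaddrinfo_in_thread` and its `timeout:` parameter are hypothetical.

```ruby
require "socket"

# Hypothetical helper: run the blocking lookup in a worker thread and wait
# at most `timeout` seconds for it. Returns nil if the wait times out.
def getaddrinfo_in_thread(host, service, timeout: 5)
  worker = Thread.new do
    # Errors are re-raised in the caller by Thread#value, so do not also
    # print them from the worker.
    Thread.current.report_on_exception = false
    Addrinfo.getaddrinfo(host, service, nil, :STREAM)
  end
  # Thread#join returns nil on timeout. The worker itself keeps running
  # until getaddrinfo(3) returns, so its resources are not reclaimed early.
  worker.join(timeout) ? worker.value : nil
end

# A numeric address needs no DNS query, so this runs offline.
addrs = getaddrinfo_in_thread("127.0.0.1", 80)
puts addrs.map(&:ip_address).inspect
```

The calling thread stays interruptible while it waits, at the cost of one extra thread per lookup, which is the resource concern raised above.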

https://bugs.ruby-lang.org/issues/19430#change-102680

Issue #19430 has been updated by mame (Yusuke Endoh).

I've prototyped a patch that creates a temporary pthread whenever `Addrinfo.getaddrinfo` is called, and runs `getaddrinfo(3)` in that pthread.

https://gist.github.com/mame/d7b4d8eaa3b0c3016838c33eea752cbb

(This patch is incomplete; there are places where `getnameinfo` is called directly, and those may need to be wrapped in a pthread as well.)

The upside is that it uses `getaddrinfo(3)`, so OS-dependent name resolution is respected. The downside is that name resolution is about 2.5 times slower.

Before patch:

```
$ time ./local/bin/ruby -rsocket -e '10000.times { Socket.getaddrinfo("www.ruby-lang.org", "http") }'

real    0m3.503s
user    0m1.283s
sys     0m0.901s
```

After patch:

```
$ time ./local/bin/ruby -rsocket -e '10000.times { Socket.getaddrinfo("www.ruby-lang.org", "http") }'

real    0m8.526s
user    0m2.267s
sys     0m4.766s
```

Name resolution takes very little time compared to HTTP communication, so I guess this is usually not a problem for a typical HTTP connection. But it would have an impact on an application that does a large number of name resolutions. I wonder if this is acceptable.

@naruse says that a configure option to make c-ares selectable would be a good way to go.

https://bugs.ruby-lang.org/issues/19430#change-104819
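
To put the two timings from the pthread prototype message in per-call terms, the arithmetic can be written out as below. The variable names are chosen for illustration; the numbers are the `real` times and the iteration count reported in that message, and nothing here is a new measurement.

```ruby
# Wall-clock ("real") times reported for 10000 lookups, in seconds.
calls = 10_000
before_s = 3.503
after_s = 8.526

slowdown = after_s / before_s
overhead_ms = (after_s - before_s) / calls * 1000

puts format("slowdown: %.2fx", slowdown)
puts format("extra cost per call: %.2f ms", overhead_ms)
```

This gives a slowdown of roughly 2.4x and an extra cost of about half a millisecond per lookup, which is consistent with the observation that the overhead matters mainly for applications performing very many name resolutions.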

Issue #19430 has been updated by mame (Yusuke Endoh).

I have created another ticket for executing `getaddrinfo(3)` in a dedicated pthread: #19965

https://bugs.ruby-lang.org/issues/19430#change-105011
participants (6)

* akr (Akira Tanaka)
* alanwu (Alan Wu)
* Eric Wong
* kjtsanaktsidis (KJ Tsanaktsidis)
* kzys (Kazuyoshi Kato)
* mame (Yusuke Endoh)