Issue #19430 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
I totally agree that this problem is worth solving, but moving away from platform-native
libc-based DNS lookups will definitely have an impact on our use of Ruby (at Zendesk).
I'd like to share a bit of information about our use-case.
Our development environment on macOS is based on Docker for Mac. We have a dnsmasq
container running inside the Docker for Mac VM, and we configure the macOS DNS resolver
(via editing files in `/etc/resolver`) to forward queries for `.docker` (and some other
domains, like `.consul` and `.zd-dev`) to the dnsmasq container (which is listening on a
port forwarded to the host by Docker). This makes it possible to connect to things like
`mysql.docker` etc from a console on macOS, including from Ruby scripts.
As is, if c-ares is used for DNS resolution, then this kind of domain-specific DNS routing
will stop working. c-ares will simply forward all queries to the DNS servers returned by
`res_getservers` from `resolv.h`, and they won't know how to handle `.docker` etc.
I'd be very curious to hear from other Docker for Mac users if they do something
similar, or if our setup at Zendesk is just crazy.
I can think of two ways to fix this problem:
* Implement support in c-ares for reading the DNS configuration from the macOS System
Configuration framework, and use that information to implement per-domain DNS dispatch.
* Implement some kind of external DNS resolver which just satisfies all queries by calling
`gethostbyname(3)`, and point c-ares at that. This is essentially how systemd-resolved
handles this problem; apps _can_ talk to resolved directly over its DBus API (or by using
the `nss_resolve` NSS module). However, it also exposes a stub DNS resolver at
`127.0.0.53:53` and publishes that in `/etc/resolv.conf`; apps which do their own DNS by
looking for servers in `/etc/resolv.conf` will thus send their queries to resolved
and get the same answers as everybody else.
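The first option boils down to a longest-suffix match between the queried name and the
configured domains. A minimal Ruby sketch of that rule follows, assuming a macOS-style
`/etc/resolver` directory in which each file is named after a domain and only the
`nameserver` and `port` keywords are read. `Route`, `load_resolver_dir`, and `route_for`
are hypothetical names; a real implementation would live in c-ares (in C) and read the
System Configuration framework rather than these files.

```ruby
# Hypothetical routing entry: one per file in a macOS-style resolver
# directory, where the file name is the domain suffix it applies to.
Route = Struct.new(:domain, :nameservers, :port)

# Parse every file in +dir+ (e.g. /etc/resolver). Only the "nameserver"
# and "port" keywords are handled; all other lines are ignored.
def load_resolver_dir(dir)
  Dir.children(dir).sort.map do |domain|
    nameservers = []
    port = 53
    File.foreach(File.join(dir, domain)) do |line|
      key, value = line.split
      case key
      when "nameserver" then nameservers << value
      when "port" then port = Integer(value)
      end
    end
    Route.new(domain, nameservers, port)
  end
end

# Choose the route whose domain is the longest label-wise suffix of +name+,
# falling back to +default+ (e.g. the servers from res_getservers).
def route_for(name, routes, default)
  labels = name.downcase.chomp(".").split(".")
  matches = routes.select do |route|
    suffix = route.domain.downcase.split(".")
    labels.length >= suffix.length && labels.last(suffix.length) == suffix
  end
  matches.max_by { |route| route.domain.split(".").length } || default
end

# Example (assumed layout): /etc/resolver/docker containing
#   nameserver 127.0.0.1
#   port 5353
# makes route_for("mysql.docker", load_resolver_dir("/etc/resolver"), default)
# pick 127.0.0.1:5353, while "www.ruby-lang.org" gets +default+.
```

Matching whole labels rather than raw string suffixes keeps a name like `mysql.notdocker`
from being sent to the `.docker` server.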
(A few Q's about the stub resolver - would it need to be a system or user service? Is
it something the Ruby interpreter could fork off itself?)
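As a rough illustration of the second option, the core of such a stub is small: decode
each DNS query, answer it by calling the platform resolver, and encode the reply. The
sketch below uses the DNS message classes from Ruby's bundled `resolv` library;
`answer_query`, `serve`, and `SYSTEM_LOOKUP` are hypothetical names. It only answers `A`
questions, handles one query at a time, uses a fixed TTL, and does not cope with malformed
packets.

```ruby
require "resolv"
require "socket"

# Hypothetical default lookup: ask the platform resolver (getaddrinfo(3)),
# so per-domain rules such as /etc/resolver on macOS still apply.
SYSTEM_LOOKUP = lambda do |name|
  Addrinfo.getaddrinfo(name, nil, :INET, :STREAM).map(&:ip_address)
end

# Turn one raw DNS query into a raw DNS response. Only A questions are
# answered; any lookup failure is reported as NXDOMAIN (a simplification).
def answer_query(packet, lookup = SYSTEM_LOOKUP, ttl: 30)
  query = Resolv::DNS::Message.decode(packet)
  reply = Resolv::DNS::Message.new(query.id)
  reply.qr = 1                # this message is a response
  reply.opcode = query.opcode
  reply.rd = query.rd         # echo "recursion desired"
  reply.ra = 1                # advertise that recursion is available
  query.question.each do |name, typeclass|
    reply.add_question(name, typeclass)
    next unless typeclass == Resolv::DNS::Resource::IN::A
    begin
      lookup.call(name.to_s).each do |ip|
        reply.add_answer(name, ttl, Resolv::DNS::Resource::IN::A.new(ip))
      end
    rescue SocketError
      reply.rcode = Resolv::DNS::RCode::NXDomain
    end
  end
  reply.encode
end

# Serve queries on an already-bound UDPSocket, one at a time.
def serve(socket, lookup = SYSTEM_LOOKUP)
  loop do
    packet, (_, port, _, ip) = socket.recvfrom(512)
    socket.send(answer_query(packet, lookup), 0, ip, port)
  end
end
```

c-ares would then have to be pointed at the stub's address explicitly (for example a UDP
socket bound to 127.0.0.1 on an unprivileged port). The stub must not itself appear among
the system's nameservers, or its own `getaddrinfo` calls would be routed back into it.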
Anyway, I'm sharing this not to suggest that we shouldn't switch to c-ares, but to see
how we can keep some use-cases working when we do :)
----------------------------------------
Feature #19430: Contribution wanted: DNS lookup by c-ares library
https://bugs.ruby-lang.org/issues/19430#change-101799
* Author: mame (Yusuke Endoh)
* Status: Open
* Priority: Normal
----------------------------------------
## Problem
At the present time, Ruby uses `getaddrinfo(3)` to resolve names. Because this function is
synchronous, we cannot interrupt the thread performing name resolution until the DNS
server returns a response.
We can see this behavior by setting blackhole.webpagetest.org (72.66.115.13), which
swallows all packets, as the DNS server and then resolving any name:
```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C
```
As we can see, Ctrl+C does not stop Ruby.
The current workaround that users can take is to do name resolution in a Ruby thread.
```ruby
Thread.new { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }.value
```
The thread that calls this code is interruptible. (Note that the newly created thread
itself stays stuck until the DNS lookup times out.)
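A slightly fuller version of this workaround, which also bounds how long the caller waits,
could look like the following sketch. `with_lookup_timeout` and `LookupTimeout` are
hypothetical names; only the lookup passed in the usage comment is the
`Addrinfo.getaddrinfo` call from the example above.

```ruby
require "socket"

# Hypothetical error raised when the caller stops waiting.
LookupTimeout = Class.new(StandardError)

# Run a (possibly blocking) lookup in a helper thread. The calling thread only
# waits in Thread#join, which Ctrl+C can interrupt, and gives up after
# +timeout+ seconds. The helper thread itself may stay stuck until the
# resolver's own timeout expires.
def with_lookup_timeout(timeout, &lookup)
  worker = Thread.new do
    Thread.current.report_on_exception = false # re-raised by #join below
    lookup.call
  end
  # join returns nil on timeout and re-raises any exception from the worker.
  raise LookupTimeout, "lookup did not finish within #{timeout}s" unless worker.join(timeout)
  worker.value
end

# Usage:
#   with_lookup_timeout(5) { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }
```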
## Proposal
We can solve this problem by using c-ares, an asynchronous name resolver, as the backend
for `Addrinfo.getaddrinfo` and friends. (@sorah told me about this library, thanks!)
https://c-ares.org/
I have created a PoC patch.
https://github.com/mame/ruby/commit/547806146993bbc25984011d423dcc0f913b211c
By applying this patch, we can interrupt `Addrinfo.getaddrinfo` by Ctrl+C.
```
# cat /etc/resolv.conf
nameserver 72.66.115.13
# ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
from -e:1:in `<main>'
```
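For comparison, the same black-hole experiment can be reproduced without any patch by using
Ruby's bundled pure-Ruby `resolv` library, which speaks DNS itself and waits on its socket
from Ruby code. This is neither c-ares nor the PoC, and it bypasses the platform resolver
entirely; it only illustrates what an asynchronous resolver gives us, namely a bounded,
interruptible wait. `lookup_with_deadline` is a hypothetical helper.

```ruby
require "resolv"
require "socket"

# Resolve +name+ with Ruby's pure-Ruby DNS client instead of getaddrinfo(3).
# Each attempt waits at most +timeout+ seconds, and the wait happens in Ruby
# (a socket readiness wait), so Ctrl+C can interrupt it.
# Returns the addresses found, or [] if the server never answered.
def lookup_with_deadline(name, nameserver:, port: 53, timeout: 1.0)
  dns = Resolv::DNS.new(nameserver_port: [[nameserver, port]], search: [], ndots: 1)
  dns.timeouts = timeout
  dns.getaddresses(name).map(&:to_s)
rescue Resolv::ResolvError
  [] # some resolv versions raise on timeout instead of returning []
ensure
  dns&.close
end
```

Pointing `nameserver:` at a local UDP socket that never replies (a stand-in for
72.66.115.13) should make the call return `[]` after roughly the configured timeout per
attempt, instead of hanging.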
## Discussion
### About c-ares
According to the c-ares website, major tools including libcurl, Wireshark, and Apache
Arrow already use c-ares. Among language runtimes, Node.js appears to use c-ares.
I am honestly not sure how compatible c-ares is with `getaddrinfo(3)`. I guess there is
no major incompatibility, because I have never run into name resolution problems with
curl. @akr (the author and maintainer of Ruby's socket library) suggested checking whether
OS-specific name resolution, e.g., WINS on Windows or NIS on Solaris, is supported. He also
said that it may be acceptable even if it is not.
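One way to act on @akr's suggestion is a small differential check: resolve the same names
with the current `getaddrinfo(3)`-based path and with the candidate resolver, and list
where the answers differ. In the sketch below both resolvers are plain callables;
`resolver_differences` and `outcome` are hypothetical names, and the `Resolv::DNS` lambda
in the usage comment is only a DNS-only stand-in for a c-ares-backed build.

```ruby
require "resolv"
require "socket"

# Resolve each name with both resolvers and return only the disagreements.
# A resolver is any callable: hostname -> array of IP address strings.
# Errors are recorded by class name, so "fails differently" also shows up.
def resolver_differences(names, reference:, candidate:)
  names.each_with_object({}) do |name, diffs|
    expected = outcome(reference, name)
    actual = outcome(candidate, name)
    diffs[name] = { reference: expected, candidate: actual } unless expected == actual
  end
end

# Normalize a resolver's answer so ordering and duplicates do not count.
def outcome(resolver, name)
  resolver.call(name).map(&:to_s).uniq.sort
rescue StandardError => e
  e.class.name
end

# Usage (stand-in resolvers, assumed for illustration):
#   system_resolver = ->(n) { Addrinfo.getaddrinfo(n, nil, nil, :STREAM).map(&:ip_address) }
#   dns_only = ->(n) { Resolv::DNS.open { |dns| dns.getaddresses(n) } }
#   pp resolver_differences(%w[localhost www.ruby-lang.org],
#                           reference: system_resolver, candidate: dns_only)
```

Names served by hosts files, mDNS, NIS, or per-domain rules are the interesting inputs,
since that is where a DNS-only path is most likely to diverge.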
Whether to bundle the c-ares source code with Ruby needs further discussion. If this
proposal is accepted, I think c-ares will become a de facto essential dependency for
practical use, like GMP. Incidentally, Node.js bundles c-ares:
https://github.com/nodejs/node/tree/main/deps/cares
### Alternative approaches
Recent glibc provides `getaddrinfo_a(3)`, which performs asynchronous name resolution.
However, this function has a fatal problem: it is incompatible with `fork(2)`, which is
heavily used in the Ruby ecosystem. In fact, the attempt to use `getaddrinfo_a(3)` (#17134)
was reverted because it broke Rails tests (#17220).
Another alternative is to have a worker pthread inside Ruby that calls `getaddrinfo(3)`.
Instead of calling `getaddrinfo(3)` directly, `Addrinfo.getaddrinfo` would ask the worker
to resolve a name and wait for a response. This approach should make cancellation
possible. (Simply put, this means reimplementing `getaddrinfo_a(3)` ourselves, taking
`fork(2)` into account.)
This approach has two advantages: it adds no dependency on an external library, and it has
no compatibility issues with `getaddrinfo(3)`. However, it is considerably more difficult
to implement and maintain. An internal pthread may have a non-trivial impact on execution
efficiency and memory usage. We may also need a mechanism to change the number of workers
dynamically depending on the load.
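The worker approach can be modeled in a few dozen lines of Ruby to show the moving parts:
a request queue, a per-request wait with a deadline, and restarting the workers after
`fork(2)`, since threads do not survive into the child. This is a hypothetical Ruby-level
sketch (`LookupPool` and its methods are made up), not how it would be written inside the
VM in C, and it uses a fixed pool size, leaving out the dynamic resizing mentioned above.

```ruby
require "socket"

# Hypothetical Ruby-level model of the "worker thread calls getaddrinfo(3)"
# alternative: callers enqueue a request and wait with a deadline, so a
# stuck lookup ties up a worker but never the caller.
class LookupPool
  LookupTimeout = Class.new(StandardError)

  # One pending request: a worker fills it in, the caller waits on it.
  class Request
    attr_reader :host

    def initialize(host)
      @host = host
      @mutex = Mutex.new
      @cond = ConditionVariable.new
      @done = false
    end

    def finish(value: nil, error: nil)
      @mutex.synchronize do
        @value, @error, @done = value, error, true
        @cond.broadcast
      end
    end

    # Return the result, re-raise the worker's error, or raise LookupTimeout
    # once +timeout+ seconds have passed without an answer.
    def wait(timeout)
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
      @mutex.synchronize do
        until @done
          remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
          raise LookupTimeout, "lookup of #{@host} timed out" if remaining <= 0
          @cond.wait(@mutex, remaining)
        end
      end
      raise @error if @error
      @value
    end
  end

  def initialize(size: 2, &lookup)
    @size = size
    # Default: the blocking system resolver, as Addrinfo.getaddrinfo uses today.
    @lookup = lookup || ->(host) { Addrinfo.getaddrinfo(host, nil, nil, :STREAM).map(&:ip_address) }
    @lock = Mutex.new
    @owner_pid = nil
  end

  def resolve(host, timeout: 5)
    request = Request.new(host)
    queue << request
    request.wait(timeout)
  end

  private

  # Threads do not survive fork(2), so (re)start the workers whenever we run
  # in a different process from the one that started them.
  def queue
    @lock.synchronize do
      unless @owner_pid == Process.pid
        @owner_pid = Process.pid
        @queue = Queue.new
        @size.times { Thread.new(@queue) { |q| work(q) } }
      end
      @queue
    end
  end

  # Worker loop: take a request, run the blocking lookup, hand back the outcome.
  def work(queue)
    while (request = queue.pop)
      begin
        request.finish(value: @lookup.call(request.host))
      rescue StandardError => e
        request.finish(error: e)
      end
    end
  end
end
```

A caller that times out simply stops waiting; the worker stays busy until the lookup
returns, so a few black-holed lookups can exhaust a small fixed pool. That is one reason
the number of workers would need to adapt to load.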
It would be ideal if we could try and evaluate both approaches. But my current impression
is that using c-ares is the quickest and best compromise.
## Contribution wanted
I have taken this as far as a PoC, but don't have much time to complete it. @naruse
suggested that I create a ticket asking for contributions. Is anyone interested?
* This patch changes `rsock_getaddrinfo` to accept a timeout argument. There are several
places where `Qnil` is passed as the timeout (marked with `// TODO` in the PoC). We need to
decide what timeout to pass there.
* This only handles `getaddrinfo`, but we also need to handle `getnameinfo` (and anything
else that resolves names, if any). There may be other issues I'm not aware of.
* I have not yet tested this PoC thoroughly. It would be great if we could evaluate it
with some real apps.
Also, it would be great to hear from someone who knows more about c-ares.
--
https://bugs.ruby-lang.org/