Issue #20208 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).
Thanks for this report - it was super detailed and made it very easy for me to figure out
what's going on!
Firstly, your bisection is right. The AI_ADDRCONFIG flag is what makes the difference
here. The flag causes glibc to NOT return ipv6 addresses if the system doesn't have
any ipv6 addresses of its own - and the loopback device doesn't count, glibc will
ignore that when asking "does the system have ipv6 addresses?". This is normally
what you want when using the result of `getaddrinfo` for an outbound connection; if you
don't have an ipv6 connection to the world, performing AAAA DNS lookups that return
results you can't possibly use is pointless, and AI_ADDRCONFIG skips them.
By default, Ruby will use `AI_ADDRCONFIG` for DNS lookups it performs internally as a
result of connecting to things; so `TCPSocket.new`, etc perform their DNS lookups with
AI_ADDRCONFIG (since it knows the point of this lookup is to make a connection with it),
but other functions like `Addrinfo.getaddrinfo` by default are _not_ made with this flag,
since you might be using the results to do something other than connect to them - maybe
you're writing dig in ruby, for example.
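To see the difference on a given machine, the two kinds of lookup can be compared directly. This is only a minimal sketch: `families` is a helper name made up for illustration, and the output depends entirely on the host's interface configuration.

```ruby
require "socket"

# Hypothetical helper (not part of Ruby): resolve a host and report which
# address families came back.
def families(host, port, flags = 0)
  Addrinfo.getaddrinfo(host, port, nil, :STREAM, nil, flags)
          .map { |ai| ai.ipv6? ? :ipv6 : :ipv4 }
          .uniq
end

# AI_ADDRCONFIG is not defined on every platform, so fall back to no flags.
addrconfig = defined?(Socket::AI_ADDRCONFIG) ? Socket::AI_ADDRCONFIG : 0

plain    = families("localhost", 8080)             # default Addrinfo.getaddrinfo lookup
filtered = families("localhost", 8080, addrconfig) # lookup with the flag set explicitly

puts "without AI_ADDRCONFIG: #{plain.inspect}"
puts "with AI_ADDRCONFIG:    #{filtered.inspect}"
```

On a host where only the loopback device has an IPv6 address, the first line would be expected to include `:ipv6` and the second would not.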
The problem with your reproduction is that you are actually trying to connect to
localhost; so, your loopback ipv6 address is actually relevant here!
Now, on to your reproduction:
```
http = Net::HTTP.new("localhost", 8080)
```
This is going to end up calling into `TCPSocket.open`, which will perform DNS resolution
with AI_ADDRCONFIG. Since your system has no non-loopback IPv6 addresses, this means that
only '127.0.0.1' gets returned. Whether or not AI_ADDRCONFIG should return IPv6 results
for localhost if the loopback adapter has an IPv6 address is an interesting question, but
the current implementation in glibc is that it does not:
```
irb(main):010:0> Addrinfo.getaddrinfo("localhost", 8080, nil, :STREAM, nil,
Socket::AI_ADDRCONFIG)
=> [#<Addrinfo: 127.0.0.1:8080 TCP (localhost)>, #<Addrinfo: 127.0.0.1:8080
TCP (localhost)>]
irb(main):011:0> system 'ip addr list'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN
group default qlen 1000
link/ether 84:a9:38:35:ea:56 brd ff:ff:ff:ff:ff:ff
3: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
group default qlen 1000
link/ether a0:e7:0b:22:fc:ea brd ff:ff:ff:ff:ff:ff
inet 192.168.2.249/24 brd 192.168.2.255 scope global dynamic noprefixroute wlp0s20f3
valid_lft 83114sec preferred_lft 83114sec
```
So, because `getaddrinfo` returned '127.0.0.1', we proceed to create an IPv4 socket
for the connection (this is the `AF_INET` socket you see in the strace output).
Then, the next line of your reproduction:
```
http.local_host = Addrinfo.tcp("localhost", 8080).ip_address
```
This is calling `getaddrinfo` to resolve "localhost" for us to use it as the
_local_ side of the connection. Because Ruby does not know what you intend to do with this
IP address, it does not make the request with AI_ADDRCONFIG. Thus, you get an IPv6 result
returned, since there is an IPv6 address for localhost!
This results in the call to `bind(AF_INET6)` in your strace output, and hence the error.
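The mismatch itself can be shown in isolation, without Net::HTTP or any name resolution. The following is a small sketch that assumes a Linux-like kernel; other platforms may report a different errno for the same mistake.

```ruby
require "socket"

# An IPv4 socket, like the one created for the 127.0.0.1 result of the
# remote lookup.
sock = Socket.new(:INET, :STREAM, 0)
# An IPv6 local address, like the one returned by the lookup made without
# AI_ADDRCONFIG.
local = Addrinfo.tcp("::1", 0)

begin
  sock.bind(local)
  bind_error = nil
rescue SystemCallError => e
  # On Linux this is expected to be Errno::EAFNOSUPPORT, as in the strace.
  bind_error = e
ensure
  sock.close
end

puts "socket family: AF_INET, bind family: #{local.ipv6? ? 'AF_INET6' : 'AF_INET'}"
puts "bind result: #{bind_error ? bind_error.class : 'ok'}"
```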
---
I think the problem here is that the test `TestNetHTTPLocalBind#test_bind_to_local_host`
(and friends) is wrong. It _should_ be performing the following sequence of actions (in
pseudocode):
* Do `remote_addr = getaddrinfo("host to connect to", AF_UNSPEC,
AI_ADDRCONFIG)`
* Then, do `local_bind_addr = getaddrinfo("localhost",
remote_addr.address_family)`
* Then, do `socket(remote_addr)`, `bind(local_bind_addr)`, and `connect(remote_addr)`.
i.e. we should be explicitly specifying the address family when looking up the local
address, so that it's the same as the address family we're going to use for
remote_addr.
However what it's actually doing is
* Do `remote_addr = getaddrinfo("host to connect to", AF_UNSPEC,
AI_ADDRCONFIG)`
* Then, do `local_bind_addr = getaddrinfo("localhost", AF_UNSPEC)`
* Then, do `socket(remote_addr)`, `bind(local_bind_addr)`, and `connect(remote_addr)`.
So there's no guarantee that the local_host it looks up is in the same address family
as what it's going to connect to.
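The intended sequence can be written out with the plain socket API. This is a sketch only, not the code used by Net::HTTP or by the tests; `connect_with_local_bind` is a hypothetical helper name, and it simply takes the first address returned by each lookup.

```ruby
require "socket"

# Hypothetical helper: connect to a remote host while binding the local side
# to an address from the same address family as the remote address.
def connect_with_local_bind(remote_host, remote_port, local_host)
  # AI_ADDRCONFIG is not defined on every platform, so fall back to no flags.
  flags = defined?(Socket::AI_ADDRCONFIG) ? Socket::AI_ADDRCONFIG : 0

  # 1. Resolve the remote side with AF_UNSPEC and AI_ADDRCONFIG.
  remote = Addrinfo.getaddrinfo(remote_host, remote_port, nil, :STREAM, nil, flags).first

  # 2. Resolve the local side, restricted to the remote address family.
  local = Addrinfo.getaddrinfo(local_host, nil, remote.afamily, :STREAM).first

  # 3. socket, bind, connect, all in the same family.
  sock = Socket.new(remote.afamily, :STREAM, 0)
  begin
    sock.bind(local)
    sock.connect(remote)
  rescue SystemCallError
    sock.close
    raise
  end
  sock
end
```

For example, `connect_with_local_bind("localhost", 8080, "localhost")` would bind to `127.0.0.1` when the remote lookup yields `127.0.0.1`, and to `::1` when it yields `::1`.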
Fortunately, `#local_host=` accepts a string, which will be looked up during the
connection. So this program _does_ work properly:
```
http = Net::HTTP.new("localhost", 8080)
http.local_host = "localhost"
p http.get("/")
```
If it connects to `::1` (for _whatever_ reason), it will use `::1` as the local addr; and
if it connects to `127.0.0.1`, it will use `127.0.0.1` as the local addr.
So tl;dr: I'm going to fix the tests here; I think the implementation behaviour is
correct.
----------------------------------------
Bug #20208: Net::HTTP errors with Errno::EAFNOSUPPORT when setting local_host with
Addrinfo
https://bugs.ruby-lang.org/issues/20208#change-106455
* Author: jprokop (Jarek Prokop)
* Status: Assigned
* Priority: Normal
* Assignee: kjtsanaktsidis (KJ Tsanaktsidis)
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
A bug was found when dealing with Ruby tests downstream. One of our builders has a
specific networking configuration that causes Ruby to bind a socket incorrectly,
raising Errno::EAFNOSUPPORT even though localhost is IPv6 capable.
It is reproducible with Ruby 3.3, and reasonably current master (git hash
a846d391d38b34fcc4f90adef967c166c923bd56).
Reproduction environment:
The networking configuration has to be in a specific state. The regular interface (such as
eth0) has to have ipv6 disabled while localhost is IPv6 enabled.
I have tracked the problem to a commit adding AI_ADDRCONFIG flag:
https://github.com/ruby/ruby/commit/d2ba8ea54a4089959afdeecdd963e3c4ff39174…
If I revert the commit or just simply set 2 ifdefs that are present in the diff with
`HAVE_CONST_AI_ADDRCONFIG` to 0, the problem no longer occurs.
I have used vagrant with fedora/39-cloud-base box with the above mentioned git hash.
However, I'd note that I reproduced it also on RHEL 8 and RHEL 9.
The VM has the following interfaces:
~~~
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group
default qlen 1000
link/ether 52:54:00:e3:aa:c1 brd ff:ff:ff:ff:ff:ff
altname enp0s5
altname ens5
inet 192.168.122.209/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
valid_lft 2099sec preferred_lft 2099sec
inet6 fe80::f5fe:e8a4:8f83:4a8f/64 scope link tentative noprefixroute
valid_lft forever preferred_lft forever
~~~
Disable IPv6 of eth0 and leave only lo with IPv6:
~~~
$ sudo sysctl "net.ipv6.conf.eth0.disable_ipv6=1"
~~~
Confirm the result:
~~~
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group
default qlen 1000
link/ether 52:54:00:e3:aa:c1 brd ff:ff:ff:ff:ff:ff
altname enp0s5
altname ens5
inet 192.168.122.209/24 brd 192.168.122.255 scope global dynamic noprefixroute eth0
valid_lft 3587sec preferred_lft 3587sec
~~~
inet6 is no longer present on eth0, but still present in lo.
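The same state can also be checked from Ruby. The sketch below uses `Socket.ip_address_list`; `loopback_only_ipv6?` is a hypothetical helper name, and it only approximates the condition described here (it is not the check the resolver itself performs).

```ruby
require "socket"

# Hypothetical helper: true when the only IPv6 addresses on the host are
# loopback addresses (::1), i.e. the configuration described above.
def loopback_only_ipv6?(addrs = Socket.ip_address_list)
  v6 = addrs.select(&:ipv6?)
  v6.any? && v6.all?(&:ipv6_loopback?)
end

puts "IPv6 only on loopback: #{loopback_only_ipv6?}"
```

The helper accepts an explicit list of `Addrinfo` objects so it can be tried against made-up configurations as well as the real one.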
Then we can copy what TestNetHTTPLocalBind is doing in setup, as that is one of the
failing tests and use it for a reproducer:
~~~
$ ruby -rnet/http -e 'http = Net::HTTP.new("localhost", 8080);
http.local_host = Addrinfo.tcp("localhost", 8080).ip_address; p
http.get("/")'
/usr/share/ruby/net/http.rb:1603:in `initialize': Failed to open TCP connection to
localhost:8080 (Address family not supported by protocol - bind(2) for "::1"
port ) (Errno::EAFNOSUPPORT)
from /usr/share/ruby/net/http.rb:1603:in `open'
from /usr/share/ruby/net/http.rb:1603:in `block in connect'
from /usr/share/ruby/timeout.rb:186:in `block in timeout'
from /usr/share/ruby/timeout.rb:193:in `timeout'
from /usr/share/ruby/net/http.rb:1601:in `connect'
from /usr/share/ruby/net/http.rb:1580:in `do_start'
from /usr/share/ruby/net/http.rb:1569:in `start'
from /usr/share/ruby/net/http.rb:2297:in `request'
from /usr/share/ruby/net/http.rb:1917:in `get'
from -e:1:in `<main>'
/usr/share/ruby/net/http.rb:1603:in `initialize': Address family not supported by
protocol - bind(2) for "::1" port (Errno::EAFNOSUPPORT)
from /usr/share/ruby/net/http.rb:1603:in `open'
from /usr/share/ruby/net/http.rb:1603:in `block in connect'
from /usr/share/ruby/timeout.rb:186:in `block in timeout'
from /usr/share/ruby/timeout.rb:193:in `timeout'
from /usr/share/ruby/net/http.rb:1601:in `connect'
from /usr/share/ruby/net/http.rb:1580:in `do_start'
from /usr/share/ruby/net/http.rb:1569:in `start'
from /usr/share/ruby/net/http.rb:2297:in `request'
from /usr/share/ruby/net/http.rb:1917:in `get'
from -e:1:in `<main>'
~~~
The script:
~~~
http = Net::HTTP.new("localhost", 8080)
http.local_host = Addrinfo.tcp("localhost", 8080).ip_address
p http.get("/")
~~~
Without setting the `http.local_host` attribute using Addrinfo, the reproducer does not
fail with EAFNOSUPPORT. Whether `port` is specified or `nil` does not make a difference.
Whether there is a server listening on 8080 or not does not make a difference, the script
fails with the errno regardless.
I have collected `strace` that points to a possible cause:
~~~
$ strace ruby -rnet/http -e 'http = Net::HTTP.new("localhost", 8080);
http.local_host = Addrinfo.tcp("localhost", 8080).ip_address; p
http.get("/")' 2>&1 | grep AF_INET
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_TCP) = 5
bind(5, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = -1
EAFNOSUPPORT (Address family not supported by protocol)
~~~
A socket is created with AF_INET and is later bound with AF_INET6; that is not correct
behavior as far as I can tell.
Full strace is attached.
Observed failures in Ruby test suite related to this issue:
~~~
109) Error:
TestNetHTTPLocalBind#test_bind_to_local_port:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37337 (Address family not
supported by protocol - bind(2) for "::1" port 45395)
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1282:in
`test_bind_to_local_port'
110) Error:
TestNetHTTPLocalBind#test_bind_to_local_host:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:46329 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1267:in
`test_bind_to_local_host'
111) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_false:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:41749 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1312:in
`test_response_body_encoding_false'
112) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_string_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:42775 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1330:in
`test_response_body_encoding_string_without_content_type'
113) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_true_with_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:36895 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1324:in
`test_response_body_encoding_true_with_content_type'
114) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_encoding_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37115 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1336:in
`test_response_body_encoding_encoding_without_content_type'
115) Error:
TestNetHTTPForceEncoding#test_response_body_encoding_true_without_content_type:
Errno::EAFNOSUPPORT: Failed to open TCP connection to localhost:37799 (Address family not
supported by protocol - bind(2) for "::1" port )
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `initialize'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `open'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1603:in `block in connect'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:186:in `block in timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/timeout.rb:193:in `timeout'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1601:in `connect'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1580:in `do_start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1569:in `start'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:2297:in `request'
/builddir/build/BUILD/ruby-3.3.0/lib/net/http.rb:1917:in `get'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1308:in `fe_request'
/builddir/build/BUILD/ruby-3.3.0/test/net/http/test_http.rb:1318:in
`test_response_body_encoding_true_without_content_type'
~~~
Related failures from specs:
~~~
1)
An exception occurred during: before :each
TCPSocket#local_address using IPv6 using an implicit hostname the returned Addrinfo uses
the correct IP address ERROR
Errno::ECONNREFUSED: Connection refused - connect(2) for nil port 37121
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in
`initialize'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in
`new'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:59:in
`block (4 levels) in <top (required)>'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/local_address_spec.rb:4:in
`<top (required)>'
2)
An exception occurred during: before :each
TCPSocket#remote_address using IPv6 using an implicit hostname the returned Addrinfo uses
the correct IP address ERROR
Errno::ECONNREFUSED: Connection refused - connect(2) for nil port 39823
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in
`initialize'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in
`new'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:58:in
`block (4 levels) in <top (required)>'
/builddir/build/BUILD/ruby-3.3.0/spec/ruby/library/socket/tcpsocket/remote_address_spec.rb:4:in
`<top (required)>'
~~~
---Files--------------------------------
strace_log.txt (304 KB)
--
https://bugs.ruby-lang.org/