[ruby-core:124804] [Ruby Bug#21876] Addrinfo.getaddrinfo(AF_UNSPEC) deadlocks after fork on macOS for IPv4-only hosts
Issue #21876 has been reported by nbeyer@gmail.com (Nathan Beyer).

----------------------------------------
Bug #21876: Addrinfo.getaddrinfo(AF_UNSPEC) deadlocks after fork on macOS for IPv4-only hosts
https://bugs.ruby-lang.org/issues/21876

* Author: nbeyer@gmail.com (Nathan Beyer)
* Status: Open
* ruby -v: ruby 3.4.8 (2025-12-17 revision 995b59f666) +PRISM [arm64-darwin25]
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
## Summary

On macOS, `Addrinfo.getaddrinfo(host, service, Socket::AF_UNSPEC, Socket::SOCK_STREAM)` can deadlock in forked child processes when the host has no AAAA (IPv6) DNS records and the parent process previously resolved the same host.

I hit this when an HTTP library acquired an OAuth access token in a Rails initializer, the process was then forked, and the forked process made a separate call to the same host.

## Environment

- macOS (tested on arm64-darwin24 and arm64-darwin25, Apple Silicon)
- Ruby 3.4.7, 3.4.8
- The issue is probabilistic: frequency varies by environment, but it is highly reproducible under sustained DNS activity

## Reproduction

Minimal example:

```ruby
require "socket"
require "timeout"

# Parent resolves an IPv4-only host (no AAAA records)
Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)

pid = fork do
  begin
    Timeout.timeout(5) do
      Addrinfo.getaddrinfo("httpbin.org", "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
    end
    puts "Child: OK"
  rescue Timeout::Error
    puts "Child: DEADLOCK — getaddrinfo hung for 5s"
  end
end
Process.waitpid(pid)
```

A single invocation may or may not deadlock. The attached script runs 50 trials each for several variants to demonstrate the pattern. The deadlock may not appear on the first run, but across several runs you should see at least one deadlock in Test 2, if not a deadlock on every trial in Test 1 and Test 2.
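For readers without the attachment, a trial loop of the kind described above could be sketched as follows. This is not the attached script: the helper name `count_hung_children` and its parameters are hypothetical, and unlike the minimal example it enforces the time limit from the parent (polling and then killing the child) rather than using `Timeout` inside the child. That substitution is an assumption made so that a child blocked inside a C call is still counted.

```ruby
require "socket"

# Hypothetical harness (not the attached script): runs the block in
# `trials` forked children and returns how many did not exit within
# `limit` seconds. The parent enforces the limit and kills stragglers.
def count_hung_children(trials:, limit:)
  hung = 0
  trials.times do
    pid = fork do
      yield
      exit!(0) # skip at_exit handlers inherited from the parent
    end

    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + limit
    finished = false
    until finished || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      # WNOHANG returns nil while the child is still running
      finished = !Process.waitpid(pid, Process::WNOHANG).nil?
      sleep 0.01 unless finished
    end
    next if finished

    hung += 1
    Process.kill("KILL", pid)
    Process.waitpid(pid) # reap the killed child
  end
  hung
end

# Example use (needs network access; the hang is only reported on macOS):
#   host = "httpbin.org"
#   Addrinfo.getaddrinfo(host, "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
#   hung = count_hung_children(trials: 20, limit: 5) do
#     Addrinfo.getaddrinfo(host, "https", Socket::AF_UNSPEC, Socket::SOCK_STREAM)
#   end
#   puts "#{hung}/20 deadlocked"
```

The parent must resolve the host before calling the helper, since the report states that hosts the parent never resolved are not affected.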
See attachment - ruby_getaddrinfo_fork_bug.rb

Typical output:

```
Test 1 (single IPv4-only host): 20/20 deadlocked
Test 2 (multi-host warmup): 20/20 deadlocked
Test 3 (dual-stack host control): 0/20 deadlocked
Test 4 (AF_INET workaround): 0/20 deadlocked
```

## Context

The deadlock occurs when ALL of these conditions hold:

1. **macOS** (not observed on Linux)
2. Parent called `getaddrinfo(host, AF_UNSPEC)` for a host with **no AAAA (IPv6) records**
3. Child calls `getaddrinfo` for the **same host** with `AF_UNSPEC`

**Not affected:**

- Hosts **with** AAAA records (dual-stack), e.g., `www.google.com`, `rubygems.org`
- Using `Socket::AF_INET` instead of `Socket::AF_UNSPEC`
- Hosts the parent never resolved

| Host | AAAA records | Child deadlocks? |
|------|--------------|------------------|
| httpbin.org | None | **Yes** |
| www.github.com | None | **Yes** |
| api.github.com | None | **Yes** |
| stackoverflow.com | None | **Yes** |
| www.google.com | Yes | No |
| rubygems.org | Yes | No |
| example.com | Yes | No |
| www.cloudflare.com | Yes | No |

## Potential Root Cause

As I understand it, on macOS, `getaddrinfo` communicates with the `mDNSResponder` system daemon via Mach IPC ports. When `getaddrinfo(AF_UNSPEC)` queries a host with no AAAA records, the negative AAAA result appears to be cached via Mach port state. After `fork()`, the child process inherits the address space (including references to this cached state) but does **not** inherit the Mach port connections to `mDNSResponder`. When the child calls `getaddrinfo` for the same host, it encounters the stale cache entry and deadlocks trying to communicate over the invalidated Mach port.

Hosts with positive AAAA results are not affected, presumably because their cache entries do not require re-contacting `mDNSResponder` in the same code path.
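The `AF_INET` observation from the Context section can be turned into a stopgap. The sketch below is only that: `fork_safe_family` and `resolve_avoiding_fork_hang` are hypothetical names, not part of Ruby's socket library, and the approach assumes that losing IPv6 results on macOS is acceptable. It relies solely on the Test 4 result above and has not been shown to avoid the problem in every environment.

```ruby
require "socket"
require "rbconfig"

# Hypothetical helper: pick AF_INET on macOS (where the hang was
# reported for AF_UNSPEC) and AF_UNSPEC everywhere else.
def fork_safe_family(host_os = RbConfig::CONFIG["host_os"])
  host_os.match?(/darwin/i) ? Socket::AF_INET : Socket::AF_UNSPEC
end

# Hypothetical wrapper around Addrinfo.getaddrinfo. On macOS this
# returns IPv4 addresses only, which is the cost of the workaround.
def resolve_avoiding_fork_hang(host, service)
  Addrinfo.getaddrinfo(host, service, fork_safe_family, Socket::SOCK_STREAM)
end

# Example use in both the parent and the forked child:
#   resolve_avoiding_fork_hang("httpbin.org", "https")
```

Both the parent and the child would need to go through the wrapper, because the report ties the hang to an earlier `AF_UNSPEC` lookup in the parent.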
## Feature #20590

Ruby 3.4's fork safety improvements (Feature #20590) added a read-write lock around `getaddrinfo` to prevent `fork()` while a `getaddrinfo` call is actively running. However, this does not address the issue reported here: the problem is not about forking *during* a `getaddrinfo` call, but about stale mDNSResponder Mach port state that is inherited by the child process *after* `getaddrinfo` has completed in the parent.

---Files--------------------------------
ruby_getaddrinfo_fork_bug.rb (5.36 KB)

--
https://bugs.ruby-lang.org/
Issue #21876 has been updated by luke-gru (Luke Gruber).

I'm getting a segfault when running your minimal reproduction script on my Macbook Pro (`Darwin Mac 25.2.0 Darwin Kernel Version 25.2.0 (Apple Silicon)`). I get the segfault when compiling under all 3 `GETADDRINFO_IMPL` implementations that ruby uses. This looks to be a bug in Darwin and not Ruby, although maybe we can work around it.

Have you sent this bug report to Apple? If not, I'll try coming up with a reproduction in pure C that we can send to them.

----------------------------------------
Bug #21876: Addrinfo.getaddrinfo(AF_UNSPEC) deadlocks after fork on macOS for IPv4-only hosts
https://bugs.ruby-lang.org/issues/21876#change-116655

--
https://bugs.ruby-lang.org/
participants (2)

- luke-gru (Luke Gruber)
- nbeyer@gmail.com (Nathan Beyer)