Issue #21790 has been updated by adamoffat (Adam Moffat).

File python_dns_fork_test.py added

mame (Yusuke Endoh) wrote in #note-5:
> Thank you. This looks like the same issue reported multiple times in the past, but we were previously stuck without a way to investigate.
>
> https://bugs.ruby-lang.org/issues/15490
> https://bugs.ruby-lang.org/issues/15794
> https://github.com/redis/redis-rb/issues/859
> https://github.com/hanami/hanami/issues/993
>
> It is greatly appreciated that the reproduction conditions are now much clearer.
>
> This issue does not affect Python even in a forked child process, right? If Python avoids this error, checking how it calls `getaddrinfo` might give us a hint for a fix or workaround.
>
> It is difficult for me to debug this without a reproducing environment. Are there any committers or contributors who can reproduce the issue and investigate?

Yes, I can confirm that Python works (Python version 3.12.2). Here is a Python script that exercises the same conditions:

``` python
#!/usr/bin/env python3
"""
python_dns_fork_test.py - Test for macOS Tahoe DNS fork bug

This script exercises the exact same conditions as the Ruby reproduction:
1. Parent process resolves DNS for an IPv4-only host (poisoning the child)
2. Fork a child process
3. Child attempts DNS resolution

On macOS Tahoe with Ruby, this causes a hang/crash.
On Python, this should work fine (demonstrating the bug is Ruby-specific).
"""

import socket
import os
import sys
import signal

TEST_HOST = "api.segment.io"  # IPv4-only host
TIMEOUT_SECONDS = 90

print(f"Python version: {sys.version}")
print(f"Platform: {sys.platform}")
print()

# Parent does DNS resolution first (this "poisons" the child in Ruby)
print(f"Parent: Resolving {TEST_HOST}...")
socket.getaddrinfo(TEST_HOST, 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
print("Parent: Done")

# Fork and try DNS in child
print("Forking...")
pid = os.fork()

if pid == 0:
    # Child process
    print(f"Child ({os.getpid()}): Attempting DNS resolution...")

    # Set up alarm for timeout
    def timeout_handler(signum, frame):
        print("Child: FAILED - DNS resolution hung (bug present)")
        sys.exit(1)

    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(TIMEOUT_SECONDS)

    try:
        socket.getaddrinfo(TEST_HOST, 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
        signal.alarm(0)  # Cancel the alarm
        print("Child: SUCCESS - DNS resolution completed!")
        sys.exit(0)
    except Exception as e:
        signal.alarm(0)
        print(f"Child: ERROR - {e}")
        sys.exit(1)
else:
    # Parent process
    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status)
    print()
    if exit_status == 0:
        print("✅ Python does NOT have the DNS fork bug")
    else:
        print("❌ Python HAS the DNS fork bug")
```

and here is the output that I get:

```
Python version: 3.12.2 (main, Aug 22 2024, 10:56:22) [Clang 15.0.0 (clang-1500.3.9.4)]
Platform: darwin

Parent: Resolving api.segment.io...
Parent: Done
Forking...

✅ Python does NOT have the DNS fork bug
```

----------------------------------------
Bug #21790: `Socket.getaddrinfo` hangs after `fork()` on macOS 26.1 (Tahoe) for IPv4-only hosts
https://bugs.ruby-lang.org/issues/21790#change-115807

* Author: adamoffat (Adam Moffat)
* Status: Open
* ruby -v: 3.3.8
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Ruby's `Socket.getaddrinfo` hangs indefinitely in forked child processes on macOS 26.1 (Tahoe) when resolving IPv4-only hostnames. This is a regression that does not occur on macOS 15.x (Sequoia) or earlier.

**Ruby version:**

ruby 3.3.8 (2025-04-09 revision b200bad6cd) [arm64-darwin24]

Also confirmed this affects Ruby 3.2.6 and 3.4.1.

**Reproducible script:**

``` ruby
require "socket"
require "timeout"

puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"

Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
puts "Parent: DNS completed"

pid = fork do
  puts "Child: Attempting DNS resolution..."
  begin
    Timeout.timeout(90) do
      Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
    end
    puts "Child: SUCCESS"
    exit 0
  rescue Timeout::Error
    puts "Child: FAILED - hung for 90 seconds"
    exit 1
  end
end

Process.wait(pid)
```

**Note:** Remove the `Timeout.timeout(90)` wrapper to observe the indefinite hang. The timeout is included only to allow the script to exit for testing purposes.

**Result of reproduce process:**

```
Ruby 3.3.8 on arm64-darwin24
Parent: DNS completed
Child: Attempting DNS resolution...
Child: FAILED - hung for 90 seconds
```

The child process hangs with one thread consuming 100% CPU.

Expected result: The child process should complete DNS resolution successfully, as it does on macOS 15.x and earlier.
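
For anyone re-running the reproduction, a parent-side watchdog may be easier to work with than the in-child `Timeout.timeout` wrapper, because the parent can kill a child that is spinning. The sketch below is only an illustration and is not part of the original report: `run_in_child` is a hypothetical helper name, and the `:ok` / `:error` / `:hung` return values are assumptions chosen for this example.

``` ruby
require "socket"
require "timeout"

# Hypothetical helper (not from the original report): run the given block in
# a forked child and report whether it finished, failed, or had to be killed.
def run_in_child(timeout_seconds)
  pid = fork do
    begin
      yield
      exit! 0 # block returned normally
    rescue StandardError
      exit! 1 # block raised (for example SocketError)
    end
  end

  begin
    # Wait for the child, but give up after timeout_seconds.
    Timeout.timeout(timeout_seconds) { Process.wait(pid) }
    $?.success? ? :ok : :error
  rescue Timeout::Error
    # The child is presumed hung; kill it so it does not keep spinning,
    # then reap it to avoid leaving a zombie process behind.
    Process.kill("KILL", pid)
    Process.wait(pid)
    :hung
  end
end

# Example usage (performs real DNS lookups, so it is left commented out):
#   Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM) # parent lookup first
#   p run_in_child(90) { Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM) }
```

On an affected system one would expect the commented-out call to return `:hung` rather than `:ok`, though this sketch has not been run on macOS Tahoe.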

**Analysis:**

Stack trace shows:

Main thread: Blocked in `wait_getaddrinfo` → `_pthread_cond_wait`
DNS thread: Spinning in `_gai_nat64_second_pass` → `nw_path_access_agent_cache` → `_os_log_preferences_refresh` → `SIGSEGV`

The crash occurs in macOS's NAT64 synthesis code path. Ruby's signal handler catches the `SIGSEGV` but cannot recover, causing the DNS thread to spin.

**Key observations:**

- Only affects IPv4-only hosts. Hosts with IPv6 (like google.com) work correctly.
- Using `AF_INET` instead of `AF_UNSPEC` works. `Socket.getaddrinfo("api.segment.io", 443, Socket::AF_INET, :STREAM)` succeeds.
- Python is not affected. Python calls `getaddrinfo()` synchronously without a background thread.
- Parent must do DNS before fork. If the parent has not called `getaddrinfo()`, the child works correctly.

**Workaround:**

- Use `resolv-replace` to bypass the native DNS resolver: `require "resolv-replace"`

**Impact:**

This breaks all Ruby applications using pre-forking worker models (Resque, Unicorn, Puma, Sidekiq, Passenger) on macOS Tahoe.

**Apple Bug Report:**

Filed with Apple as Feedback Assistant #FB21364061

---Files--------------------------------
stack_trace.txt (66.6 KB)
ruby_dns_fork_bug.rb (1.02 KB)
ruby_3.2.6_crash_output.txt (1.79 KB)
python_dns_fork_test.py (1.8 KB)

--
https://bugs.ruby-lang.org/