[ruby-core:124288] [Ruby Bug#21790] `Socket.getaddrinfo` hangs after `fork()` on macOS 26.1 (Tahoe) for IPv4-only hosts
Issue #21790 has been reported by adamoffat (Adam Moffat).

----------------------------------------
Bug #21790: `Socket.getaddrinfo` hangs after `fork()` on macOS 26.1 (Tahoe) for IPv4-only hosts
https://bugs.ruby-lang.org/issues/21790

* Author: adamoffat (Adam Moffat)
* Status: Open
* ruby -v: 3.3.8
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN
----------------------------------------
Ruby's `Socket.getaddrinfo` hangs indefinitely in forked child processes on macOS 26.1 (Tahoe) when resolving IPv4-only hostnames. This is a regression that does not occur on macOS 15.x (Sequoia) or earlier.

**Ruby version:**

ruby 3.3.8 (2025-04-09 revision b200bad6cd) [arm64-darwin24]

Also confirmed this affects Ruby 3.2.6 and 3.4.1.

**Reproducible script:**

``` ruby
require "socket"
require "timeout"

puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"
Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
puts "Parent: DNS completed"

pid = fork do
  puts "Child: Attempting DNS resolution..."
  begin
    Timeout.timeout(90) do
      Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
    end
    puts "Child: SUCCESS"
    exit 0
  rescue Timeout::Error
    puts "Child: FAILED - hung for 90 seconds"
    exit 1
  end
end
Process.wait(pid)
```

**Note:** Remove the `Timeout.timeout(90)` wrapper to observe the hang indefinitely. The timeout is included only to allow the script to exit for testing purposes.

**Result of reproduce process:**

```
Ruby 3.3.8 on arm64-darwin24
Parent: DNS completed
Child: Attempting DNS resolution...
Child: FAILED - hung for 90 seconds
```

The child process hangs with one thread consuming 100% CPU.

Expected result: The child process should complete DNS resolution successfully, as it does on macOS 15.x and earlier.

**Analysis:**

Stack trace shows:

Main thread: Blocked in `wait_getaddrinfo` → `_pthread_cond_wait`
DNS thread: Spinning in `_gai_nat64_second_pass` → `nw_path_access_agent_cache` → `_os_log_preferences_refresh` → `SIGSEGV`

The crash occurs in macOS's NAT64 synthesis code path. Ruby's signal handler catches the `SIGSEGV` but cannot recover, causing the DNS thread to spin.

**Key observations:**

- Only affects IPv4-only hosts. Hosts with IPv6 (like google.com) work correctly.
- Using `AF_INET` instead of `AF_UNSPEC` works. `Socket.getaddrinfo("api.segment.io", 443, Socket::AF_INET, :STREAM)` succeeds.
- Python is not affected. Python calls `getaddrinfo()` synchronously without a background thread.
- Parent must do DNS before fork. If the parent has not called `getaddrinfo()`, the child works correctly.

**Workaround:**

- Use `resolv-replace` to bypass the native DNS resolver: `require "resolv-replace"`

**Impact:**

This breaks all Ruby applications using pre-forking worker models (Resque, Unicorn, Puma, Sidekiq, Passenger) on macOS Tahoe.

**Apple Bug Report:**

Filed with Apple as Feedback Assistant #FB21364061

---Files--------------------------------
stack_trace.txt (66.6 KB)
ruby_dns_fork_bug.rb (1.02 KB)

--
https://bugs.ruby-lang.org/
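The two workarounds mentioned in the report (requesting `AF_INET` only, and bypassing the native resolver) could be sketched roughly as follows. This is a minimal illustration, not code from the report: the helper names `resolve_ipv4` and `resolve_with_resolv` are hypothetical, and the second helper calls the standard library's `Resolv` class directly rather than loading `resolv-replace`. Whether either one avoids the problem on a given system would still need to be checked there.

``` ruby
require "socket"
require "resolv"

# Hypothetical helper (not from the report): ask getaddrinfo() for IPv4
# results only, mirroring the AF_INET observation above.
def resolve_ipv4(host, port)
  Socket.getaddrinfo(host, port, Socket::AF_INET, :STREAM)
        .map { |entry| entry[3] } # element 3 is the numeric address string
        .uniq
end

# Hypothetical helper (not from the report): resolve with Ruby's own
# resolver library instead of calling Socket.getaddrinfo.
# The resolver argument is injectable so it can be swapped in tests.
def resolve_with_resolv(host, resolver: Resolv::DefaultResolver)
  resolver.getaddresses(host)
end

# Possible use inside a forked worker (host taken from the report):
#   resolve_ipv4("api.segment.io", 443)
#   resolve_with_resolv("api.segment.io")
```

Both helpers return plain address strings, so a caller could pass the result straight to a connect call that accepts numeric addresses.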
Issue #21790 has been updated by adamoffat (Adam Moffat).

To confirm: MacOS Sequoia also did not have this issue.

https://bugs.ruby-lang.org/issues/21790#change-115794
Issue #21790 has been updated by adamoffat (Adam Moffat).

I saw that this was added in 3.4.0: https://github.com/ruby/ruby/pull/10864
Seen here: (https://github.com/ruby/ruby/releases/tag/v3_4_0_preview2)
But I also tested this using 3.4.1 and it was still an issue.

https://bugs.ruby-lang.org/issues/21790#change-115795
Issue #21790 has been updated by mame (Yusuke Endoh).

Thank you for the report.

Since I don't have access to Tahoe, I cannot test this in my own environment. However, I have a few questions to clarify the situation.

The change to perform DNS lookups in a dedicated background thread was introduced in Ruby 3.3.0. You mentioned that this affects Ruby 3.2.6 as well. Are you certain it reproduces on 3.2.6?

If it fails on 3.2.6, the cause might be unrelated to the background thread, as its behavior should be similar to Python's. Would it be possible to provide a stack trace from the 3.2.6 crash?

Though it's just a guess, this might be a bug with `getaddrinfo` on Tahoe itself, but I could be wrong.

https://bugs.ruby-lang.org/issues/21790#change-115797
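The distinction between a synchronous lookup and a lookup handed to a dedicated thread can be pictured with a Ruby-level analogy. This is only an illustration of the pattern: Ruby's actual implementation lives in C inside the socket extension, and the helper name `lookup_in_thread` and its parameters are hypothetical. Note also that `Thread#kill` may not be able to stop a thread that is stuck inside native code, so this is not offered as a fix.

``` ruby
require "socket"

# Analogy only (hypothetical helper, not Ruby's internal code):
# run the lookup on a worker thread while the caller waits with a deadline.
def lookup_in_thread(host, port, timeout:, resolver: Socket.method(:getaddrinfo))
  worker = Thread.new { resolver.call(host, port, nil, :STREAM) }
  worker.report_on_exception = false

  if worker.join(timeout)
    worker.value # result of the lookup; re-raises if the lookup raised
  else
    worker.kill  # give up waiting; the caller sees nil
    nil
  end
end

# Possible use (host taken from the report):
#   lookup_in_thread("api.segment.io", 443, timeout: 10)
```

In this picture the caller corresponds to the main thread waiting in `wait_getaddrinfo`, and the worker corresponds to the DNS thread in the reported stack trace.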
Issue #21790 has been updated by adamoffat (Adam Moffat).

File ruby_3.2.6_crash_output.txt added

I saw that this was added in 3.4.0: https://github.com/ruby/ruby/pull/10864
Seen here: (https://github.com/ruby/ruby/releases/tag/v3_4_0_preview2)
But I also tested this using 3.4.1 and it was still an issue.

mame (Yusuke Endoh) wrote in #note-3:
> Thank you for the report.
>
> Since I don't have access to Tahoe, I cannot test this in my own environment. However, I have a few questions to clarify the situation.
>
> The change to perform DNS lookups in a dedicated background thread was introduced in Ruby 3.3.0. You mentioned that this affects Ruby 3.2.6 as well. Are you certain it reproduces on 3.2.6?
>
> If it fails on 3.2.6, the cause might be unrelated to the background thread, as its behavior should be similar to Python's. Would it be possible to provide a stack trace from the 3.2.6 crash?
>
> Though it's just a guess, this might be a bug with `getaddrinfo` on Tahoe itself, but I could be wrong.
Ah yes, sorry I should have clarified this in my post. I tested this in 3.2.6 but it manifests differently in that version.

I tested Ruby 3.2.6 with the same reproduction script. Rather than hanging indefinitely, Ruby 3.2.6 crashes immediately with a segmentation fault when the child process attempts DNS resolution.

The crash occurs at the `getaddrinfo` call in the forked child. The backtrace shows the fault originating in macOS system libraries, specifically in `libsystem_trace.dylib` at `_os_log_preferences_refresh`.

This confirms Ruby 3.2.6 is also affected by the same underlying issue - it just manifests as an immediate crash rather than a hang. I've attached the full crash output for reference.

https://bugs.ruby-lang.org/issues/21790#change-115801
Issue #21790 has been updated by mame (Yusuke Endoh).

Thank you. This looks like the same issue reported multiple times in the past, but we were previously stuck without a way to investigate.

https://bugs.ruby-lang.org/issues/15490
https://bugs.ruby-lang.org/issues/15794
https://github.com/redis/redis-rb/issues/859
https://github.com/hanami/hanami/issues/993

It is greatly appreciated that the reproduction conditions are now much clearer.

This issue does not affect Python even in a forked child process, right? If Python avoids this error, checking how it calls `getaddrinfo` might give us a hint for a fix or workaround.

It is difficult for me to debug this without a reproducing environment. Are there any committers or contributors who can reproduce the issue and investigate?

https://bugs.ruby-lang.org/issues/21790#change-115803
Issue #21790 has been updated by adamoffat (Adam Moffat).

File python_dns_fork_test.py added

mame (Yusuke Endoh) wrote in #note-5:
> Thank you. This looks like the same issue reported multiple times in the past, but we were previously stuck without a way to investigate.
>
> https://bugs.ruby-lang.org/issues/15490
> https://bugs.ruby-lang.org/issues/15794
> https://github.com/redis/redis-rb/issues/859
> https://github.com/hanami/hanami/issues/993
>
> It is greatly appreciated that the reproduction conditions are now much clearer.
>
> This issue does not affect Python even in a forked child process, right? If Python avoids this error, checking how it calls `getaddrinfo` might give us a hint for a fix or workaround.
>
> It is difficult for me to debug this without a reproducing environment. Are there any committers or contributors who can reproduce the issue and investigate?
Yes I can confirm that Python works (Python version 3.12.2)

Here is a script in Python that executes the same conditions:

``` python
#!/usr/bin/env python3
"""
python_dns_fork_test.py - Test for macOS Tahoe DNS fork bug

This script exercises the exact same conditions as the Ruby reproduction:
1. Parent process resolves DNS for an IPv4-only host (poisoning the child)
2. Fork a child process
3. Child attempts DNS resolution

On macOS Tahoe with Ruby, this causes a hang/crash.
On Python, this should work fine (demonstrating the bug is Ruby-specific).
"""
import socket
import os
import sys
import signal

TEST_HOST = "api.segment.io"  # IPv4-only host
TIMEOUT_SECONDS = 90

print(f"Python version: {sys.version}")
print(f"Platform: {sys.platform}")
print()

# Parent does DNS resolution first (this "poisons" the child in Ruby)
print(f"Parent: Resolving {TEST_HOST}...")
socket.getaddrinfo(TEST_HOST, 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
print("Parent: Done")

# Fork and try DNS in child
print("Forking...")
pid = os.fork()

if pid == 0:
    # Child process
    print(f"Child ({os.getpid()}): Attempting DNS resolution...")

    # Set up alarm for timeout
    def timeout_handler(signum, frame):
        print("Child: FAILED - DNS resolution hung (bug present)")
        sys.exit(1)

    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(TIMEOUT_SECONDS)

    try:
        socket.getaddrinfo(TEST_HOST, 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
        signal.alarm(0)  # Cancel the alarm
        print("Child: SUCCESS - DNS resolution completed!")
        sys.exit(0)
    except Exception as e:
        signal.alarm(0)
        print(f"Child: ERROR - {e}")
        sys.exit(1)
else:
    # Parent process
    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status)
    print()
    if exit_status == 0:
        print("✅ Python does NOT have the DNS fork bug")
    else:
        print("❌ Python HAS the DNS fork bug")
```

and here is the output that I get:

```
Python version: 3.12.2 (main, Aug 22 2024, 10:56:22) [Clang 15.0.0 (clang-1500.3.9.4)]
Platform: darwin

Parent: Resolving api.segment.io...
Parent: Done
Forking...

✅ Python does NOT have the DNS fork bug
```

https://bugs.ruby-lang.org/issues/21790#change-115807
Issue #21790 has been updated by adamoffat (Adam Moffat).

File python_dns_fork_test.py added
File python_crash_output.txt added

Ah my earlier Python script had a bug.

My initial Python test incorrectly reported success. The script used `os.WEXITSTATUS()` to check the child's exit status, but this function only works for processes that exit normally. When a process is killed by a signal (`SIGSEGV`), it returns 0, giving a false positive.

After fixing the script to check `os.WIFSIGNALED()`, I was able to confirm the child is killed by signal 11 (`SIGSEGV`). The crash logs show the identical stack trace to Ruby: `_gai_nat64_second_pass` → `nw_path_access_agent_cache` → `_os_log_preferences_refresh`.

This is an OS-level bug in macOS Tahoe, not language-specific. My apologies.

https://bugs.ruby-lang.org/issues/21790#change-115808
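The same exit-status pitfall exists for any parent that inspects a forked child, so a Ruby counterpart of the corrected check may be useful for comparison. The sketch below is not from the thread: `child_outcome` is a hypothetical helper that uses `Process.wait2` and `Process::Status#signaled?` to separate a normal exit from death by a signal.

``` ruby
# Hypothetical helper (not from the thread): run the block in a forked
# child and report how the child ended.
def child_outcome
  pid = fork { yield }
  _, status = Process.wait2(pid)

  if status.signaled?
    # Killed by a signal such as SIGSEGV; exitstatus is nil in this case.
    [:signaled, status.termsig]
  else
    [:exited, status.exitstatus]
  end
end

# Possible use around the lookup from the report:
#   child_outcome { Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM) }
```

Checking `signaled?` first keeps a crashed child from being mistaken for a clean exit.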
Issue #21790 has been updated by mame (Yusuke Endoh).

Status changed from Open to Third Party's Issue

Thank you for your confirmation.

This is most likely a macOS bug, so I'd close this as a third-party issue. It would be best for macOS to fix the issue, but if someone finds a workaround, I'd consider importing it on the Ruby side.

https://bugs.ruby-lang.org/issues/21790#change-116074
participants (2)
- adamoffat (Adam Moffat)
- mame (Yusuke Endoh)