[ruby-core:112213] [Ruby master Bug#19412] Socket starts queueing and not responding after a certain amount of requests

Issue #19412 has been reported by brodock (Gabriel Mazetto). ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by jeremyevans0 (Jeremy Evans). To try to narrow down the issue, does it still occur if you change the main loop section to?: ```ruby begin socket = TCPServer.new(LISTEN, PORT) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do handle_connection(socket.accept, nil) end ensure socket.close end ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-101782 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). @jeremyevans0 yes, it behaves the same regarding the max amount of requests it can handle: ``` $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16438 requests completed ``` the one difference is that I can bind right after I abort the program, but it takes a while for it to run the requests again, and when it does, it get limitar 16k requests... ``` $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16398 requests completed ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102095 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by jeremyevans0 (Jeremy Evans). Using the server given, I cannot reproduce using the following Ruby code: ```ruby require 'socket' PORT = 8080 LISTEN = '127.0.0.1' 20000.times do socket = TCPSocket.new(LISTEN, PORT) socket.write("HTTP/1.1 GET /\n") socket.shutdown(Socket::SHUT_WR) socket.read ensure socket&.close end ``` Is there a way you can reproduce this using ruby as the client instead of ApacheBench? That would allow for easier testing. Also, does this reproduce when using ApacheBench with `-c 1`? ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102096 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). ruby -v set to ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] @jeremyevans0 running apache benchmark with `-c1` does show some of the behavior. It will go from 0 to somethign around 16000 in 4 seconds and get stuck there, see a 3x run log: ``` ab -n 20000 -c 1 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16346 requests completed ``` ``` ab -n 20000 -c 1 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16347 requests completed ``` ``` ab -n 20000 -c 1 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16349 requests completed ``` I've also run a version of your client and got to reproduce the error (with similar output): ```ruby require 'socket' PORT = 8080 LISTEN = '127.0.0.1' NTIMES = 20000 NTIMES.times do |t| socket = TCPSocket.new(LISTEN, PORT) socket.write("HTTP/1.1 GET /\n") socket.shutdown(Socket::SHUT_WR) socket.read puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0 rescue Exception => e puts "Total of #{t-1} requests completed" raise e ensure socket&.close end ``` ``` ruby --version Could not locate Gemfile or .bundle/ directory ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21] ruby client.rb Could not locate Gemfile or .bundle/ directory Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests Total of 16344 requests completed client.rb:8:in `initialize': Operation timed out - connect(2) for "127.0.0.1" port 8080 (Errno::ETIMEDOUT) from client.rb:8:in `new' from client.rb:8:in `block in <main>' from client.rb:7:in `times' from client.rb:7:in `<main>' ``` and with ruby 3.2.1: ``` rvm use 3.2.1 Using /Users/brodock/.rvm/gems/ruby-3.2.1 ruby --version Could not locate Gemfile or .bundle/ directory ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] ruby client.rb Could not locate Gemfile or .bundle/ directory Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests Total of 16342 requests completed client.rb:8:in `initialize': connect(2) for "127.0.0.1" port 8080 (IO::TimeoutError) from client.rb:8:in `new' from client.rb:8:in `block in <main>' from client.rb:7:in `times' from client.rb:7:in `<main>' ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102394 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by jeremyevans0 (Jeremy Evans). brodock (Gabriel Mazetto) wrote in #note-4:
I've also run a version of your client and got to reproduce the error (with similar output):
```ruby require 'socket'
PORT = 8080 LISTEN = '127.0.0.1' NTIMES = 20000
NTIMES.times do |t| socket = TCPSocket.new(LISTEN, PORT) socket.write("HTTP/1.1 GET /\n") socket.shutdown(Socket::SHUT_WR) socket.read
puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0 rescue Exception => e puts "Total of #{t-1} requests completed"
raise e ensure socket&.close end ```
I tried that client program in my environment (OpenBSD/amd64), and was able to run it without failure with both Ruby 3.1 and 3.2. In terms of further testing, it looks like you are using rvm. While it seems unlikely that that is the problem, can you try using a system-provided Ruby? Also, did you test the above client program on Linux (the output shows MacOS)? ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102395 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ``` require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). I've went back to my Linux machine and tried to reproduce the problem. Using Ruby 3.0 (from the package manager), I got to make it crash only using some decent concurrency (100) and running it for longer time: ``` brodock@pop-os:~/Projetos/ractor$ neofetch ///////////// brodock@pop-os ///////////////////// -------------- ///////*767//////////////// OS: Pop!_OS 22.04 LTS x86_64 //////7676767676*////////////// Kernel: 6.2.0-76060200-generic /////76767//7676767////////////// Uptime: 13 mins /////767676///*76767/////////////// Packages: 2031 (dpkg), 25 (flatpak) ///////767676///76767.///7676*/////// Shell: bash 5.1.16 /////////767676//76767///767676//////// Resolution: 3840x2160 //////////76767676767////76767///////// DE: GNOME 42.5 ///////////76767676//////7676////////// WM: Mutter ////////////,7676,///////767/////////// WM Theme: Pop /////////////*7676///////76//////////// Theme: Pop-dark [GTK2/3] ///////////////7676//////////////////// Icons: Pop [GTK2/3] ///////////////7676///767//////////// Terminal: gnome-terminal //////////////////////'//////////// CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz //////.7676767676767676767,////// GPU: NVIDIA GeForce GTX 1660 SUPER /////767676767676767676767///// Memory: 3709MiB / 32002MiB /////////////////////////// ///////////////////// ///////////// ``` ``` ab -n 90000 -c 200 http://localhost:8080/ This is ApacheBench, Version 2.3 <$Revision: 1879490 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking localhost (be patient) Completed 9000 requests Completed 18000 requests Completed 27000 requests Completed 36000 requests Completed 45000 requests Completed 54000 requests Completed 63000 requests Completed 72000 requests Completed 81000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 89888 requests completed ``` It seems Linux can handle the issue better than Mac OS, but whatever is breaking is still there. This is what I got running the server part: ``` GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 bug-19412.rb:13:in `write': Broken pipe (Errno::EPIPE) from bug-19412.rb:13:in `handle_connection' from bug-19412.rb:28:in `block in <main>' from bug-19412.rb:27:in `loop' from bug-19412.rb:27:in `<main>' ``` When I just run the `client.rb` it doesn´t break as there is not enough concurrency, so it never reaches a point where it can exhaust whatever resource is getting exhausted. Weirdly enough I can´t reproduce the problem when using hey: https://github.com/rakyll/hey ``` hey -n 180000 -c 300 http://localhost:8080/ Summary: Total: 14.8814 secs Slowest: 14.5462 secs Fastest: 0.0002 secs Average: 0.0112 secs Requests/sec: 12095.6387 Response time histogram: 0.000 [1] | 1.455 [179742] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.909 [36] | 4.364 [77] | 5.819 [16] | 7.273 [88] | 8.728 [23] | 10.182 [0] | 11.637 [0] | 13.092 [0] | 14.546 [17] | Latency distribution: 10% in 0.0006 secs 25% in 0.0015 secs 50% in 0.0019 secs 75% in 0.0020 secs 90% in 0.0022 secs 95% in 0.0023 secs 99% in 0.0025 secs Details (average, fastest, slowest): DNS+dialup: 0.0066 secs, 0.0002 secs, 14.5462 secs DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0491 secs req write: 0.0000 secs, 0.0000 secs, 0.0025 secs resp wait: 0.0045 secs, 0.0000 secs, 13.5324 secs resp read: 0.0000 secs, 0.0000 secs, 0.0033 secs Status code distribution: [200] 180000 responses ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102420 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). I've also tried to build something similar to "hey" in ruby and couldn´t make it crash (this is likely because of the GVL, we can´t get real concurrency): ```ruby require 'bundler/inline' gemfile do source 'https://rubygems.org' gem 'thread' end require 'socket' require 'thread/pool' require 'thread/channel' PORT = 8080 LISTEN = '127.0.0.1' NTIMES = 80000 CONCURRENCY = 512 pool = Thread.pool(CONCURRENCY) channel = Thread.channel NTIMES.times do |t| pool.process do socket = TCPSocket.new(LISTEN, PORT) socket.write("HTTP/1.1 GET /\n") socket.shutdown(Socket::SHUT_WR) socket.read channel.send([:success]) rescue Exception => e channel.send([:error, e]) ensure socket&.close end end output_thread = Thread.new { success = 0 errors = [] while data = channel.receive case data[0] when :success success += 1 puts "Completed #{success} requests" if success > 0 && (success % (NTIMES / 10) == 0) when :error errors << data[1] when :eof puts "Finished with #{errors.size} errors" errors.each { puts _1 } break end end } pool.shutdown channel.send([:eof]) output_thread.join ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102421 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). @jeremyevans0 I've tested again in MacOS system provided ruby version: ``` ruby --version Could not locate Gemfile or .bundle/ directory ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22] ``` And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS. I've tried running the code above to make sure this is not apache/hey thing. The behavior is that it runs a couple of thousand requests, then stop for 15-20s then runs another batch and stops and continue doing that... Because of how the code above handles exceptions, I can see then only at the end: ``` ruby client-pool.rb 16:54:52 Could not locate Gemfile or .bundle/ directory Completed 8000 requests Completed 16000 requests Completed 24000 requests Completed 32000 requests Completed 40000 requests Completed 48000 requests Completed 56000 requests Completed 64000 requests Completed 72000 requests Finished with 310 errors Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Connection reset by peer Socket is not connected - shutdown(2) Broken pipe Broken pipe Broken pipe connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 Connection reset by peer Connection reset by peer - connect(2) for "127.0.0.1" port 8080 Socket is not connected - shutdown(2) Socket is not connected - shutdown(2) Connection reset by peer - connect(2) for "127.0.0.1" port 8080 Socket is not connected - shutdown(2) Socket is not connected - shutdown(2) Socket is not connected - shutdown(2) Connection reset by peer Connection reset by peer Connection reset by peer connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 Connection reset by peer Socket is not connected - shutdown(2) Socket is not connected - shutdown(2) Socket is not connected - shutdown(2) Connection reset by peer Connection reset by peer connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 connect(2) for "127.0.0.1" port 8080 ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102422 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by jeremyevans0 (Jeremy Evans). brodock (Gabriel Mazetto) wrote in #note-9:
@jeremyevans0 I've tested again in MacOS system provided ruby version:
``` ruby --version Could not locate Gemfile or .bundle/ directory ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22] ```
And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS.
I've tried running the code above to make sure this is not apache/hey thing.
In your pure Ruby hey-example, you have `CONCURRENCY = 512`, but your server has `BACKLOG = 50`. You would want `BACKLOG` to be higher than `CONCURRENCY`, otherwise I think at least some failures may be expected. Note that Ruby doesn't deal well with a high number of threads (and 512 threads would definitely be considered high in this case). For the client part, if you want to test concurrency, it would be better to fork/spawn separate processes and not use threads. I don't have any other ideas about what could be going wrong. You should try to see if either your client or server is leaking file descriptors. You could also try increasing the timeout. Unfortunately, not being able to reproduce the problem, I can only guess as to what the root cause might be. ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102423 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). File clipboard-202303161214-zhq7o.png added File clipboard-202303161214-7xx91.png added File clipboard-202303161215-5wjbo.png added @jeremyevans0 I'm new to debugging C code. Your idea on filedescriptors seems to to be somewhat spot on. I'm not sure if this is the right way to debug this or not, but I managed to capture "Filesystem activity" through MacOS instrumentor. As you can see, there is a big gap between a chunk of request it processed which can illustrate what I was observing. looking at the description of what is happening you can see that the code couldn't get "accept" to return with success after some number of executions. What is interesting is that I can't see any `close` calls, only `accept` and `writev` which is weird. Could it be that somehow in MacOS we are not issuing `close` calls and it is just "timing out"?    ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102430 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. ---Files-------------------------------- clipboard-202303161214-zhq7o.png (1.02 MB) clipboard-202303161214-7xx91.png (683 KB) clipboard-202303161215-5wjbo.png (1.02 MB) -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). when the pause happens it seems the socket is still kept alive, based on netstat: (there is a huge list of those): ``` tcp4 0 0 localhost.http-alt localhost.56818 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56819 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56820 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56821 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56822 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56823 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56825 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56824 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56826 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56827 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56828 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56829 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56830 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56831 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56832 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56833 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56834 TIME_WAIT tcp4 0 0 localhost.http-alt localhost.56835 TIME_WAIT ``` based on https://stackoverflow.com/questions/2187889/how-can-i-make-a-socket-close-im... it seems that there is a way to opt for socket reuse which would bypass this. It still weird that there is such a small limit on MacOS for how many of those we keep waiting before we can't accept anymore ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102431 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. ---Files-------------------------------- clipboard-202303161214-zhq7o.png (1.02 MB) clipboard-202303161214-7xx91.png (683 KB) clipboard-202303161215-5wjbo.png (1.02 MB) -- https://bugs.ruby-lang.org/

Issue #19412 has been updated by brodock (Gabriel Mazetto). Some other updates. I've been trying to undestand the `TIME_WAIT` and why it does behave the way it doesn on MacOS (but not on Linux). This response got me ideas to try setting `linger` on: https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-r... This modified code now does "release" the socket immediately, which allows me to respond to that huge ammount of requests. My understanding is that if I set linger to `1` that would mean 1 tick/cycle or 1 second depending on the OS (worst case would be 1 second) but setting that on MacOS brings the old behavior back of having `TIME_WAIT` all over the place. ```ruby require 'socket' PORT = 8080 BACKLOG = 64 LISTEN = '127.0.0.1' LINGER = 0 @processed = 0 def handle_connection(connection, _addrinfo) request = connection.gets connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" @processed += 1 puts "#{request} - #{@processed}" ensure connection.setsockopt(Socket::Option.linger(true, 0)) connection.close end begin socket = TCPServer.new(LISTEN, PORT) socket.setsockopt(:SOCKET, :REUSEADDR, true) socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do handle_connection(socket.accept, nil) end ensure socket.close end ``` ---------------------------------------- Bug #19412: Socket starts queueing and not responding after a certain amount of requests https://bugs.ruby-lang.org/issues/19412#change-102438 * Author: brodock (Gabriel Mazetto) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- Here is a an example code: ```ruby require 'socket' PORT = 8080 BACKLOG = 50 LISTEN = '127.0.0.1' def handle_connection(connection, _addrinfo) request = connection.gets puts request connection.write "HTTP/1.1 200\r\n" connection.write "Content-Type: text/html\r\n" connection.write "\r\n" connection.write "Hello world! Current time is #{Time.now}" ensure connection.close end begin socket = Socket.new(:INET, :STREAM) socket.bind(Addrinfo.tcp(LISTEN, PORT)) socket.listen(BACKLOG) puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN loop do listening, = IO.select([socket]) io, = listening connection, addrinfo = io.accept handle_connection(connection, addrinfo) end ensure socket.close end ``` This tries to simulate a TCP server that responds as if it was an HTTP server. The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding. ``` $ uname -a Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1843412 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests apr_pollset_poll: The timeout specified has expired (70007) Total of 7883 requests completed ``` on MacOS Ventura I get around 16K: ``` $ uname -a Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64 $ ab -n 20000 -c 50 http://127.0.0.1:8080/ This is ApacheBench, Version 2.3 <$Revision: 1901567 $> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Licensed to The Apache Software Foundation, http://www.apache.org/ Benchmarking 127.0.0.1 (be patient) Completed 2000 requests Completed 4000 requests Completed 6000 requests Completed 8000 requests Completed 10000 requests Completed 12000 requests Completed 14000 requests Completed 16000 requests apr_socket_recv: Operation timed out (60) Total of 16375 requests completed ``` in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released: ``` ... GET / HTTP/1.0 GET / HTTP/1.0 GET / HTTP/1.0 ^Csocketserver.rb:29:in `select': Interrupt from socketserver.rb:29:in `block in <main>' from socketserver.rb:28:in `loop' from socketserver.rb:28:in `<main>' $ ruby socketserver.rb socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE) from socketserver.rb:23:in `<main>' ``` After killing the process it seems no process is holding the port: ``` lsof -wni tcp:8080 ``` Running the command above does not return anything (it does when the program is still running). I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed. ---Files-------------------------------- clipboard-202303161214-zhq7o.png (1.02 MB) clipboard-202303161214-7xx91.png (683 KB) clipboard-202303161215-5wjbo.png (1.02 MB) -- https://bugs.ruby-lang.org/
participants (4)
-
brodock (Gabriel Mazetto)
-
brodock (Gabriel Mazetto)
-
jeremyevans0 (Jeremy Evans)
-
jeremyevans0 (Jeremy Evans)