[ruby-core:115187] [Ruby master Bug#19976] test/fiber/test_queue.rb stuck tests in Ubuntu ppc64le

Issue #19976 has been reported by jaruga (Jun Aruga).

----------------------------------------
Bug #19976: test/fiber/test_queue.rb stuck tests in Ubuntu ppc64le
https://bugs.ruby-lang.org/issues/19976

* Author: jaruga (Jun Aruga)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
I have seen `test_pop_with_timeout` and `test_pop_with_timeout_and_value` hang when Ruby's master branch is built with GCC, both on the RubyCI ppc64le server (focal/jammy) and on Travis CI ppc64le. This ticket is to track the issue.

On August 27, 2023, we saw the following hang on Travis ppc64le Ubuntu focal. The GCC version used was 10.5.0.

https://app.travis-ci.com/github/ruby/ruby/jobs/608696247#L2355

```
Retrying hung up testcases...
[1/2] TestFiberQueue#test_pop_with_timeout_and_value = 0.00 s
[2/2] TestFiberQueue#test_pop_with_timeout
====[ 540 seconds still running ]====
====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====
```

We upgraded RubyCI's ppc64le server from focal to jammy, and started to use the newer GCC 11.4.0 (gcc-11) on the server.

https://packages.ubuntu.com/jammy-updates/gcc-11 - 11.4.0

We have not seen the issue on that server since we started using GCC 11.4.0.

http://rubyci.s3.amazonaws.com/ppc64le/ruby-master/recent.html

However, I saw this issue again on October 27, 2023, on Travis Ubuntu ppc64le jammy when I tried to upgrade Travis ppc64le from focal to jammy. I did not see the issue on Travis ppc64le focal. The GCC version used was also 11.4.0.

https://github.com/ruby/ruby/pull/8739
https://app.travis-ci.com/github/junaruga/ruby/jobs/612361931#L2930

```
[1/2] TestFiberQueue#test_pop_with_timeout====[ 1080 seconds still running ]====
====[ 1620 seconds still running ]====
====[ 2160 seconds still running ]====
```

This means something differs between the RubyCI ppc64le server and the Travis ppc64le environment when running the tests. I was able to reproduce the hang on RubyCI's ppc64le Ubuntu jammy server with the reproducing script below.

https://github.com/junaruga/report-ruby-fiber-hung_up-tests

The possible differences that may cause the issue are parallel execution (`make -jN`) or the compiler flags `-O1` or `-ggdb3`.

https://github.com/junaruga/report-ruby-fiber-hung_up-tests/blob/d94205d9d7f...

I also sent a PR to make the hanging tests fail, and it was merged. Tests that fail immediately are better than tests that hang.

https://github.com/ruby/ruby/pull/8791

I hope we find the cause and fix this hang with GCC 11.4.0 on Ubuntu jammy.

--
https://bugs.ruby-lang.org/
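To illustrate the idea of failing fast instead of hanging, here is a minimal sketch. It is not the change made in the merged PR, and it does not use the fiber scheduler from test/fiber/test_queue.rb. The helper name `run_with_deadline` and the 5-second limit are assumptions made up for this example; `Thread#join(limit)` and `Queue#pop(timeout:)` (Ruby 3.2 or later) are the only library calls it relies on.

```ruby
# Hypothetical helper (not from the Ruby test suite): run a block in a
# separate thread and raise if it has not finished within `limit` seconds.
def run_with_deadline(limit)
  thread = Thread.new { yield }
  # Thread#join(limit) returns nil when the thread is still running
  # after `limit` seconds, and the thread itself otherwise.
  unless thread.join(limit)
    thread.kill
    raise "still running after #{limit} seconds"
  end
  thread.value
end

queue = Thread::Queue.new

# Popping from an empty queue with a timeout should give up and return nil
# (requires Ruby 3.2 or later for the timeout: keyword).
result = run_with_deadline(5) { queue.pop(timeout: 0.01) }
puts "pop returned #{result.inspect}"
```

If the pop never returned, as in the logs above, the helper would raise after the limit rather than leave the test runner printing "still running" indefinitely.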

Issue #19976 has been updated by jaruga (Jun Aruga).

It seems that this issue went away after removing `optflags=-O1` on Travis ppc64le in this PR's second commit: https://github.com/ruby/ruby/commit/ca7296767b5db9a401bc64738984f35880061a73

----------------------------------------
Bug #19976: test/fiber/test_queue.rb stuck tests in Ubuntu ppc64le
https://bugs.ruby-lang.org/issues/19976#change-105117

--
https://bugs.ruby-lang.org/
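Since the suspected cause is a build-flag difference between environments, one way to compare them is to print the flags that the running `ruby` was built with. The following is a rough sketch under that assumption; the helper name `build_flags` is made up for this example, and the values come from `RbConfig::CONFIG`, which records configure-time settings only (it cannot show whether `make -jN` was used).

```ruby
require "rbconfig"

# Hypothetical helper: collect the compiler and flag settings recorded
# at configure time for the running interpreter.
def build_flags
  %w[CC optflags debugflags CFLAGS].to_h do |key|
    # to_s guards against a key that is missing on some builds.
    [key, RbConfig::CONFIG[key].to_s]
  end
end

build_flags.each { |key, value| puts "#{key}: #{value}" }
```

Running this on both the RubyCI ppc64le server and the Travis ppc64le job would show whether `optflags` contains `-O1` and `debugflags` contains `-ggdb3` in one environment but not the other.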