[ruby-core:122968] [Ruby Misc#21544] CI: Launchable: Early flake detection

15 Aug 2025

Issue #21544 has been reported by ono-max (Naoto Ono).

----------------------------------------
Misc #21544: CI: Launchable: Early flake detection
https://bugs.ruby-lang.org/issues/21544

* Author: ono-max (Naoto Ono)
* Status: Open
----------------------------------------
I'm a software engineer at Launchable, and I'd like to propose a mechanism for detecting potentially flaky tests early.

## Background

In the current Ruby CI, flaky tests can easily be merged into the master branch unintentionally. Even when a test is known to be flaky, the problem is often overlooked because re-running the test may yield a passing result.

Once flaky tests make it into the master branch, they become much harder to debug, and even reverting the changes can be difficult.

Therefore, it's crucial to identify potential flaky tests as early as possible—ideally before merging a pull request into the master branch.

If we can detect a flaky test during the PR stage, we can prevent it from being introduced into the codebase.

## How it works

Here is the basic flow:

1. After the initial run, Launchable provides a list of tests to re-run.

2. The Ruby test framework executes these tests and uploads the results.

3. Using historical data, Launchable detects new or updated tests and returns an updated list.

4. The Ruby test framework runs the new list of tests and uploads the results.

5. Based on these results, Launchable determines which tests need to be re-run.

6. The Ruby test framework executes the identified tests and uploads the results.

7. Repeat steps 5 and 6 until Launchable returns an empty test list.

The workflow is illustrated below:

<img src="CI_workflow.png" alt="CI workflow for early flake detection" style="width:600px;">
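The re-run loop described in the steps above can be sketched in Ruby as follows. This is a minimal, self-contained sketch under stated assumptions: `FakeLaunchable`, `tests_to_rerun`, `record`, `flaky_tests`, and `early_flake_detection` are hypothetical names invented for illustration, not Launchable's actual API or CLI. The fake service also does not consult real historical data; it simply keeps requesting each given new test until a fixed number of attempts is reached, and it treats a test as flaky if the test both passed and failed across attempts.

```ruby
# Hypothetical stand-in for the Launchable service. Method names and the
# "fixed attempt count" policy are illustrative assumptions only.
class FakeLaunchable
  def initialize(new_tests, max_attempts:)
    @new_tests = new_tests          # assumed: new/updated tests detected from history
    @max_attempts = max_attempts
    @history = Hash.new { |h, k| h[k] = [] } # test name => [true/false, ...]
  end

  # Steps 1, 3, 5: return the tests that should be (re-)run next.
  def tests_to_rerun
    @new_tests.reject { |t| @history[t].size >= @max_attempts }
  end

  # Steps 2, 4, 6: record the uploaded results (test name => passed?).
  def record(results)
    results.each { |test, passed| @history[test] << passed }
  end

  # A test is reported as flaky if it has both passed and failed.
  def flaky_tests
    @history.select { |_, outcomes| outcomes.uniq.size > 1 }.keys
  end
end

# Runs the loop until the service returns an empty list (step 7).
# `runner` is any callable that runs one test and returns true on success.
def early_flake_detection(client, runner)
  loop do
    tests = client.tests_to_rerun
    break if tests.empty?
    results = tests.to_h { |t| [t, runner.call(t)] }
    client.record(results)
  end
  client.flaky_tests
end

# Usage: "test_b" alternates between passing and failing.
calls = Hash.new(0)
runner = lambda do |test|
  calls[test] += 1
  test == "test_b" ? calls[test].odd? : true
end
client = FakeLaunchable.new(%w[test_a test_b], max_attempts: 5)
p early_flake_detection(client, runner) # prints ["test_b"]
```

Injecting the runner as a callable keeps the loop independent of how the Ruby test framework actually executes and uploads tests.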

## Concerns

* How much will Ruby CI slow down with this change?

I don't believe this change will significantly slow down the CI process. For example, consider a new test that takes 10 seconds to run. If recording the test results to Launchable adds 5 seconds and fetching the list of tests to re-run adds another 5 seconds, each attempt takes:

(10 sec + 5 sec + 5 sec) = 20 seconds.

With 5 retries, the total execution time would be:

20 seconds × 5 = 100 seconds.
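The same estimate can be written as a small Ruby helper so that other test durations or retry counts can be plugged in. `flake_check_overhead` is a hypothetical helper, and its 5-second defaults are simply the example figures used in this proposal, not measured Launchable latencies.

```ruby
# Hypothetical helper reproducing the estimate above (all values in seconds).
# The default upload/fetch times are the proposal's example figures, not measurements.
def flake_check_overhead(test_sec:, upload_sec: 5, fetch_sec: 5, retries: 5)
  per_attempt = test_sec + upload_sec + fetch_sec # (10 + 5 + 5) = 20 in the example
  per_attempt * retries                           # 20 * 5 = 100 in the example
end

puts flake_check_overhead(test_sec: 10) # prints 100
```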

---Files--------------------------------
CI_workflow.png (266 KB)

-- 
https://bugs.ruby-lang.org/

