[ruby-core:116879] [Ruby master Feature#20282] Enhancing Ruby's Coverage with Per-Test Coverage Reports

20 Feb 2024

Issue #20282 has been updated by anmarchenko (Andrey Marchenko).

ioquatix (Samuel Williams) wrote:
> ...
> As Ruby applications grow in complexity, the need for more sophisticated testing and coverage analysis tools becomes paramount. Current coverage tools in Ruby offer a good starting point but fall short in delivering the granularity and flexibility required by modern development practices. Specifically, there is a significant gap in "per-test coverage" reporting, which limits developers' ability to pinpoint exactly which tests exercise which lines of code. This proposal seeks to initiate a discussion around improving Ruby's coverage module to address this gap.

Hi! I think this would be a great addition to Ruby; it would help me solve an important business case at Datadog.

I work as a library developer at Datadog, where we build developer tools for test visibility and smart test execution in Ruby. One product we are currently working on for Ruby is an intelligent test runner (https://docs.datadoghq.com/intelligent_test_runner) that saves developers time by running only the tests relevant to a feature branch. It relies on the test impact analysis technique (https://martinfowler.com/articles/rise-test-impact-analysis.html), which requires us to collect per-test code coverage for users of our library when they run their test suites with minitest/rspec/cucumber/etc.
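
Test impact analysis rests on a simple inversion: once you know which lines each test executed, a change only needs to re-run the tests whose recorded lines intersect the changed lines. A minimal sketch of that selection step follows. The `impacted_tests` name and both data shapes (`{ test => { path => [covered lines] } }` and `{ path => [changed lines] }`) are hypothetical illustrations, not Datadog's actual format.

```ruby
require "set"

# Hypothetical inputs:
#   per_test_coverage: { "test name" => { "path.rb" => [covered line numbers] } }
#   changed_lines:     { "path.rb" => [line numbers touched by the branch] }
# Returns the names of tests that executed at least one changed line.
def impacted_tests(per_test_coverage, changed_lines)
  changed = changed_lines.transform_values(&:to_set)

  per_test_coverage.filter_map do |test, files|
    hit = files.any? do |path, lines|
      touched = changed[path]
      touched && lines.any? { |line| touched.include?(line) }
    end
    test if hit
  end
end

coverage = {
  "test_double" => { "calc.rb" => [1, 2] },
  "test_triple" => { "calc.rb" => [5, 6] },
}
impacted_tests(coverage, { "calc.rb" => [6] }) # => ["test_triple"]
```

A production runner would likely also need conservative fallbacks, for example always running tests that have no recorded coverage yet.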

We need to create a solution that:
- tracks code coverage per test while remaining invisible to users and does not interfere with existing simplecov/covered setups
- has low performance overhead
- works with parallel and sequential test runners

One solution I am evaluating is to use `Coverage.suspend`/`Coverage.resume` together with `Coverage.result(stop: false, clear: true)`. For me, this approach has two major problems: it does not support threaded parallel executors, and it skews any existing code coverage that our users might collect. All major code coverage libraries in Ruby (simplecov/covered/single_cov) use Coverage under the hood, and Coverage has only one global state. So if we used suspend/resume and cleared the results after each test, we would break all existing code coverage libraries. I would be delighted to see an API that lets us observe the ongoing coverage collection without disturbing it and report what was covered between specific points in time (the start and stop of each test), and I think it would benefit many developer tools out there.
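
For reference, the suspend/resume approach looks roughly like the sketch below. It is not Datadog's code: `per_test_coverage` is a hypothetical helper name, and `Coverage.setup`, `Coverage.resume` and `Coverage.suspend` require Ruby 3.1 or later. Because each `Coverage.result(stop: false, clear: true)` call resets the one global set of counters, any simplecov/covered session in the same process loses those counts, and tests running concurrently on other threads would be recorded together.

```ruby
require "coverage"

# Hypothetical helper (not part of any library): runs each test block and
# returns { test_name => Coverage result gathered while that block ran }.
# Requires Ruby 3.1+ and a prior Coverage.setup(...) call.
#
# Caveat: Coverage has a single global state, so clearing it here also
# discards counts that simplecov/covered would otherwise report, and
# tests running concurrently on other threads would be mixed in.
def per_test_coverage(tests)
  tests.to_h do |name, test|
    Coverage.result(stop: false, clear: true) # drop counts from before this test
    Coverage.resume
    begin
      test.call
    ensure
      Coverage.suspend
    end
    [name, Coverage.result(stop: false, clear: true)]
  end
end
```

Call `Coverage.setup(lines: true)` before loading the code under test, since only files loaded after setup are instrumented.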

----------------------------------------
Feature #20282: Enhancing Ruby's Coverage with Per-Test Coverage Reports
https://bugs.ruby-lang.org/issues/20282#change-106915

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
----------------------------------------
As Ruby applications grow in complexity, the need for more sophisticated testing and coverage analysis tools becomes paramount. Current coverage tools in Ruby offer a good starting point but fall short in delivering the granularity and flexibility required by modern development practices. Specifically, there is a significant gap in "per-test coverage" reporting, which limits developers' ability to pinpoint exactly which tests exercise which lines of code. This proposal seeks to initiate a discussion around improving Ruby's coverage module to address this gap.

## Objectives

The primary goal of this initiative is to introduce support for per-test coverage reports within Ruby, focusing on three key areas:

1. Scoped Coverage Data Capture: Implementing the capability to capture coverage data within user-defined scopes, such as global, thread, or fiber scopes. This would allow for more granular control over the coverage analysis process.

2. Efficient Data Capture Controls: Developing mechanisms to efficiently control the capture of coverage data. This includes the ability to exclude specific files and to include, ignore, or merge eval'd code, so that the coverage data is both relevant and manageable.

3. Compatibility and Consistency: Ensuring that the coverage data is exposed in a manner that is consistent with existing coverage tools and standards. This compatibility is crucial for integrating with a wide array of tooling and for facilitating a seamless developer experience.
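
The current Coverage module has no notion of a thread or fiber scope. Purely to illustrate the semantics objective 1 asks for, here is a sketch of thread-scoped line capture built on a different mechanism, the core `TracePoint` API (`TracePoint.new(:line)`, `#enable`, `#disable`, `#path`, `#lineno`). The `ThreadLineCapture` class is hypothetical.

```ruby
# Hypothetical class: approximates *thread-scoped* line capture using
# TracePoint instead of the Coverage module. Only :line events raised by
# the target thread are counted; other threads are ignored.
class ThreadLineCapture
  # { path => { line_number => hit_count } }
  attr_reader :lines

  def initialize(thread = Thread.current)
    @thread = thread
    @lines = Hash.new { |by_path, path| by_path[path] = Hash.new(0) }
    @trace = TracePoint.new(:line) do |tp|
      # The hook is enabled globally, so filter to the target thread by hand.
      next unless Thread.current.equal?(@thread)

      @lines[tp.path][tp.lineno] += 1
    end
  end

  def start
    @trace.enable
    self
  end

  def stop
    @trace.disable
    self
  end
end
```

A global `TracePoint` hook fires for every line on every thread, so this is likely much slower than the Coverage module, and it only records executed lines (there is no `nil`/`0` distinction for non-executable versus unexecuted lines). A native thread or fiber scope could provide the same semantics without those costs.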

## Proposed Solutions

The heart of this proposal lies in the introduction of a new subclassable component within the Coverage module, tentatively named `Coverage::Capture`. This component would allow users to define custom coverage capture behaviors tailored to their specific needs. Below is a hypothetical interface for such a mechanism:

```ruby
class Coverage::Capture
  def self.start
    self.new.tap(&:start)
  end

  # Start receiving coverage callbacks.
  def start
  end

  # Stop receiving coverage callbacks.
  def stop
  end

  # User-overridable statement coverage callback. `fetch` may return nil to
  # skip this iseq, so each call in the chain must use safe navigation.
  def statement(iseq, location)
    fetch(iseq)&.statement_coverage&.increment(location)
  end

  # Additional methods for branch/declaration coverage would follow a similar pattern.
end

class MyCoverageCapture < Coverage::Capture
  # Captured coverage targets, keyed by instruction sequence.
  attr_reader :coverage

  def initialize
    super
    @coverage = {}
  end

  # Provides efficient data capture controls - can return nil if skipping coverage for this iseq, or can store coverage data per-thread, per-fiber, etc.
  def fetch(iseq)
    @coverage[iseq] ||= Coverage.default_coverage(iseq)
  end
end

# Usage example:
my_coverage_capture = MyCoverageCapture.start
# Execute test suite or specific tests
my_coverage_capture.stop
# Access detailed coverage data for each captured iseq
my_coverage_capture.coverage.each_value do |target|
  puts target.statement_coverage
end
```
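
Because `Coverage::Capture` does not exist yet, the following plain-Ruby mock only models the callback contract proposed above: the VM would call `statement` for each executed statement, and a subclass's `fetch` decides where the data goes, or returns `nil` to skip it. All class names are hypothetical, a file path stands in for an iseq, a line number stands in for a location, and `statement` is driven by hand rather than by the VM.

```ruby
# Hypothetical stand-in for the proposed Coverage::Capture base class.
# Nothing here hooks into the VM; callers invoke #statement directly.
class CaptureSketch
  # The VM would call this for every executed statement.
  def statement(path, line)
    counts = fetch(path)
    counts[line] += 1 if counts
  end

  # Subclasses return a Hash(line => count), or nil to skip the file.
  def fetch(_path)
    raise NotImplementedError, "#{self.class} must implement fetch"
  end
end

# Skips excluded files and keeps a separate table per thread.
class PerThreadCapture < CaptureSketch
  def initialize(exclude: %r{/gems/})
    super()
    @exclude = exclude
    @lock = Mutex.new
    @tables = {} # { Thread => { path => { line => count } } }
  end

  def fetch(path)
    return nil if @exclude.match?(path)

    # Only the creation of each thread's table is shared, so guard just that.
    table = @lock.synchronize do
      @tables[Thread.current] ||= Hash.new { |by_path, key| by_path[key] = Hash.new(0) }
    end
    table[path]
  end

  # Coverage recorded by one thread (the current one by default).
  def result(thread = Thread.current)
    @tables.fetch(thread, {})
  end
end
```

A real implementation would presumably key on the iseq and use thread or fiber storage managed by the VM; the point of the mock is only that returning `nil` from `fetch` makes exclusion cheap and that the storage strategy lives entirely in the subclass.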

In addition, we'd need a well-defined interface for `Coverage.default_coverage`, which includes line, branch and declaration coverage statistics. I suggest we take inspiration from the proposed interface defined by the vscode text editor: https://github.com/microsoft/vscode/blob/b44593a612337289c079425a5b2cc701021... - this interface was designed to be compatible with a wide range of coverage libraries, so it represents the intersection of that functionality.

```ruby
# Hypothetical interface (mostly copied from vscode's proposed interface):
module Coverage
  # Contains coverage metadata for a file
  class Target
    attr_reader :instruction_sequence
    attr_accessor :statement_coverage, :branch_coverage, :declaration_coverage, :detailed_coverage

    # @param statement_coverage [Hash(Location, StatementCoverage)] A hash table of statement coverage instances keyed on location.
    # Similar structures for other coverage data.
    def initialize(instruction_sequence, statement_coverage, branch_coverage=nil, declaration_coverage=nil)
      @instruction_sequence = instruction_sequence
      @statement_coverage = statement_coverage
      @branch_coverage = branch_coverage
      @declaration_coverage = declaration_coverage
    end
  end

  # Coverage information for a single statement or line.
  class StatementCoverage
    # The number of times this statement was executed, or a boolean indicating
    # whether it was executed if the exact count is unknown. If zero or false,
    # the statement will be marked as un-covered.
    attr_accessor :executed

    # Statement location (line number? or range? or position? AST?)
    attr_accessor :location

    # Coverage from branches of this line or statement. If it's not a
    # conditional, this will be empty.
    attr_accessor :branches

    # Initializes a new instance of the StatementCoverage class.
    #
    # @param executed [Number, Boolean] The number of times this statement was executed, or a
    # boolean indicating whether it was executed if the exact count is unknown. If zero or false,
    # the statement will be marked as un-covered.
    #
    # @param location [Position, Range] The statement position.
    #
    # @param branches [Array(BranchCoverage)] Coverage from branches of this line.
    # If it's not a conditional, this should be omitted.
    def initialize(executed, location, branches=[])
      @executed = executed
      @location = location
      @branches = branches
    end
  end

  # Coverage information for a branch
  class BranchCoverage
    # The number of times this branch was executed, or a boolean indicating
    # whether it was executed if the exact count is unknown. If zero or false,
    # the branch will be marked as un-covered.
    attr_accessor :executed

    # Branch location.
    attr_accessor :location

    # Label for the branch, used in the context of "the ${label} branch was
    # not taken," for example.
    attr_accessor :label

    # Initializes a new instance of the BranchCoverage class.
    #
    # @param executed [Number, Boolean] The number of times this branch was executed, or a
    # boolean indicating whether it was executed if the exact count is unknown. If zero or false,
    # the branch will be marked as un-covered.
    #
    # @param location [Position, Range] (optional) The branch position.
    #
    # @param label [String] (optional) Label for the branch, used in the context of
    # "the ${label} branch was not taken," for example.
    def initialize(executed, location=nil, label=nil)
      @executed = executed
      @location = location
      @label = label
    end
  end

  # Coverage information for a declaration
  class DeclarationCoverage
    # Name of the declaration. Depending on the reporter and language, this
    # may be types such as functions, methods, or namespaces.
    attr_accessor :name

    # The number of times this declaration was executed, or a boolean
    # indicating whether it was executed if the exact count is unknown. If
    # zero or false, the declaration will be marked as un-covered.
    attr_accessor :executed

    # Declaration location.
    attr_accessor :location

    # Initializes a new instance of the DeclarationCoverage class.
    #
    # @param name [String] Name of the declaration.
    #
    # @param executed [Number, Boolean] The number of times this declaration was executed, or a
    # boolean indicating whether it was executed if the exact count is unknown. If zero or false,
    # the declaration will be marked as un-covered.
    #
    # @param location [Position, Range] The declaration position.
    def initialize(name, executed, location)
      @name = name
      @executed = executed
      @location = location
    end
  end
end
```

By following this format, we will be compatible with a wide range of external tools.
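
To make the compatibility point concrete: with `Coverage.start(lines: true)`, `Coverage.result` returns `{ path => { lines: [...] } }`, where index `i` holds the hit count for line `i + 1` and `nil` marks a non-executable line. A minimal sketch of mapping one file's entry onto records shaped like the proposed `StatementCoverage` follows. `LineStatement` is a hypothetical stand-in Struct (so the sketch does not depend on the hypothetical classes above), and using the 1-based line number as `location` is an assumption, since the proposal leaves the location type open. `filter_map` needs Ruby 2.7+.

```ruby
# Hypothetical record mirroring the proposed StatementCoverage fields.
LineStatement = Struct.new(:executed, :location, :branches)

# Converts one file's entry from Coverage.result (started with lines: true),
# e.g. { lines: [1, 1, nil, 0] }, into LineStatement records.
# nil entries are non-executable lines and are skipped; 0 is kept because
# 0 is truthy in Ruby, so unexecuted lines survive filter_map.
def statements_from_lines(file_result)
  file_result.fetch(:lines).each_with_index.filter_map do |count, index|
    LineStatement.new(count, index + 1, []) unless count.nil?
  end
end
```

Unexecuted lines keep `executed == 0`, which the proposed interface already treats as un-covered.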
