[ruby-core:124940] [Ruby Feature#21943] Add StringScanner#get_int to extract capture group as Integer without intermediate String

6 Mar 2026

Issue #21943 has been updated by Eregon (Benoit Daloze).

jinroq (Jinroq SAITOH) wrote:
> ...
> In the context of `Date._strptime("%Y-%m-%d")`, this overhead is a significant portion of the total parse time, as shown in earlier profiling:
>
> | Operation | Time |
> | :--- | :--- |
> | C ext `_strptime` (reference) | 408 ns |
> | SC.new + scan + captures + `.to_i` x3 | 1,210 ns |
> | Pure Ruby `_strptime_ymd` total | 1,290 ns |
>
> The capture extraction + `.to_i` conversion accounts for roughly 40% of the total parse time. `get_int` directly reduces this portion.

This part is not clear to me, notably what that 40% refers to.

What I would expect is the measurement of the pure-Ruby `strptime`, with and without `StringScanner#get_int`.
Then we could see how much it helps for the Date use case.

----------------------------------------
Feature #21943: Add StringScanner#get_int to extract capture group as Integer without intermediate String
https://bugs.ruby-lang.org/issues/21943#change-116619

* Author: jinroq (Jinroq SAITOH)
* Status: Open
----------------------------------------
## Motivation

The date library is being [rewritten from C to pure Ruby](https://github.com/ruby/date/pull/155). During this effort, `Date._strptime` was identified as a major performance bottleneck. Profiling revealed that the root cause is the overhead of extracting capture groups as Strings and then converting them to Integers:

```
sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
year = sc[1].to_i   # allocates String "2024", converts to Integer, discards String
mon  = sc[2].to_i   # allocates String "06",   converts to Integer, discards String
mday = sc[3].to_i   # allocates String "15",   converts to Integer, discards String
```

Each `sc[n].to_i` call allocates a temporary String object that is immediately discarded. When parsing dates, only the integer values are needed — the intermediate Strings serve no purpose.
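
The allocation cost described above can be observed with a rough sketch like the following. It assumes CRuby, since it relies on `GC.stat(:total_allocated_objects)`; the helper name `allocated_objects` is made up for illustration, and the exact counts may vary between Ruby versions.

```ruby
require "strscan"

# Number of objects allocated while the block runs.
# GC.stat(:total_allocated_objects) is specific to CRuby.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

sc = StringScanner.new("2024-06-15")

# Scan only, as a baseline for the cost of the match itself.
scan_only = lambda do
  sc.reset
  sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
  nil
end

# Scan, then read all three captures through intermediate Strings.
scan_and_convert = lambda do
  sc.reset
  sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
  sc[1].to_i
  sc[2].to_i
  sc[3].to_i
  nil
end

# Warm up once so one-time allocations are not counted.
scan_only.call
scan_and_convert.call

baseline  = allocated_objects { scan_only.call }
with_to_i = allocated_objects { scan_and_convert.call }

puts "scan only:            #{baseline} objects"
puts "scan + sc[n].to_i x3: #{with_to_i} objects"
```

The difference between the two counts is the part that `get_int` is meant to remove.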

In the C implementation of date, matched byte ranges are converted directly to integers without any String allocation. The pure Ruby version cannot do this with the current StringScanner API.

## Proposal

Add `StringScanner#get_int(index)` that returns the captured substring at the given index as an Integer, converting directly from the matched byte range at the C level without allocating an intermediate String object.

```
scanner = StringScanner.new("2024-06-15")
scanner.scan(/(\d{4})-(\d{2})-(\d{2})/)
scanner.get_int(1)  # => 2024
scanner.get_int(2)  # => 6
scanner.get_int(3)  # => 15
```

It returns `nil` in the same cases where `scanner[index]` would return `nil` (no match, index out of range, optional group did not participate).
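
The `nil` cases listed above can be illustrated with a pure-Ruby stand-in. This is only a sketch of the described return values: the helper `get_int_emulated` is hypothetical, it still allocates the intermediate String, and it assumes `String#to_i` semantics for the conversion, which the proposal does not spell out.

```ruby
require "strscan"

# Hypothetical stand-in for the proposed StringScanner#get_int.
# It models the return values only, not the allocation savings.
def get_int_emulated(scanner, index)
  scanner[index]&.to_i
end

sc = StringScanner.new("2024-06")
sc.scan(/(\d{4})-(\d{2})(?:-(\d{2}))?/)

year = get_int_emulated(sc, 1)  # group 1 matched "2024"
mon  = get_int_emulated(sc, 2)  # group 2 matched "06"
mday = get_int_emulated(sc, 3)  # optional group did not participate
none = get_int_emulated(sc, 9)  # index out of range

# A scanner with no successful match also yields nil.
unmatched = get_int_emulated(StringScanner.new("no digits"), 1)

p [year, mon, mday, none, unmatched]
```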

## Use case

The primary use case is `Date._strptime` in the pure Ruby date library. The fast path for `%Y-%m-%d` format currently does:

```
# Current: 3 temporary String allocations
sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
year = sc[1].to_i
mon  = sc[2].to_i
mday = sc[3].to_i
```

With `get_int`:

```
# Proposed: 0 temporary String allocations
sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
year = sc.get_int(1)
mon  = sc.get_int(2)
mday = sc.get_int(3)
```

This pattern appears throughout `_strptime` for every date/time component (`%H`, `%M`, `%S`, `%m`, `%d`, etc.), so the cumulative impact is significant.

## Benchmark

Environment: Ruby 4.0.1, x86_64-linux

| Operation | i/s | per iteration | Comparison |
| :--- | :--- | :--- | :--- |
| **sc.get_int(n)** | 1,029,041.7 | 971.78 ns/i | (Reference) |
| **sc[n].to_i** | 791,945.6 | 1.26 μs/i | 1.30x slower |

`get_int` is 1.30x faster than `sc[n].to_i` for a typical date parsing scenario (3 capture groups). The improvement comes from eliminating 3 temporary String allocations per call.

In the context of `Date._strptime("%Y-%m-%d")`, this overhead is a significant portion of the total parse time, as shown in earlier profiling:

| Operation | Time |
| :--- | :--- |
| C ext `_strptime` (reference) | 408 ns |
| SC.new + scan + captures + `.to_i` x3 | 1,210 ns |
| Pure Ruby `_strptime_ymd` total | 1,290 ns |

The capture extraction + `.to_i` conversion accounts for roughly 40% of the total parse time. `get_int` directly reduces this portion.
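
A with/without comparison of the scanner fast path could be measured along these lines. This is a minimal sketch, not the benchmark used for the numbers above: the method names `parse_with_to_i`, `parse_with_get_int`, and `ns_per_call` are made up for illustration, it times only the scanner portion rather than the full `_strptime`, and the `get_int` variant runs only on a strscan build that actually defines the method.

```ruby
require "strscan"

# Baseline: captures go through intermediate Strings.
def parse_with_to_i(sc)
  return nil unless sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
  [sc[1].to_i, sc[2].to_i, sc[3].to_i]
end

# Variant using the proposed method; requires a build that provides it.
def parse_with_get_int(sc)
  return nil unless sc.scan(/(\d{4})-(\d{2})-(\d{2})/)
  [sc.get_int(1), sc.get_int(2), sc.get_int(3)]
end

# Average wall-clock nanoseconds per call, including StringScanner.new.
def ns_per_call(iterations, input)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  iterations.times { yield StringScanner.new(input) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - start
  elapsed.fdiv(iterations)
end

iterations = 100_000
input = "2024-06-15"

baseline = ns_per_call(iterations, input) { |sc| parse_with_to_i(sc) }
puts format("sc[n].to_i: %.1f ns/call", baseline)

if StringScanner.method_defined?(:get_int)
  proposed = ns_per_call(iterations, input) { |sc| parse_with_get_int(sc) }
  puts format("get_int:    %.1f ns/call", proposed)
else
  puts "get_int:    skipped (StringScanner#get_int is not defined in this build)"
end
```

Running both variants inside the same process and on the same input keeps the comparison limited to the capture extraction step.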

## Implementation

A working implementation is available. It reuses the same index resolution logic as `StringScanner#[]` (including negative indices) but calls `rb_cstr2inum` on the matched byte range instead of `extract_range`, avoiding String object allocation entirely.
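
The idea of converting a matched byte range without building a substring can be sketched in Ruby as follows. This is an illustration of the technique only: the actual implementation is in C and, per the description above, uses `rb_cstr2inum`. The helper `digits_to_int` is hypothetical and handles plain ASCII decimal digits, with no sign, whitespace, or underscore handling.

```ruby
# Hypothetical helper: reads `length` bytes of `str` starting at byte offset
# `start` and accumulates them as a decimal Integer, without creating a
# substring. Returns nil if the range is empty, runs past the end of the
# string, or contains a byte other than "0".."9".
def digits_to_int(str, start, length)
  return nil if length <= 0

  value = 0
  length.times do |i|
    byte = str.getbyte(start + i)
    # "0".ord == 48 and "9".ord == 57
    return nil if byte.nil? || byte < 48 || byte > 57
    value = value * 10 + (byte - 48)
  end
  value
end

input = "2024-06-15"
year = digits_to_int(input, 0, 4)
mon  = digits_to_int(input, 5, 2)
mday = digits_to_int(input, 8, 2)

p [year, mon, mday]
```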

-- 
https://bugs.ruby-lang.org/