[ruby-core:118301] [Ruby master Feature#20576] Add MatchData#bytebegin and MatchData#byteend

12 Jun 2024

Issue #20576 has been updated by Eregon (Benoit Daloze).

Does this difference matter in realistic usage (e.g. the net-imap case)? How much of an improvement is it there?

Regarding naming, `byteend` seems hard to read; I think `byte_begin`/`byte_end` would be much clearer.

----------------------------------------
Feature #20576: Add MatchData#bytebegin and MatchData#byteend
https://bugs.ruby-lang.org/issues/20576#change-108807

* Author: shugo (Shugo Maeda)
* Status: Open
* Target version: 3.4
----------------------------------------
I'd like to propose MatchData#bytebegin and MatchData#byteend.
These methods are similar to MatchData#begin and MatchData#end, but return offsets in bytes instead of codepoints.
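
To illustrate the difference, here is a minimal sketch (not taken from the pull request; the sample string and variable names are made up for illustration). It compares the offsets from MatchData#begin/#end with the byte offsets from the existing MatchData#byteoffset, and calls the proposed methods only if the running Ruby provides them:

```ruby
# Illustrative sample: each kana is 3 bytes in UTF-8, the space is 1 byte.
text = "あいう abc"
m = text.match(/abc/)

# Existing API: offsets counted in characters (codepoints).
char_begin = m.begin(0)
char_end   = m.end(0)

# Existing API: offsets counted in bytes, returned as a two-element Array.
byte_begin, byte_end = m.byteoffset(0)

# Proposed API (assumption: only defined on a Ruby that includes this feature).
# It should return the same byte offsets without building the Array.
if m.respond_to?(:bytebegin) && m.respond_to?(:byteend)
  byte_begin = m.bytebegin(0)
  byte_end   = m.byteend(0)
end

puts "chars: #{char_begin}...#{char_end}"
puts "bytes: #{byte_begin}...#{byte_end}"
```

The byte offsets are what String#byteindex and String#byteslice expect, which is why they are useful when scanning a string by position.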

Pull request: https://github.com/ruby/ruby/pull/10973

One of the use cases is scanning strings: https://github.com/ruby/net-imap/pull/286/files
MatchData#byteend is faster than MatchData#byteoffset because there is no need to allocate an Array.
Here's a benchmark result:

```
voyager:ruby$ cat b.rb 
require "benchmark"
require "strscan"

text = "あ" * 100000

Benchmark.bmbm do |b|
  b.report("byteoffset(0)[1]") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteoffset(0)[1]
    end
  end

  b.report("byteend(0)") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteend(0)
    end
  end
end
voyager:ruby$ ./tool/runruby.rb b.rb           
Rehearsal ----------------------------------------------------
byteoffset(0)[1]   0.020558   0.000393   0.020951 (  0.020963)
byteend(0)         0.018149   0.000000   0.018149 (  0.018151)
------------------------------------------- total: 0.039100sec

                       user     system      total        real
byteoffset(0)[1]   0.020821   0.000000   0.020821 (  0.020822)
byteend(0)         0.017455   0.000000   0.017455 (  0.017455)
```

-- 
https://bugs.ruby-lang.org/