
Issue #20576 has been updated by Eregon (Benoit Daloze).

Does this difference matter in realistic usage (e.g. the net-imap case)? How much of an improvement is it there?

Regarding naming, `byteend` seems hard to read; I think `byte_begin`/`byte_end` would be much clearer.

----------------------------------------
Feature #20576: Add MatchData#bytebegin and MatchData#byteend
https://bugs.ruby-lang.org/issues/20576#change-108807

* Author: shugo (Shugo Maeda)
* Status: Open
* Target version: 3.4
----------------------------------------
I'd like to propose MatchData#bytebegin and MatchData#byteend.

These methods are similar to MatchData#begin and MatchData#end, but return offsets in bytes instead of codepoints.

Pull request: https://github.com/ruby/ruby/pull/10973

One of the use cases is scanning strings: https://github.com/ruby/net-imap/pull/286/files

MatchData#byteend is faster than MatchData#byteoffset because there is no need to allocate an Array.

Here's a benchmark result:

```
voyager:ruby$ cat b.rb
require "benchmark"
require "strscan"

text = "あ" * 100000

Benchmark.bmbm do |b|
  b.report("byteoffset(0)[1]") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteoffset(0)[1]
    end
  end
  b.report("byteend(0)") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteend(0)
    end
  end
end
voyager:ruby$ ./tool/runruby.rb b.rb
Rehearsal ----------------------------------------------------
byteoffset(0)[1]   0.020558   0.000393   0.020951 (  0.020963)
byteend(0)         0.018149   0.000000   0.018149 (  0.018151)
------------------------------------------- total: 0.039100sec

                       user     system      total        real
byteoffset(0)[1]   0.020821   0.000000   0.020821 (  0.020822)
byteend(0)         0.017455   0.000000   0.017455 (  0.017455)
```

-- 
https://bugs.ruby-lang.org/