
Issue #20394 has been updated by shan (Shannon Skipper).

Dan0042 (Daniel DeLorme) wrote in #note-4:
> It doesn't seem like String#getbyte is much faster than File#getbyte, and StringIO#getbyte is fastest of all.

I'm seeing a similar result to what you show above with YJIT disabled, but `str.getbyte(i)` seems to pull ahead substantially with YJIT enabled on macOS and Linux with both Ruby 3.3 and nightly.

```
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
Calculating -------------------------------------
          fd.getbyte    114.407 (± 0.9%) i/s -    575.000 in   5.026157s
          io.getbyte    148.602 (± 0.7%) i/s -    756.000 in   5.087645s
      str.getbyte(i)    261.846 (± 0.8%) i/s -      1.310k in   5.003151s

Comparison:
      str.getbyte(i):      261.8 i/s
          io.getbyte:      148.6 i/s - 1.76x slower
          fd.getbyte:      114.4 i/s - 2.29x slower
```

----------------------------------------
Feature #20394: Add an offset parameter to `String#to_i`
https://bugs.ruby-lang.org/issues/20394#change-107476

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

I maintain the `redis-client` gem, and it comes with an optional swappable implementation in C that binds the `hiredis` C client, [which used to perform up to 5 times faster in some cases](https://github.com/redis-rb/redis-client/commit/9fabd57c6786a03fe0c6021eab5b...).

I recently paired with @tenderlovemaking to try to close this gap, or even try to make the pure Ruby version faster, and we came up with several optimizations that now almost put both versions on par (assuming YJIT is enabled).

An important source of performance loss is that the Redis protocol is line based, and parsing it in Ruby requires slicing a lot of small strings from the buffer.

To give an example, here's how an Array with two Strings (`["foo", "plop"]`) is serialized in RESP3 (Redis protocol):

```
*2\r\n
$3\r\n
foo\r\n
$4\r\n
plop\r\n
```

From this you can understand that a big hotspot in the parser is essentially `Integer(gets)`.

With @tenderlovemaking we managed to get [a fairly significant perf boost](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...)
by avoiding these string allocations using `String#getbyte` and [basically implementing a rudimentary `String#to_i(offset: )` in Ruby](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...).

But while the gains are huge with YJIT enabled, they are much more tame with the interpreter. And it feels a bit wrong to have to implement this sort of thing for performance reasons.

### `String#to_i(offset: )`

Similar to `String#unpack(offset:)` ([Feature #18254]), I believe `String#to_i(offset: )` would be useful.

### Alternative new `String#unpack` format

Another possibility would be to add a new format to `Array#pack` / `String#unpack` for decimal numbers. It sounds a bit weird at first, but given it supports things like Base64 and hexadecimal, perhaps it's not that much of a stretch?

-- 
https://bugs.ruby-lang.org/
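
For anyone following along, here is a minimal sketch of what a hand-rolled "`to_i` at an offset" built on `String#getbyte` can look like. This is not the actual `redis-client` code. The helper name `parse_int_at` and its return shape (value plus the offset of the first unconsumed byte) are assumptions made up for illustration.

```ruby
# Hypothetical helper (not taken from redis-client): reads a decimal integer
# from `buffer` starting at byte `offset`, without slicing a substring.
# Returns [value, offset_of_first_non_digit_byte].
def parse_int_at(buffer, offset)
  negative = false
  if buffer.getbyte(offset) == 45 # "-"
    negative = true
    offset += 1
  end

  value = 0
  # getbyte returns nil past the end of the string, which ends the loop.
  while (byte = buffer.getbyte(offset)) && byte >= 48 && byte <= 57 # "0".."9"
    value = value * 10 + (byte - 48)
    offset += 1
  end

  [negative ? -value : value, offset]
end

# The RESP3 payload from the issue description.
buffer = "*2\r\n$3\r\nfoo\r\n$4\r\nplop\r\n"

count, pos = parse_int_at(buffer, 1)       # skip the "*" type marker
length, pos = parse_int_at(buffer, pos + 3) # skip "\r\n" and the "$" marker
```

With the proposed API, the loop above would presumably collapse to something like `buffer.to_i(offset: 1)`, though the exact signature and return value are still up for discussion in this ticket.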