
Issue #20394 has been updated by Dan0042 (Daniel DeLorme).

mame (Yusuke Endoh) wrote in #note-6:
Generalizing, we may want `IO#scanf`, but that's probably overkill?
This previously existed in the stdlib but was removed: #16170

byroot (Jean Boussier) wrote in #note-7:
Indeed, but the problem is that you then have very few methods to parse values or peek in the buffer to find elements. Lots of methods needed for parsing various protocols are present on `String`, but not `IO`.
With StringIO you have the best of both worlds! You can use the methods with built-in #pos, like #getbyte, or drop down to the underlying #string if you want to use specialized methods like #unpack or #scan! We could even add #unpack and #scan to StringIO, and they would start the unpack/scan operation at the current #pos. The same goes for any other methods needed for parsing various protocols.

It comes down to: we can either make String a better buffer (adding `#to_i(offset:)`), or make StringIO a better buffer (adding `#get_integer`). IMHO the latter seems a better direction in the long run.
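A minimal sketch of the idea above, assuming a hypothetical `get_integer` helper written in plain Ruby (no such method exists on StringIO today). It only handles unsigned ASCII decimal digits, and the last lines show dropping down to the underlying `#string` with the current `#pos` as the starting offset:

```ruby
require "stringio"

# Hypothetical helper: StringIO#get_integer is NOT part of Ruby.
# Reads ASCII decimal digits starting at io.pos and leaves pos on the
# first non-digit byte.
def get_integer(io)
  value = 0
  while (byte = io.getbyte)
    digit = byte - 48 # 48 == "0".ord
    unless digit.between?(0, 9)
      io.pos -= 1 # step back so the next reader sees the non-digit
      break
    end
    value = value * 10 + digit
  end
  value
end

io = StringIO.new("*2\r\n$3\r\nfoo\r\n".b)
type  = io.getc          # "*"
count = get_integer(io)  # 2, and io.pos is now 2 (on the "\r")

# Dropping down to the underlying String, starting at the current pos.
# The buffer is binary, so character offsets and byte offsets coincide.
eol = io.string.index("\r\n", io.pos)
```

The caller never has to track an offset by hand here; the position advances as a side effect of reading, which is the property being argued for.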
Also until recently, methods like `IO#gets` didn't have a way to be timed out, so they weren't safe to use in such a context. In 3.2+ there's now `IO#timeout=`, but as I said, parsing directly from the IO turned out much slower in my attempts, though perhaps I didn't do it well.
I think the timeout stuff is orthogonal to parsing with offsets. Let's compare String vs StringIO so we can ignore timeouts and focus on the API design of #to_i vs #get_integer:

```ruby
# @str is String
@str << io.gets
case @str[@offset] # @offset is character position, so this may be O(n) operation
when "*"
  @offset += 1
  nb_elements = @str.to_i(offset: @offset)
  @offset += ???
end

# @io is StringIO
@io.string << io.gets
case @io.getc
when "*"
  nb_elements = @io.get_integer
end
```

It seems to me that the Redis protocol, like most wire protocols, is designed to be parsed in a streaming fashion, so a stream-oriented API like StringIO works pretty well. But I recognize the speed aspect is important, with String#getbyte much faster than StringIO#getbyte when YJIT is enabled... which I find very strange. Unlike IO, StringIO is just a wrapper over a String object, so in theory there's no reason why they should be so different.

----------------------------------------
Feature #20394: Add an offset parameter to `String#to_i`
https://bugs.ruby-lang.org/issues/20394#change-107509

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

I maintain the `redis-client` gem, and it comes with an optional swappable implementation in C that binds the `hiredis` C client, [which used to perform up to 5 times faster in some cases](https://github.com/redis-rb/redis-client/commit/9fabd57c6786a03fe0c6021eab5b...).

I recently paired with @tenderlovemaking to try to close this gap, or even try to make the pure Ruby version faster, and we came up with several optimizations that now almost make both versions on par (assuming YJIT is enabled).

An important source of performance loss is that the Redis protocol is line based, and parsing it in Ruby requires slicing a lot of small strings from the buffer.
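As a rough illustration of that slicing cost, the sketch below contrasts two ways of reading a CRLF-terminated integer out of a buffer at a given byte offset. Both method names are hypothetical, this is not the code from `redis-client`, and it assumes a binary-encoded buffer that already holds the complete line:

```ruby
# Slicing approach: allocates a short-lived String for every integer.
def int_by_slicing(buffer, offset)
  eol = buffer.index("\r\n", offset)
  [Integer(buffer.byteslice(offset, eol - offset)), eol + 2]
end

# getbyte approach: walks the digits in place, no intermediate String.
def int_by_getbyte(buffer, offset)
  negative = buffer.getbyte(offset) == 45 # 45 == "-".ord
  offset += 1 if negative
  value = 0
  while (byte = buffer.getbyte(offset)) && byte.between?(48, 57) # "0".."9"
    value = value * 10 + (byte - 48)
    offset += 1
  end
  [negative ? -value : value, offset + 2] # skip the trailing "\r\n"
end

buffer = "*2\r\n$3\r\nfoo\r\n".b
sliced  = int_by_slicing(buffer, 1) # [value, offset of the next line]
walked  = int_by_getbyte(buffer, 1)
```

Both return the parsed value together with the offset of the next line, so the caller knows how far to advance without measuring the digits separately.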
To give an example, here's how an Array with two Strings (`["foo", "plop"]`) is serialized in RESP3 (Redis protocol):

```
*2\r\n
$3\r\n
foo\r\n
$4\r\n
plop\r\n
```

From this you can understand that a big hotspot in the parser is essentially `Integer(gets)`.

With @tenderlovemaking we managed to get [a fairly significant perf boost](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...) by avoiding these string allocations using `String#getbyte` and [basically implementing a rudimentary `String#to_i(offset: )` in Ruby](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...).

But while the gains are huge with YJIT enabled, they are much more tame with the interpreter. And it feels a bit wrong to have to implement this sort of thing for performance reasons.

### `String#to_i(offset: )`

Similar to `String#unpack(offset:)` ([Feature #18254]), I believe `String#to_i(offset: )` would be useful.

### Alternative new `String#unpack` format

Another possibility would be to add a new format to `Array#pack` / `String#unpack` for decimal numbers. It sounds a bit weird at first, but given it supports things like Base64 and hexadecimal, perhaps it's not that much of a stretch?

-- 
https://bugs.ruby-lang.org/