
Issue #20394 has been updated by byroot (Jean Boussier).

Status changed from Open to Closed
> This interpretation mismatch could be a source of vulnerability.
Good catch @mame, that does indeed make the `to_i` proposal much more problematic. I guess I don't really have much of a proposal anymore.
> With StringIO you have the best of both worlds!
Not today, no. You can look at gems that implement network protocols; I have yet to find one that uses `StringIO` as a buffer. `StringIO` isn't as convenient as you make it out to be. Maybe it could become that, but it isn't today.
> It comes down to: we can either make String a better buffer [...] or make StringIO a better buffer [...]. IMHO the latter seems a better direction in the long run.
The thing is, was `StringIO` even thought of as a buffer in the first place? My understanding is that it's just meant as a facade to pass strings to APIs that expect an IO, nothing more. I don't see any extra methods on it that suggest using it as a buffer.

Now for better or for worse, since Ruby strings are mutable, and historically they didn't have an encoding, they're used as buffers everywhere. E.g. look at methods like `IO#read_nonblock`: they take a string parameter called `outbuf`. Look at core gems like `net-http`: they use that same "read into a string, then parse the string" pattern, etc.

I'd be all for a dedicated Buffer class that allows efficient parsing of text protocols like HTTP and RESP3, but right now all we've got is `String`.
> I think the timeout stuff is orthogonal to parsing with offsets.
It's not. If you are parsing a stream directly from an IO using blocking methods, you must be able to time out if the character or pattern you are waiting for never comes. In your example, that `io.gets` could potentially block forever if the IO is a socket and the server is unresponsive or malicious. That is why most if not all protocol clients use `read_nonblock` into a String and then parse the string.
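
To illustrate the "`read_nonblock` into a String, then parse the string" pattern with a deadline, here is a minimal sketch. `read_until` and `ReadTimeout` are hypothetical names made up for this example, not taken from `redis-client` or any other gem, and a real client would likely handle more cases (write readiness on TLS sockets, partial terminators, and so on).

```ruby
# Hypothetical error class for this sketch.
ReadTimeout = Class.new(StandardError)

# Hypothetical helper: append bytes from `io` to `buffer` until `buffer`
# contains `terminator`, giving up once `timeout` seconds have elapsed overall.
def read_until(io, buffer, terminator, timeout:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  chunk = String.new(capacity: 4096) # reused as the `outbuf` for every read

  until buffer.include?(terminator)
    case io.read_nonblock(4096, chunk, exception: false)
    when :wait_readable
      # Nothing available yet: wait for readability, but only until the deadline.
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      raise ReadTimeout, "terminator not received in time" if remaining <= 0
      ready = IO.select([io], nil, nil, remaining)
      raise ReadTimeout, "terminator not received in time" if ready.nil?
    when nil
      raise EOFError, "connection closed before terminator"
    else
      buffer << chunk
    end
  end

  buffer
end
```

With this shape, an unresponsive peer results in a `ReadTimeout` rather than a call that blocks forever, and the parsing can then be done against `buffer` with offsets.
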
> with String#getbyte much faster than StringIO#getbyte when YJIT is enabled... which I find very strange. Unlike IO, StringIO is just a wrapper over a String object, so in theory there's no reason why they should be so different.
It's not that strange. YJIT has special optimizations for `String#getbyte` and some other String methods because they are such hotspots. Additionally, `StringIO` is implemented in C so YJIT can't optimize it, and even if it were rewritten in Ruby, it would involve some extra method calls that YJIT isn't yet capable of inlining.

So in the end I think I'll just implement some small gems that provide the capabilities I need.

----------------------------------------
Feature #20394: Add an offset parameter to `String#to_i`
https://bugs.ruby-lang.org/issues/20394#change-107517

* Author: byroot (Jean Boussier)
* Status: Closed
----------------------------------------
### Context

I maintain the `redis-client` gem, and it comes with an optional swappable implementation in C that binds the `hiredis` C client, [which used to perform up to 5 times faster in some cases](https://github.com/redis-rb/redis-client/commit/9fabd57c6786a03fe0c6021eab5b...).

I recently paired with @tenderlovemaking to try to close this gap, or even try to make the pure Ruby version faster, and we came up with several optimizations that now make both versions almost on par (assuming YJIT is enabled).

An important source of performance loss is that the Redis protocol is line based, and parsing it in Ruby requires slicing a lot of small strings from the buffer.

To give an example, here's how an Array with two Strings (`["foo", "plop"]`) is serialized in RESP3 (Redis protocol):

```
*2\r\n
$3\r\n
foo\r\n
$4\r\n
plop\r\n
```

From this you can understand that a big hotspot in the parser is essentially `Integer(gets)`.

With @tenderlovemaking we managed to get [a fairly significant perf boost](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...) by avoiding these string allocations using `String#getbyte` and [basically implementing a rudimentary `String#to_i(offset: )` in Ruby](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e4...).
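
For readers who don't want to follow the links, the idea can be sketched roughly as follows. This is an illustration only, not the code from the linked commits: `parse_int_line` is a hypothetical helper name, and it assumes the integer is decimal, optionally negative, and terminated by `\r\n`. It does not reject an empty digit sequence.

```ruby
# Hypothetical helper (not the redis-client implementation): read a decimal
# integer from `buffer` starting at byte `offset`, without slicing substrings.
# Returns [value, offset_after_crlf], or nil if the line is not complete yet.
def parse_int_line(buffer, offset)
  pos = offset
  negative = false
  value = 0

  if buffer.getbyte(pos) == 45 # "-"
    negative = true
    pos += 1
  end

  while (byte = buffer.getbyte(pos))
    case byte
    when 48..57 # "0".."9"
      value = value * 10 + (byte - 48)
    when 13 # "\r"
      following = buffer.getbyte(pos + 1)
      return nil if following.nil? # "\n" has not arrived yet
      raise ArgumentError, "expected \\n after \\r" unless following == 10
      return [negative ? -value : value, pos + 2]
    else
      raise ArgumentError, "unexpected byte #{byte} at offset #{pos}"
    end
    pos += 1
  end

  nil # ran out of bytes before reaching "\r\n"
end

buffer = "*2\r\n$3\r\nfoo\r\n$4\r\nplop\r\n"
count, offset = parse_int_line(buffer, 1)       # skip the leading "*"
length, offset = parse_int_line(buffer, offset + 1) # skip the leading "$"
```

The only objects allocated per line here are the small return arrays; no intermediate String is created for the digits.
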
But while the gains are huge with YJIT enabled, they are much more tame with the interpreter. And it feels a bit wrong to have to implement this sort of thing for performance reasons.

### `String#to_i(offset: )`

Similar to `String#unpack(offset:)` ([Feature #18254]), I believe `String#to_i(offset: )` would be useful.

### Alternative new `String#unpack` format

Another possibility would be to add a new format to `Array#pack` / `String#unpack` for decimal numbers. It sounds a bit weird at first, but given it supports things like Base64 and hexadecimal, perhaps it's not that much of a stretch?

-- 
https://bugs.ruby-lang.org/