[ruby-core:124312] [Ruby Feature#21796] unpack variant that returns the final offset
Issue #21796 has been reported by nobu (Nobuyoshi Nakada).

----------------------------------------
Feature #21796: unpack variant that returns the final offset
https://bugs.ruby-lang.org/issues/21796

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
----------------------------------------
mame (Yusuke Endoh) wrote in [#note-4](https://bugs.ruby-lang.org/issues/21785#note-4):
It's a shame `unpack` doesn't tell you how many bytes it read. You'd probably want an `unpack` variant that returns the final offset too, or a specifier that returns the current offset (like `o`?).
```ruby
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
```
mame (Yusuke Endoh) wrote in [#note-6](https://bugs.ruby-lang.org/issues/21785#note-6):
> You could tell how many bytes you read based on the size of the leb128_value returned.
That approach is unreliable because LEB128 is redundant. For example, both `"\x03"` and `"\x83\x00"` are valid LEB128 encodings of the value 3. See the note in the "Values - Integers" section of the Wasm spec: https://webassembly.github.io/spec/core/binary/values.html#integers
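To make the redundancy concrete, here is a minimal sketch of an unsigned LEB128 decoder in plain Ruby. The helper name `decode_uleb128` is hypothetical and is not part of Ruby or of the proposed patch; it only illustrates that the decoded value alone cannot tell you how many bytes were consumed.

```ruby
# Hypothetical helper for illustration only: decode one unsigned LEB128
# integer starting at `offset` and return [value, next_offset].
def decode_uleb128(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 sequence" if byte.nil?

    offset += 1
    value |= (byte & 0x7f) << shift  # the low 7 bits carry the payload
    break if (byte & 0x80).zero?     # a clear high bit marks the last byte
    shift += 7
  end
  [value, offset]
end

short_value, short_end = decode_uleb128("\x03".b)
padded_value, padded_end = decode_uleb128("\x83\x00".b)

p [short_value, short_end]    # prints [3, 1]
p [padded_value, padded_end]  # prints [3, 2]
```

Both inputs decode to 3, yet the reader ends at different offsets, so the offset has to come from the decoder itself.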
Issue #21796 has been updated by byroot (Jean Boussier).

It would be useful indeed, but I'm not sure a new method is the best way? I think the simplest would be a new keyword parameter:

```ruby
offset, *values = bytes.unpack("Ro", offset: offset, return_offset: true)
```

Another possibility would be to add an `unpack`-like method to `StringScanner`, for the case where you want to iteratively deserialize a binary string.

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-115816
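For reference, iterative deserialization can already be written today with the existing `offset:` keyword, as long as every field has a known width. The sketch below uses a hypothetical `ByteCursor` class; it is not `StringScanner` and not a proposed API. It only shows the manual bookkeeping that a reported offset would make unnecessary.

```ruby
# Hypothetical cursor for illustration only.
class ByteCursor
  attr_reader :pos

  def initialize(bytes)
    @bytes = bytes
    @pos = 0
  end

  # Read one fixed-width field. The caller has to supply `width`, because
  # unpack does not report how many bytes it consumed.
  def read(directive, width)
    raise EOFError, "not enough data" if @pos + width > @bytes.bytesize

    value = @bytes.unpack1(directive, offset: @pos)
    @pos += width
    value
  end
end

cursor = ByteCursor.new("\x01\x02\x00\x03\x00\x00\x00".b)
tag    = cursor.read("C", 1)  # 8-bit unsigned
length = cursor.read("v", 2)  # 16-bit unsigned, little-endian
value  = cursor.read("V", 4)  # 32-bit unsigned, little-endian

p [tag, length, value, cursor.pos]  # prints [1, 2, 3, 7]
```

This works only because the widths are fixed; a variable-length directive leaves the caller with no way to compute `width` up front.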
Issue #21796 has been updated by tenderlovemaking (Aaron Patterson).

I really like this idea. @jhawthorn suggested `^` instead of `o` though, and I really like it.

```ruby
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("R^", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("R^", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("R^", offset: offset) #=> 3
```
> I think the simplest would be a new keyword parameter
Why a new parameter? You might be interested in more than one location. We already have [pack directives for skipping bytes](https://github.com/ruby/ruby/blob/master/doc/language/packed_data.rdoc#addit...) (`@`, `X`, and `x`). It seems natural to add a directive to return the current offset.
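As a reminder of how the existing positional directives behave, here is a short sketch using only directives that `String#unpack` already supports. The variable names are illustrative.

```ruby
bytes = "\x01\x02\x03\x04".b

forward  = bytes.unpack("C x C")  # x skips one byte forward
backward = bytes.unpack("C X C")  # X backs up one byte, so byte 0 is read twice
absolute = bytes.unpack("@2 C")   # @2 jumps to byte offset 2

p forward   # prints [1, 3]
p backward  # prints [1, 1]
p absolute  # prints [3]
```

These directives move the read position but never expose it, which is the gap an offset-returning directive would fill.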
> Another possibility would be to add an unpack like method to StringScanner, for the case where you want to iteratively deserialize a binary string.
I think this would be very useful in general, but maybe it belongs in a separate Redmine ticket?

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-115830
Issue #21796 has been updated by byroot (Jean Boussier).
> Why a new parameter?
Because I misread the ticket; I didn't notice the `o`. I do think `^` for offset is pure genius though.

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-115833
Issue #21796 has been updated by matz (Yukihiro Matsumoto).

I like the `^` specifier too.

Matz.

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-115856
Issue #21796 has been updated by nobu (Nobuyoshi Nakada).

This might be useful for `A`, `a`, and `Z` as well. I updated the PR to use `^`, with tests.

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-115898
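The remark about string directives can be illustrated with current `unpack` behavior. In this sketch (variable names are illustrative, and only `A` and `Z` are shown), the size of the returned string differs from the number of bytes actually consumed.

```ruby
padded = "ab  cd"
a_field, after_a = padded.unpack("A4 a2")
# A4 consumes four bytes but strips the trailing spaces from the result.
p [a_field, a_field.bytesize, after_a]  # prints ["ab", 2, "cd"]

terminated = "abc\0def"
z_field, after_z = terminated.unpack("Z* a3")
# Z* stops at the first NUL and also consumes it.
p [z_field, z_field.bytesize, after_z]  # prints ["abc", 3, "def"]
```

In both cases the next field starts later than the length of the returned string suggests, so the caller cannot reliably derive the offset from the value.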
Issue #21796 has been updated by matz (Yukihiro Matsumoto).

Go ahead.

Matz.

----------------------------------------
https://bugs.ruby-lang.org/issues/21796#change-116388
participants (4)
- byroot (Jean Boussier)
- matz (Yukihiro Matsumoto)
- nobu (Nobuyoshi Nakada)
- tenderlovemaking (Aaron Patterson)