[ruby-core:124258] [Ruby Feature#21785] Add signed and unsigned LEB128 support to pack / unpack
Issue #21785 has been reported by tenderlovemaking (Aaron Patterson).

----------------------------------------
Feature #21785: Add signed and unsigned LEB128 support to pack / unpack
https://bugs.ruby-lang.org/issues/21785

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
----------------------------------------
Hi,

I'd like to add signed and unsigned LEB128 support to the `Array#pack` and `String#unpack` methods. LEB128 is a variable-length encoding scheme for integers. You can read the Wikipedia entry about it here: https://en.wikipedia.org/wiki/LEB128

LEB128 is used in DWARF, WebAssembly, MQTT, and Protobuf. I'm sure there are other formats, but these are the ones I'm familiar with.

I sent a pull request here: https://github.com/ruby/ruby/pull/15589

I'm proposing `K` for the unsigned version and `k` for the signed version. I just picked `k` because it was available; I'm open to other format strings.

Thanks for your consideration!

-- 
https://bugs.ruby-lang.org/
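For readers new to the format: each output byte carries 7 bits of the integer, least significant group first, and the high bit (0x80) signals that more bytes follow. Below is a minimal pure-Ruby sketch of encoding, for illustration only. The method names `uleb128_encode` and `sleb128_encode` are hypothetical, and this is not the C code in the pull request.

```ruby
# Hypothetical helpers illustrating LEB128 encoding; not part of Ruby's API.

# Unsigned LEB128: emit 7 bits per byte, least significant group first,
# setting the continuation bit (0x80) on every byte except the last.
def uleb128_encode(value)
  raise ArgumentError, "negative value for unsigned LEB128" if value.negative?
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte          # final byte: continuation bit clear
      break
    end
    bytes << (byte | 0x80)   # more bytes follow
  end
  bytes.pack("C*")
end

# Signed LEB128: same 7-bit groups, but stop once the remaining value is pure
# sign extension (0 or -1) and bit 6 of the current byte already matches it.
def sleb128_encode(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7              # Integer#>> is an arithmetic shift for negatives
    done = (value.zero? && (byte & 0x40).zero?) ||
           (value == -1 && (byte & 0x40) != 0)
    bytes << (done ? byte : (byte | 0x80))
    break if done
  end
  bytes.pack("C*")
end

p uleb128_encode(0xFFF) # => "\xFF\x1F"
p sleb128_encode(-123)  # => "\x85\x7F"
```

The signed variant needs the extra bit-6 check because a decoder sign-extends from that bit; for example, 64 must be written as `"\xC0\x00"` rather than `"\x40"`, which would read back as -64.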
Issue #21785 has been updated by tenderlovemaking (Aaron Patterson).

Sorry, I probably should have put an example in the original post. Here is a sample of the usage:

```
irb(main):003> [0xFFF].pack("K")
=> "\xFF\x1F"
irb(main):004> [0xFFF].pack("K").unpack1("K")
=> 4095
irb(main):005> [-123].pack("k")
=> "\x85\x7F"
irb(main):006> [-123].pack("k").unpack1("k")
=> -123
```

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115753
* Status: Open
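The outputs in the irb session can be checked by hand: 0xFFF splits into the 7-bit groups 0x7F and 0x1F, so the first byte is 0x7F | 0x80 = 0xFF and the last is 0x1F. A matching decoding sketch follows; `uleb128_decode` and `sleb128_decode` are hypothetical pure-Ruby helpers, not the patch's implementation, and they assume the whole string is one encoded value.

```ruby
# Hypothetical decoders for illustration; not part of Ruby's API.

# Unsigned: accumulate 7-bit groups, least significant first, until a byte
# with the continuation bit (0x80) clear is seen.
def uleb128_decode(bytes)
  result = 0
  shift = 0
  bytes.each_byte do |byte|
    result |= (byte & 0x7F) << shift
    shift += 7
    return result if (byte & 0x80).zero?
  end
  raise ArgumentError, "truncated LEB128"
end

# Signed: same accumulation, then sign-extend if bit 6 of the final byte is set.
def sleb128_decode(bytes)
  result = 0
  shift = 0
  bytes.each_byte do |byte|
    result |= (byte & 0x7F) << shift
    shift += 7
    next unless (byte & 0x80).zero?
    result -= 1 << shift if (byte & 0x40) != 0
    return result
  end
  raise ArgumentError, "truncated LEB128"
end

p uleb128_decode("\xFF\x1F".b) # => 4095
p sleb128_decode("\x85\x7F".b) # => -123
```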
Issue #21785 has been updated by matz (Yukihiro Matsumoto).

I am positive about the addition of LEB128. But I don't really like K/k because it doesn't remind me of LEB128 at all (though I know we've used L, E, B already).
Given that the only case pairs not yet used are k, r, and y, either R (vaRiable length), or Y (next to W - BER) would be better than K/k.

Matz.

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115762
* Status: Open
Issue #21785 has been updated by tenderlovemaking (Aaron Patterson).

matz (Yukihiro Matsumoto) wrote in #note-2:

> I am positive about the addition of LEB128. But I don't really like K/k because it doesn't remind me of LEB128 at all (though I know we've used L, E, B already).
> Given that the only case pairs not yet used are k, r, and y, either R (vaRiable length), or Y (next to W - BER) would be better than K/k.
> Matz.

Thanks for the feedback. I've updated the patch to use R/r!

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115766
* Status: Open
Issue #21785 has been updated by mame (Yusuke Endoh).

It's a shame `unpack` doesn't tell you how many bytes it read. You'd probably want an `unpack` variant that also returns the final offset, or a specifier that returns the current offset (like `o`?).

```ruby
bytes = "\x01\x02\x03"
offset = 0
leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
```

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115787
* Status: Open
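Without an `o`-style specifier, the offset has to be tracked by hand. The sketch below emulates what the proposed `"Ro"` would return, using a hypothetical pure-Ruby helper `read_uleb128` (not a Ruby API; `o` is only a suggestion in this thread). It relies on `String#getbyte` and returns the value together with the offset just past the consumed bytes.

```ruby
# read_uleb128 is hypothetical, for illustration only. It decodes one
# unsigned LEB128 value starting at +offset+ and returns
# [value, offset_after_last_consumed_byte].
def read_uleb128(bytes, offset)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128 at offset #{offset}" if byte.nil?
    offset += 1
    result |= (byte & 0x7F) << shift
    shift += 7
    return [result, offset] if (byte & 0x80).zero?
  end
end

# Walk a buffer of back-to-back values, as in the "Ro" example.
bytes = "\x01\x02\x03".b
offset = 0
values = []
while offset < bytes.bytesize
  value, offset = read_uleb128(bytes, offset)
  values << value
end
p values # => [1, 2, 3]
```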
Issue #21785 has been updated by tenderlovemaking (Aaron Patterson).

mame (Yusuke Endoh) wrote in #note-4:

> It's a shame `unpack` doesn't tell you how many bytes it read. You'd probably want an `unpack` variant that also returns the final offset, or a specifier that returns the current offset (like `o`?).
>
> ```ruby
> bytes = "\x01\x02\x03"
> offset = 0
> leb128_value1, offset = bytes.unpack("Ro", offset: offset) #=> 1
> leb128_value2, offset = bytes.unpack("Ro", offset: offset) #=> 2
> leb128_value3, offset = bytes.unpack("Ro", offset: offset) #=> 3
> ```

You could tell how many bytes you read based on the size of the leb128_value returned. But I agree, getting the information directly from `unpack` would be nice.

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115792
* Status: Open
Issue #21785 has been updated by mame (Yusuke Endoh).

> You could tell how many bytes you read based on the size of the leb128_value returned.

That approach is unreliable because LEB128 is redundant: the same value can have more than one valid encoding. For example, both `"\x03"` and `"\x83\x00"` are valid LEB128 encodings of the value 3. See the note in the "Values > Integers" section of the Wasm spec: https://webassembly.github.io/spec/core/binary/values.html#integers

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115796
* Status: Open
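The redundancy can be seen with a hypothetical pure-Ruby helper (`read_uleb128`, not part of Ruby's API) that reports both the decoded value and the number of bytes it consumed: the two encodings yield the same value but different lengths, so the length cannot be recovered from the value alone.

```ruby
# read_uleb128 is hypothetical, for illustration only. It returns
# [value, offset_after_last_consumed_byte].
def read_uleb128(bytes, offset = 0)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated LEB128" if byte.nil?
    offset += 1
    result |= (byte & 0x7F) << shift
    shift += 7
    return [result, offset] if (byte & 0x80).zero?
  end
end

# "\x83\x00" pads the value 3 with a continuation bit and an all-zero group.
p read_uleb128("\x03".b)     # => [3, 1]
p read_uleb128("\x83\x00".b) # => [3, 2]
```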
Issue #21785 has been updated by tenderlovemaking (Aaron Patterson).

mame (Yusuke Endoh) wrote in #note-6:

> That approach is unreliable because LEB128 is redundant. For example, both `"\x03"` and `"\x83\x00"` are valid LEB128 encodings of the value 3.

Ah, of course. I didn't think about that. 🤦♀️

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115802
* Status: Open
Issue #21785 has been updated by matz (Yukihiro Matsumoto).

It is too late to introduce it in Ruby 4.0; let's aim for 4.1.

Matz.

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-115839
* Status: Closed
Issue #21785 has been updated by tenderlovemaking (Aaron Patterson).

Is it OK if I merge this again? Thanks!

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-116273
* Status: Open
Issue #21785 has been updated by matz (Yukihiro Matsumoto).

Yes.

Matz.

----------------------------------------
https://bugs.ruby-lang.org/issues/21785#change-116387
* Status: Open
participants (3)
- mame (Yusuke Endoh)
- matz (Yukihiro Matsumoto)
- tenderlovemaking (Aaron Patterson)