[ruby-core:111306] [Ruby master Feature#19236] Allow to create hashes with a specific capacity from Ruby

Issue #19236 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #19236: Allow to create hashes with a specific capacity from Ruby
https://bugs.ruby-lang.org/issues/19236

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Target version: 3.3
----------------------------------------
Followup on [Feature #18683] which added a C-API for this purpose.

Various protocol parsers such as Redis `RESP3` or `msgpack` have to create hashes, and they know the size in advance.

For efficiency, it would be preferable if they could directly allocate a Hash of the necessary size, so that large hashes wouldn't cause many re-allocations and re-hashes.

`String` and `Array` both already offer similar APIs:

```ruby
String.new(capacity: XXX)
Array.new(XX) / rb_ary_new_capa(long)
```

However there's no such public API for Hashes in Ruby land.

### Proposal

I think `Hash` should have a way to create a new hash with a `capacity` parameter.

The logical signature of `Hash.new(capacity: 1000)` was deemed too incompatible in [Feature #18683].

@Eregon proposed to add `Hash.create(capacity: 1000)`.

--
https://bugs.ruby-lang.org/

Issue #19236 has been updated by janosch-x (Janosch Müller).

maybe the genie is out of the bottle already, but it would be nice to have a uniform API for creating objects with a given capacity, e.g.

```ruby
Array.with_capacity(100) # => []
Hash.with_capacity(100) # => {}
IO::Buffer.with_capacity(100) # => #<IO::Buffer>
String.with_capacity(100) # => ''
# more?
```

for `Array` and `IO::Buffer`, `::with_capacity` would essentially be an alias for `::new`.

for `String`, the `capacity` kwarg could be deprecated to limit the number of APIs.

https://bugs.ruby-lang.org/issues/19236#change-100797

Issue #19236 has been updated by mame (Yusuke Endoh).

Discussed at the dev meeting.

@matz said that `Hash.create(capacity: 4096)` is acceptable (unless it conflicts with any major gems). However, several participants including @ko1 were a little cautious about introducing the new terminology "create" into Ruby core, and matz understood that.

@matsuda and @mame prefer `Hash.new(capacity: 4096)`. This is a bit incompatible, but we searched gem-codesearch with the query `'\bHash\.new\(\w+: '` and found fewer than 20 results (manually excluding `Foo::Bar::Hash.new(...)`, which is perhaps different from `::Hash`). Moreover, some of the results seemed to misunderstand `Hash.new(foo: 1)` as `{ foo: 1 }`. (Most of the examples are in rspec; maybe because `let(:option) { { foo: 1 } }` looks bad, people inadvertently rewrote it as `let(:option) { Hash.new(foo: 1) }`.)

Therefore, how about deprecating giving the keyword to `Hash.new` and then introducing `Hash.new(capacity: 4096)`? @matz said this is also acceptable if the incompatibility is not a big problem.

(Off-topic: `Array.new(capacity: 4096)` is not yet available; I wonder if people want `Hash.new(capacity: 4096)` more than Array?)

https://bugs.ruby-lang.org/issues/19236#change-101342
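The gem-codesearch query quoted in this message can be tried against a few lines to see what it catches. This is only a sketch: the sample strings are invented for illustration and are not actual search results.

```ruby
# The pattern quoted in the message, written as a Ruby Regexp literal.
keyword_call = /\bHash\.new\(\w+: /

# Hypothetical sample lines, made up for this sketch.
samples = [
  "Hash.new(foo: 1)",              # keyword-style call: matches
  "Hash.new(0)",                   # positional default value: no match
  "Hash.new { |h, k| h[k] = [] }", # block form: no match
  "Foo::Bar::Hash.new(foo: 1)",    # also matches, hence the manual exclusion
]

matches = samples.select { |line| line.match?(keyword_call) }
```

The namespaced call matches because `\b` only requires a word boundary before `Hash`, which is presumably why those hits had to be filtered by hand.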

Issue #19236 has been updated by byroot (Jean Boussier).

Well, `Hash.new(capacity: 4096)` was definitely my first pick, so this is great news IMO.

> how about deprecating giving the keyword to Hash.new and then introducing Hash.new(capacity: 4096)?

What would be the timeline? Deprecate in 3.3 and break in 3.4?

> Off-topic: Array.new(capacity: 4096) is not yet available; I wonder if people want Hash.new(capacity: 4096) more than Array?

I think it's in part because `Array.new(4096)`, while not exactly the same, already somewhat works. I'd be happy to add `Array.new(capacity: 4096)` though, but it has a similar backward compatibility concern, doesn't it?

https://bugs.ruby-lang.org/issues/19236#change-101358
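To illustrate the "not exactly the same" remark: `Array.new(n)` fills the array with `n` elements, whereas `String.new(capacity: n)` only reserves room and leaves the string empty. A minimal sketch using only existing core APIs:

```ruby
# Array.new(n) creates n elements (nil by default), so the array is not empty.
filled = Array.new(4)

# String.new(capacity: n) reserves space but the string itself stays empty.
reserved = String.new(capacity: 4096)

filled.size       # => 4
filled.all?(&:nil?) # => true
reserved.bytesize # => 0
reserved.empty?   # => true
```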

Issue #19236 has been updated by mame (Yusuke Endoh).

byroot (Jean Boussier) wrote in #note-4:

> What would be the timeline?
> Deprecate in 3.3 and break in 3.4?

That would be the fastest way.

> > Off-topic: Array.new(capacity: 4096) is not yet available; I wonder if people want Hash.new(capacity: 4096) more than Array?
>
> I think it's in part because `Array.new(4096)` while not exactly the same, already somewhat works. I'd be happy to add `Array.new(capacity: 4096)` though, but it has a similar backward compatibility concern doesn't it?

Fortunately, it raises an error: `Array.new(capacity: 4096) #=> no implicit conversion of Hash into Integer (TypeError)`. So I don't see a big problem with changing this. Anyway, I think we need a separate ticket if we introduce it.

https://bugs.ruby-lang.org/issues/19236#change-101421
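The error mentioned here can be observed directly. A small sketch, assuming a Ruby version in which `Array.new` has no `capacity:` keyword (true of the versions discussed in this thread; a later Ruby could behave differently):

```ruby
# The keyword-style argument is received as a positional Hash, which
# Array.new then tries (and fails) to use as the size.
error = begin
  Array.new(capacity: 4096)
  nil # reached only if some future Ruby accepts the keyword
rescue TypeError => e
  e
end

error.class # => TypeError on Rubies without an Array capacity keyword
```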

Issue #19236 has been updated by Eregon (Benoit Daloze).

If we use `Hash.new(capacity: 4096)` to set the capacity, then `Hash.new({ capacity: 4096 })` should keep the semantics of: `{ capacity: 4096 }` is the default value of the new Hash.

I think it's a little bit hacky/unclear/source of confusion, but still I'm not against it.

https://bugs.ruby-lang.org/issues/19236#change-101441
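The braced form described here can be checked on any Ruby version, since explicit braces make the hash a positional argument. A minimal sketch:

```ruby
# With explicit braces the hash is a positional argument: it becomes the
# default value returned for missing keys, not a set of options.
with_default = Hash.new({ capacity: 4096 })

with_default.empty?            # => true, nothing was stored
with_default[:missing]         # returns the default value, { capacity: 4096 }
with_default.key?(:capacity)   # => false
```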

Issue #19236 has been updated by Dan0042 (Daniel DeLorme).

Previously, a `capacity` reader/writer was suggested by @byroot in #18683#note-2

I would like to see this idea considered more seriously because

1. It doesn't need to change anything to the initialize arguments of Array/Hash/String, which are already quite complex enough
2. The same API can be used for any class; it's nicely consistent and easy to remember
3. It's more versatile, as it can be used more than once after object creation, ex:

```ruby
buffer = String.new # this example is with String, but the same could apply to Hash/Array
while line = gets
  # increase buffer capacity by chunks of 10k
  buffer.capacity += 10000 if buffer.capacity < buffer.bytesize + line.bytesize
  buffer << line
end
buffer.capacity = 0 # trim buffer to minimal size (aka "right-size")
buffer.capacity == buffer.bytesize #=> true
```

https://bugs.ruby-lang.org/issues/19236#change-102979

Issue #19236 has been updated by ianks (Ian Ker-Seymer).

I worry that new Rubyists might be confused with the `Hash.new(capacity: n)` semantics. For example, `Hash[capacity: 5]` can look very similar to `Hash.new(capacity: 5)`. It wouldn’t be unreasonable to assume they are the same thing… But you’d be in for an unexpected surprise.

To me `Hash.with_capacity` clearly communicates what’s happening. Anyone can understand it at first glance.

https://bugs.ruby-lang.org/issues/19236#change-102984
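For reference, the two spellings compared in this message behave differently. The sketch below assumes that `capacity:` is only a sizing hint on Ruby 3.4 or later (the timeline discussed in this thread) and falls back to a plain `{}` elsewhere; the `ruby_34_or_later` check is a helper made up for this example.

```ruby
# Hash[] copies the given pairs into a new hash.
from_brackets = Hash[capacity: 5]

# Assumption: only Ruby 3.4+ treats `capacity:` as a sizing hint.
ruby_34_or_later = (RUBY_VERSION.split(".").first(2).map(&:to_i) <=> [3, 4]) >= 0
from_new = ruby_34_or_later ? Hash.new(capacity: 5) : {}

from_brackets.size # => 1, the pair was stored
from_new.size      # => 0, nothing was stored
```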

Issue #19236 has been updated by byroot (Jean Boussier).

> For example, Hash[capacity: 5] can look very similar to Hash.new(capacity: 5).

That seems like a very handwavy argument to me. I really don't see how the two could possibly be confused.

https://bugs.ruby-lang.org/issues/19236#change-102985

Issue #19236 has been updated by Dan0042 (Daniel DeLorme).

ianks (Ian Ker-Seymer) wrote in #note-8:

> To me `Hash.with_capacity` clearly communicates what’s happening. Anyone can understand it at first glance.

`Hash.with_capacity` is not composable. What should you do if you want a default value/proc AND a capacity?

```ruby
h = Hash.with_capacity(100)
h.default = default_value # this?? a bit ugly imho
```

`Hash#with_capacity` would be better, then you could do `Hash.new(default_value).with_capacity(400)` similar to `compare_by_identity` usage.

But at that point it's imho better to have `Hash.new(default_value).tap{ _1.capacity = 400 }`

Or the best: `Hash.new(default_value).tap{ .capacity = 400 }` ;-)

https://bugs.ruby-lang.org/issues/19236#change-103007
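With the keyword form preferred at the dev meeting, a default value or default proc can be combined with a capacity in a single call. A sketch, assuming the keyword is only available on Ruby 3.4 or later; the `capacity_ok` check is a hypothetical helper and older versions fall back to the plain constructor:

```ruby
# Assumption: Hash.new accepts `capacity:` on Ruby 3.4 or later only.
capacity_ok = (RUBY_VERSION.split(".").first(2).map(&:to_i) <=> [3, 4]) >= 0

# Default value plus capacity.
counts = capacity_ok ? Hash.new(0, capacity: 400) : Hash.new(0)

# Default proc plus capacity.
groups =
  if capacity_ok
    Hash.new(capacity: 400) { |hash, key| hash[key] = [] }
  else
    Hash.new { |hash, key| hash[key] = [] }
  end

counts[:seen] += 1
groups[:even] << 2
```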

Issue #19236 has been updated by byroot (Jean Boussier).

This was discussed in the last dev meeting. The conclusion was:

> In 3.3 it throws error all keyword arguments to Hash.new. Then Ruby 3.4 allows that Hash.new will accept capacity keyword argument.

https://bugs.ruby-lang.org/issues/19236#change-103156

Issue #19236 has been updated by byroot (Jean Boussier).

Correction:

> In 3.3 it throws error all keyword arguments to Hash.new.

That was a misunderstanding. What was actually agreed was a deprecation warning; I modified the pull request accordingly.

https://bugs.ruby-lang.org/issues/19236#change-103238

Issue #19236 has been updated by byroot (Jean Boussier).

Status changed from Closed to Open
Target version deleted (3.3)

Reopening as the merged commit is the Ruby 3.3 part. I'll implement the 3.4 part next year.

https://bugs.ruby-lang.org/issues/19236#change-103242

Issue #19236 has been updated by byroot (Jean Boussier).

Implemented `Hash.new(capacity:)` in https://github.com/ruby/ruby/pull/10357

https://bugs.ruby-lang.org/issues/19236#change-107465
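For the parser use case that motivated the request, a decoder can pass the announced element count straight to the constructor. The following is only a sketch under stated assumptions: it handles a simplified RESP3-style map containing just simple strings (`+`) and integers (`:`), the names `decode_map`, `decode_scalar` and `CAPACITY_KEYWORD` are hypothetical, and it falls back to `{}` on Rubies older than 3.4, where the keyword is assumed to be unavailable.

```ruby
# Assumption: Hash.new accepts `capacity:` on Ruby 3.4 or later only.
CAPACITY_KEYWORD = (RUBY_VERSION.split(".").first(2).map(&:to_i) <=> [3, 4]) >= 0

# Decode one scalar line: "+text" is a simple string, ":42" is an integer.
def decode_scalar(line)
  case line[0]
  when "+" then line[1..]
  when ":" then Integer(line[1..])
  else raise ArgumentError, "unsupported type: #{line[0].inspect}"
  end
end

# Decode a map such as "%2\r\n+first\r\n:1\r\n+second\r\n:2\r\n".
# The header announces the number of pairs before any pair is read.
def decode_map(payload)
  lines = payload.split("\r\n")
  header = lines.shift
  raise ArgumentError, "not a map" unless header.start_with?("%")

  pair_count = Integer(header[1..])
  # Size the hash up front when the keyword is available.
  result = CAPACITY_KEYWORD ? Hash.new(capacity: pair_count) : {}

  pair_count.times do
    key = decode_scalar(lines.shift)
    result[key] = decode_scalar(lines.shift)
  end
  result
end

decoded = decode_map("%2\r\n+first\r\n:1\r\n+second\r\n:2\r\n")
```

The capacity is only a hint: the resulting hash behaves like any other and can still grow past the announced size.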

Issue #19236 has been updated by shan (Shannon Skipper).

I'm really looking forward to this feature being available via a Ruby interface. ❤️

https://bugs.ruby-lang.org/issues/19236#change-108982