[ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
Issue #21833 has been reported by samyron (Scott Myron).

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------
Has there been any consideration switching to some other hash implementation? I've searched through the issues and haven't found anything related to switching the default hash from SipHash13 to anything else.

I created a [branch](https://github.com/ruby/ruby/compare/master...samyron:ruby:sm/xxh3) which switched `rb_memhash` from SipHash13 to [XXH3](https://github.com/Cyan4973/xxHash).

I created a few simple benchmarks and ran them on my M1 Macbook Air. The results are very promising.

```
% cat ~/string_hash.yml
prelude: |
  # Generate sets of short vs medium strings
  TINY_STRINGS = Array.new(100) { Array.new(3).map { (97 + rand(26)).chr }.join }.freeze
  SMALL_STRINGS = Array.new(100) { Array.new(8).map { (97 + rand(26)).chr }.join }.freeze
  MED_STRINGS = Array.new(100) { Array.new(20).map { (97 + rand(26)).chr }.join }.freeze
  LARGE_STRINGS = Array.new(100) { Array.new(200).map { (97 + rand(26)).chr }.join }.freeze
  HUGE_STRINGS = Array.new(100) { Array.new(65536).map { (97 + rand(26)).chr }.join }.freeze

benchmark:
  tiny_strings: |
    TINY_STRINGS.each { |s| s.hash }
  small_strings: |
    SMALL_STRINGS.each { |s| s.hash }
  medium_strings: |
    MED_STRINGS.each { |s| s.hash }
  large_strings: |
    LARGE_STRINGS.each { |s| s.hash }
  huge_strings: |
    HUGE_STRINGS.each { |s| s.hash }

% benchmark-driver ~/string_hash.yml \
    -e ruby-master::~/.rubies/ruby-master/bin/ruby \
    -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby \
    --output compare
Warming up --------------------------------------
        tiny_strings   262.513k i/s -    283.844k times in 1.081258s (3.81μs/i)
       small_strings   259.803k i/s -    280.445k times in 1.079454s (3.85μs/i)
      medium_strings   249.553k i/s -    267.531k times in 1.072041s (4.01μs/i)
       large_strings   116.426k i/s -    126.005k times in 1.082275s (8.59μs/i)
        huge_strings    498.481 i/s -     500.000 times in 1.003047s (2.01ms/i)
Calculating -------------------------------------
                     ruby-master  ruby-xxhash
        tiny_strings    264.070k     288.960k i/s -    787.538k times in 2.982305s 2.725421s
       small_strings    259.941k     286.229k i/s -    779.407k times in 2.998394s 2.723019s
      medium_strings    249.249k     283.952k i/s -    748.658k times in 3.003655s 2.636561s
       large_strings    116.572k     240.823k i/s -    349.278k times in 2.996244s 1.450351s
        huge_strings     500.164       5.296k i/s -      1.495k times in 2.989019s 0.282263s

Comparison:
                     tiny_strings
         ruby-xxhash:    288960.1 i/s
         ruby-master:    264070.2 i/s - 1.09x  slower

                    small_strings
         ruby-xxhash:    286229.0 i/s
         ruby-master:    259941.5 i/s - 1.10x  slower

                   medium_strings
         ruby-xxhash:    283952.5 i/s
         ruby-master:    249249.0 i/s - 1.14x  slower

                    large_strings
         ruby-xxhash:    240823.1 i/s
         ruby-master:    116571.9 i/s - 2.07x  slower

                     huge_strings
         ruby-xxhash:      5296.5 i/s
         ruby-master:       500.2 i/s - 10.59x  slower
```

Running something a bit more real-world:

```
% cat ~/json_parse.yml
prelude: |
  require 'json'
  activitypub_json_txt = File.read("/Users/scott/Development/json/benchmark/data/activitypub.json")
  twitter_json_txt = File.read("/Users/scott/Development/json/benchmark/data/twitter.json")
  citm_catalog_json_txt = File.read("/Users/scott/Development/json/benchmark/data/citm_catalog.json")
  ohai_json_txt = File.read("/Users/scott/Development/json/benchmark/data/ohai.json")

benchmark:
  parse_activitypub_json: |
    JSON.parse(activitypub_json_txt)
  parse_twitter_json_txt: |
    JSON.parse(twitter_json_txt)
  parse_citm_catalog_json_txt: |
    JSON.parse(citm_catalog_json_txt)
  parse_ohai_json_txt: |
    JSON.parse(ohai_json_txt)

% benchmark-driver ~/json_parse.yml \
    -e ruby-master::~/.rubies/ruby-master/bin/ruby \
    -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby \
    --output compare
Warming up --------------------------------------
     parse_activitypub_json    10.969k i/s -     12.023k times in 1.096043s (91.16μs/i)
     parse_twitter_json_txt     1.169k i/s -      1.265k times in 1.082330s (855.60μs/i)
parse_citm_catalog_json_txt    591.782 i/s -     600.000 times in 1.013887s (1.69ms/i)
        parse_ohai_json_txt    12.000k i/s -     12.782k times in 1.065168s (83.33μs/i)
Calculating -------------------------------------
                            ruby-master  ruby-xxhash
     parse_activitypub_json     10.986k      11.071k i/s -     32.908k times in 2.995440s 2.972542s
     parse_twitter_json_txt      1.162k       1.172k i/s -      3.506k times in 3.016331s 2.991486s
parse_citm_catalog_json_txt     588.758      601.926 i/s -      1.775k times in 3.014820s 2.948868s
        parse_ohai_json_txt     10.747k      12.400k i/s -     35.999k times in 3.349753s 2.903138s

Comparison:
             parse_activitypub_json
         ruby-xxhash:     11070.7 i/s
         ruby-master:     10986.0 i/s - 1.01x  slower

             parse_twitter_json_txt
         ruby-xxhash:      1172.0 i/s
         ruby-master:      1162.3 i/s - 1.01x  slower

        parse_citm_catalog_json_txt
         ruby-xxhash:       601.9 i/s
         ruby-master:       588.8 i/s - 1.02x  slower

                parse_ohai_json_txt
         ruby-xxhash:     12400.0 i/s
         ruby-master:     10746.8 i/s - 1.15x  slower
```

Admittedly, I'm not a hash expert nor a cryptographer. There doesn't seem to be any known vulnerabilities with XXH3 that I have found.

-- 
https://bugs.ruby-lang.org/
Issue #21833 has been updated by byroot (Jean Boussier).
> Has there been any consideration switching to some other hash implementation?
There have been a few in the past, e.g. [Feature #16851]
> Admittedly, I'm not a hash expert nor a cryptographer. There doesn't seem to be any known vulnerabilities with XXH3 that I have found.
Well, the main concern is HashDOS, but looking at your branch, it seems you seed the hash function, so it's fine on that front.

https://bugs.ruby-lang.org/issues/21833#change-116031
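The per-process seeding mentioned in this reply can be observed from Ruby itself. The following is a minimal sketch, assuming a Ruby executable is reachable through `RbConfig.ruby`; the helper name `string_hash_in_new_process` is made up for illustration. It hashes the same string in two fresh interpreters, and the two values are expected to differ because each process draws its own random hash seed at startup.

```ruby
require "rbconfig"

# Hypothetical helper: start a fresh Ruby process and return the value of
# String#hash for the given string inside that process.
def string_hash_in_new_process(str)
  IO.popen([RbConfig.ruby, "-e", "print ARGV[0].hash", str], &:read).to_i
end

a = string_hash_in_new_process("hashdos")
b = string_hash_in_new_process("hashdos")

puts "process 1: #{a}"
puts "process 2: #{b}"
puts "same value? #{a == b}"
```

An attacker who cannot learn the seed cannot precompute colliding keys offline, which is why a replacement for SipHash13 has to stay keyed by that per-process secret.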
Issue #21833 has been updated by samyron (Scott Myron).

The same benchmarks on an M4 Pro:

```
benchmark-driver ~/Downloads/hash_strings.yml -e "ruby-master::~/.rubies/ruby-master/bin/ruby" -e "ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby"

<snip>

Comparison:
                  tiny_hash_creation
         ruby-xxhash:     12650.7 i/s
         ruby-master:     11909.3 i/s - 1.06x  slower

                   med_hash_creation
         ruby-xxhash:     13716.7 i/s
         ruby-master:     12271.1 i/s - 1.12x  slower

                 large_hash_creation
         ruby-xxhash:     11178.4 i/s
         ruby-master:      7120.5 i/s - 1.57x  slower

                  huge_hash_creation
         ruby-xxhash:       235.8 i/s
         ruby-master:        43.8 i/s - 5.38x  slower
```

```
benchmark-driver ~/Downloads/json_parse.yml -e "ruby-master::~/.rubies/ruby-master/bin/ruby" -e "ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby"

<snip>

Comparison:
             parse_activitypub_json
         ruby-xxhash:     16495.3 i/s
         ruby-master:     16192.3 i/s - 1.02x  slower

             parse_twitter_json_txt
         ruby-xxhash:      1828.2 i/s
         ruby-master:      1774.9 i/s - 1.03x  slower

        parse_citm_catalog_json_txt
         ruby-xxhash:       881.7 i/s
         ruby-master:       844.5 i/s - 1.04x  slower

                parse_ohai_json_txt
         ruby-xxhash:     18377.8 i/s
         ruby-master:     17193.5 i/s - 1.07x  slower
```

https://bugs.ruby-lang.org/issues/21833#change-116033
Issue #21833 has been updated by samyron (Scott Myron). byroot (Jean Boussier) wrote in #note-2:
> Well, the main concern is HashDOS, but looking at your branch, it seems you seed the hash function, so it's fine on that front.
I did note the HashDOS conversation on https://bugs.ruby-lang.org/issues/13017 so I used XXH3_64bits_withSecret as an attempt to mitigate HashDOS by using the default secret and seed. Reading through the xxhash docs I think `XXH3_64bits_withSecretandSeed` _might_ even be faster but I have not tried it yet.

https://bugs.ruby-lang.org/issues/21833#change-116034
Issue #21833 has been updated by bdewater (Bart de Water).

FWIW
- https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
- Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps

https://bugs.ruby-lang.org/issues/21833#change-116038
On 1/12/26 3:10 PM, bdewater (Bart de Water) via ruby-core wrote:
> Issue #21833 has been updated by bdewater (Bart de Water).
>
> FWIW
> - https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
> - Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps
Most of the fastest hash functions are based on multiplications as a fast and portable way to mix data value bits. Instead of mixing N bits at a time, you mix NxN bits with a single instruction. However, this is no longer sufficient: the fastest hash functions now mix two data words with a single multiplication. rapidhash, wyHash, and xxHash are exactly this kind of function.

rapidhash and wyHash are very weak in terms of collision resistance. Please, just look at

https://github.com/wangyi-fudan/wyhash/blob/46cebe9dc4e51f94d0dca287733bc5a9...

for wyHash and

https://github.com/Nicoshev/rapidhash/blob/d60698faa10916879f85b2799bfdc6996...

for rapidhash.

Basically, they contain the following code:

```
update(state, mum(data64[n]^constant, data64[n+1]^state))
```

where `mum` is `uint64_t mum(uint64_t a, uint64_t b) { __uint128_t r = (__uint128_t)a * b; return (uint64_t)r ^ (uint64_t)(r >> 64); }`

If `data64[n] == constant`, the first operand is zero, so `mum` returns zero independently of the value of `data64[n + 1]`. As a result, it is easy to generate many inputs with the same hash value, causing hash tables to exhibit quadratic behavior and enabling denial-of-service attacks on servers that use hash tables.

Go uses AES instructions (on some x86 and arm64 CPUs) for map hashing. If AES instructions are unavailable, it uses a hash function “inspired by wyHash,” but without this vulnerability. It contains analogous code:

https://github.com/golang/go/blob/532e3203492ebcac67b2f3aa2a52115f49d51997/s...

However, instead of constants, Go uses randomly generated values. This considerably decreases hash speed (because it requires additional memory reads), but it makes the hash function much less vulnerable.

xxHash is somewhat better than wyHash and rapidhash. It has the following code:

https://github.com/Cyan4973/xxHash/blob/66979328cf3f15cecdc61ea58c9f81e6071f...

which is essentially:

```
update(state, mum(data64[n]^(constant1 + seed), data64[n+1]^(constant2 - seed)))
```

If the seed is known, the same type of attack can be performed. Therefore, xxHash should not be used with the default or any other constant seed.

The solution to collision attacks against multiplication-based hash functions is either not to mix two data words in a single multiplication, or to detect a zero multiplication and always return a value that depends on both data words. The first approach significantly reduces hash speed. The second approach has a much smaller performance impact, since modern CPUs allow such code to be vectorized and written without introducing branches. [VMUM V2](https://github.com/vnmakarov/mum-hash) uses the latter approach and has performance competitive with wyHash, rapidhash, and xxHash.

**In brief, using RapidHash and wyHash is dangerous. XXHash should only be used with randomly generated seeds. SipHash is a safe choice (like VMUM V2), as it is collision-resistant regardless of the seed. (Full disclosure: as the author of VMUM V2, I may be biased.)**
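The zero-multiplication weakness described in this message can be reproduced in a few lines of Ruby. This is only a toy sketch, not wyHash, rapidhash, or XXH3: the value of `CONSTANT`, the choice of XOR for `update`, and the helpers `toy_step` and `distinct_states` are assumptions made for illustration. Only `mum` follows the definition given in the message.

```ruby
require "securerandom"

MASK64 = (1 << 64) - 1

# 64x64 -> 128-bit multiply, then fold the low and high halves with XOR,
# following the `mum` definition given in the message.
def mum(a, b)
  r = (a & MASK64) * (b & MASK64)
  (r & MASK64) ^ (r >> 64)
end

# Hypothetical public constant; real hash functions use their own values.
CONSTANT = 0x9E3779B97F4A7C15

# One simplified mixing step. `update` is assumed to be XOR in this sketch.
def toy_step(state, key, w0, w1)
  state ^ mum(w0 ^ key, w1 ^ state)
end

# Number of distinct states reached when only the second data word varies.
def distinct_states(state, key, w0, second_words)
  second_words.map { |w1| toy_step(state, key, w0, w1) }.uniq.size
end

state = 0x0123456789ABCDEF
second_words = [0, 1, 0xDEADBEEF, MASK64]

# Key known to the attacker: choosing w0 == key zeroes the multiplication,
# so every second word leaves the state unchanged.
puts distinct_states(state, CONSTANT, CONSTANT, second_words) # prints 1

# Key drawn at random per process: the attacker's w0 no longer cancels it
# (except with negligible probability), so the second word matters again.
secret = SecureRandom.random_number(1 << 64)
puts distinct_states(state, secret, CONSTANT, second_words)
```

The first call shows four different inputs collapsing onto one state. The second call illustrates why a secret, randomly generated key matters: the attacker would have to guess a 64-bit value before being able to build the cancelling word.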
Issue #21833 has been updated by samyron (Scott Myron). bdewater (Bart de Water) wrote in #note-6:
> FWIW
> - https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
> - Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps
I can give rapidhash a try. It's based on wyHash which Go uses (at least sometimes): https://github.com/golang/go/blob/cbe153806e67a16e362a1cdbbf1741d4ce82e98a/s...

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116039
Issue #21833 has been updated by samyron (Scott Myron).

[rapidhash](https://github.com/Nicoshev/rapidhash) is (mostly) faster than xxh3 on my M1 Macbook Air. The `large_hash_creation` benchmark has twice reported that xxhash is faster than rapidhash.

Note that rapidhash uses a single 64-bit seed, while xxh3 uses a 136-byte secret.

Hashing strings:

```
tiny_hash_creation
  ruby-rapidhash:   9267.6 i/s
     ruby-xxhash:   8970.8 i/s - 1.03x slower
     ruby-master:   8329.4 i/s - 1.11x slower

med_hash_creation
  ruby-rapidhash:   9276.3 i/s
     ruby-xxhash:   9274.3 i/s - 1.00x slower
     ruby-master:   8097.3 i/s - 1.15x slower

large_hash_creation
     ruby-xxhash:   7758.0 i/s
  ruby-rapidhash:   7597.1 i/s - 1.02x slower
     ruby-master:   4318.7 i/s - 1.80x slower

huge_hash_creation
  ruby-rapidhash:    187.1 i/s
     ruby-xxhash:    165.4 i/s - 1.13x slower
     ruby-master:     25.1 i/s - 7.45x slower
```

JSON Parsing:

```
Comparison:
parse_activitypub_json
  ruby-rapidhash:  11210.7 i/s
     ruby-xxhash:  11186.6 i/s - 1.00x slower
     ruby-master:  11146.3 i/s - 1.01x slower

parse_twitter_json_txt
  ruby-rapidhash:   1199.5 i/s
     ruby-xxhash:   1186.2 i/s - 1.01x slower
     ruby-master:   1169.7 i/s - 1.03x slower

parse_citm_catalog_json_txt
     ruby-xxhash:    611.1 i/s
  ruby-rapidhash:    609.1 i/s - 1.00x slower
     ruby-master:    595.1 i/s - 1.03x slower

parse_ohai_json_txt
  ruby-rapidhash:  12557.8 i/s
     ruby-xxhash:  12365.8 i/s - 1.02x slower
     ruby-master:  10824.1 i/s - 1.16x slower
```

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116075
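For readers checking the numbers: the "x slower" figures in the comparison output appear to be the throughput of the fastest entry divided by the throughput of each other entry. The short Ruby sketch below recomputes them for `huge_hash_creation` from the i/s values reported in this thread. The constant and method names (`RESULTS`, `slowdown_factors`, `FACTORS`) are made up for this illustration and are not part of benchmark-driver.

```ruby
# Throughput in iterations per second, copied from the
# huge_hash_creation results reported in this thread.
RESULTS = {
  "ruby-rapidhash" => 187.1,
  "ruby-xxhash"    => 165.4,
  "ruby-master"    => 25.1,
}.freeze

# Hypothetical helper: returns { name => slowdown relative to the fastest entry },
# rounded to two decimals like the comparison output.
def slowdown_factors(results)
  fastest = results.values.max
  results.transform_values { |ips| (fastest / ips).round(2) }
end

FACTORS = slowdown_factors(RESULTS)

FACTORS.each do |name, factor|
  puts format("%-15s %.2fx", name, factor)
end
```

Read this way, SipHash13 on master is roughly 7.5 times slower than rapidhash on 64 KiB strings, while the gap between rapidhash and xxh3 on the same input is about 13%.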
Issue #21833 has been updated by samyron (Scott Myron). Anonymous wrote in #note-9:
<snip> **In brief, using RapidHash and wyHash is dangerous. XXHash should only be used with randomly generated seeds. SipHash is a safe choice (like VMUM V2), as it is collision-resistant regardless of the seed. (Full disclosure: as the author of VMUM V2, I may be biased.)**
______________________________________________
Thank you for this awesome reply! I learned a lot from it. I do appreciate the thorough explanation of the issue with these multiplication-based hash functions.

Note that the [a5hash](https://github.com/avaneev/a5hash) algorithm explains this same problem (I believe), which is called "Blinding Multiplication". This is new terminology to me, so I'm leaving it here in the event others find it helpful.

I used both a [random secret and seed](https://github.com/samyron/ruby/blob/1e4ff4ae311b7b1d0bc1dd4eb0e6750da714edc...) when incorporating XXH3 into Ruby.

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116375
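For anyone else new to the term, here is a minimal Ruby sketch of what "blinding multiplication" means as far as I understand it: when a mixing step multiplies two words and one factor becomes zero, the product is zero no matter what the other word contains. Everything in the sketch is an assumption made for illustration. `mum`, `toy_mix`, and the two secret constants are hypothetical and are not the actual code of rapidhash, wyHash, XXH3, or a5hash.

```ruby
MASK64 = (1 << 64) - 1

# Toy 64x64 -> 128-bit multiply, folded back to 64 bits by XORing both halves.
def mum(a, b)
  product = (a & MASK64) * (b & MASK64)
  (product & MASK64) ^ (product >> 64)
end

# Hypothetical mixing step: XOR each input word with a secret word,
# then multiply-fold the two results.
def toy_mix(word0, word1, secret0, secret1)
  mum(word0 ^ secret0, word1 ^ secret1)
end

# Made-up secret words. In a real deployment these would be the values
# an attacker must not be able to learn or guess.
SECRET0 = 0x9E3779B97F4A7C15
SECRET1 = 0xD1B54A32D192ED03

OTHER_WORDS = [1, 2, 0xDEADBEEF].freeze

# If word0 equals SECRET0, the first factor is zero, so word1 is "blinded":
# every choice of word1 produces the same output.
BLINDED = OTHER_WORDS.map { |w1| toy_mix(SECRET0, w1, SECRET0, SECRET1) }

# With an unrelated word0 the outputs still depend on word1.
REGULAR = OTHER_WORDS.map { |w1| toy_mix(42, w1, SECRET0, SECRET1) }

puts "blinded: #{BLINDED.inspect}"
puts "regular: #{REGULAR.inspect}"
```

In this toy model the collision only works if the attacker can make an input word equal to the secret word, which is why fixed or leaked secrets are a problem and why a randomly generated secret and seed matter.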
Issue #21833 has been updated by byroot (Jean Boussier).

@samyron if you wish to bring this forward, I'd suggest trying the ruby-bench headline benchmarks: https://github.com/ruby/ruby-bench?tab=readme-ov-file#specific-categories

If your patch does show an improvement on these much more general and meaty benchmarks, I think it would be a strong argument in favor.

Either way, if you want this patch to get attention, you'll have to add it to the devmeeting agenda.

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116376
participants (4)
- bdewater (Bart de Water)
- byroot (Jean Boussier)
- samyron (Scott Myron)
- Vladimir Makarov