On 1/12/26 3:10 PM, bdewater (Bart de Water) via ruby-core wrote:
Issue #21833 has been updated by bdewater (Bart de Water).
FWIW
- https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
- Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps
Most of the fastest hash functions are based on multiplication as a fast and portable way to mix the bits of data values. Instead of mixing N bits at a time, you mix NxN bits with a single instruction. However, this is no longer sufficient: the fastest hash functions now mix two data words with a single multiplication. rapidhash, wyHash, and xxHash are exactly this kind of function.

rapidhash and wyHash are very weak in terms of collision resistance. Please just look at https://github.com/wangyi-fudan/wyhash/blob/46cebe9dc4e51f94d0dca287733bc5a9... for wyHash and https://github.com/Nicoshev/rapidhash/blob/d60698faa10916879f85b2799bfdc6996... for rapidhash.

Basically, they contain the following code:

```
update(state, mum(data64[n] ^ constant, data64[n+1] ^ state))
```

where `mum` is `uint64 mum(uint64 a, uint64 b) { uint128 r = (uint128)a * b; return (uint64)r ^ (uint64)(r >> 64); }`

If `data64[n] == constant`, the first operand is zero, so `mum` returns zero regardless of the value of `data64[n + 1]`. As a result, it is easy to generate many inputs with the same hash value, which makes hash tables exhibit quadratic behavior and enables denial-of-service attacks on servers that use hash tables.

Go uses AES instructions (on some x86 and arm64 CPUs) for map hashing. If AES instructions are unavailable, it uses a hash function “inspired by wyHash,” but without this vulnerability. It contains analogous code: https://github.com/golang/go/blob/532e3203492ebcac67b2f3aa2a52115f49d51997/s...

However, instead of constants, Go uses randomly generated values. This considerably decreases hash speed (because it requires additional memory reads), but it makes the hash function much less vulnerable.

xxHash is somewhat better than wyHash and rapidhash. It has the following code: https://github.com/Cyan4973/xxHash/blob/66979328cf3f15cecdc61ea58c9f81e6071f...

which is essentially:

```
update(state, mum(data64[n] ^ (constant1 + seed), data64[n+1] ^ (constant2 - seed)))
```

If the seed is known, the same type of attack can be performed.
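To make the zero-multiplication problem concrete, here is a small sketch of the two mixing steps described above, written in Python for brevity. It is a toy model, not the real wyHash, rapidhash, or xxHash code: `CONSTANT1`, `CONSTANT2`, `mix_fixed`, and `mix_seeded` are hypothetical names and values, and the `update` and finalization steps of the real functions are left out.

```python
MASK64 = (1 << 64) - 1

def mum(a, b):
    """Multiply two 64-bit words into 128 bits and XOR-fold the two halves."""
    r = (a & MASK64) * (b & MASK64)
    return (r & MASK64) ^ (r >> 64)

# Hypothetical stand-ins for the constants baked into a hash function.
CONSTANT1 = 0x9E3779B97F4A7C15
CONSTANT2 = 0xC2B2AE3D27D4EB4F

def mix_fixed(state, w0, w1):
    """Toy wyHash/rapidhash-style step: the first word is XORed with a fixed constant."""
    return mum(w0 ^ CONSTANT1, w1 ^ state)

def mix_seeded(seed, w0, w1):
    """Toy xxHash-style step: the constants are offset by the seed."""
    return mum(w0 ^ ((CONSTANT1 + seed) & MASK64),
               w1 ^ ((CONSTANT2 - seed) & MASK64))

state = 0x0123456789ABCDEF

# Fixed constant: choosing w0 == CONSTANT1 zeroes the first operand,
# so the second word no longer influences the result.
fixed_outputs = {mix_fixed(state, CONSTANT1, w1) for w1 in range(1000)}

# Seeded variant: an attacker who knows the seed can do the same thing.
known_seed = 42
bad_w0 = (CONSTANT1 + known_seed) & MASK64
seeded_outputs = {mix_seeded(known_seed, bad_w0, w1) for w1 in range(1000)}

# With a different (unknown) seed the same word no longer cancels out.
other_outputs = {mix_seeded(known_seed + 1, bad_w0, w1) for w1 in range(1000)}

print(fixed_outputs == {0}, seeded_outputs == {0}, len(other_outputs) > 1)
```

In this toy model, all 1000 different second words collapse to the same mixed value once the first operand is forced to zero, both for the fixed constant and for the known seed.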
Therefore, xxHash should not be used with the default or any other constant seed.

The solution to collision attacks against multiplication-based hash functions is either not to mix two data words in a single multiplication, or to detect a zero multiplication and always return a value that depends on both operands. The first approach significantly reduces hash speed. The second approach has a much smaller performance impact, since modern CPUs allow such code to be vectorized and written without introducing branches. [VMUM V2](https://github.com/vnmakarov/mum-hash) uses the latter approach and has performance competitive with wyHash, rapidhash, and xxHash.

**In brief, using rapidhash and wyHash is dangerous. xxHash should only be used with randomly generated seeds. SipHash is a safe choice (like VMUM V2), as it is collision-resistant regardless of the seed. (Full disclosure: as the author of VMUM V2, I may be biased.)**
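As a rough illustration of the second approach (detecting a zero multiplication), here is one possible sketch in Python. This is not the VMUM V2 code: `guarded_mum`, `mix_guarded`, `FALLBACK`, and `CONSTANT` are hypothetical names and values, and a C implementation would express the selection without a branch.

```python
MASK64 = (1 << 64) - 1

# Hypothetical constants, chosen only for this illustration.
CONSTANT = 0x9E3779B97F4A7C15
FALLBACK = 0xD6E8FEB86659FD93

def guarded_mum(a, b):
    """Folded 64x64->128 multiply that never discards an operand.

    If the product is zero (at least one operand is zero), fall back to a
    value that still depends on both operands instead of returning zero.
    """
    a &= MASK64
    b &= MASK64
    r = a * b
    if r == 0:
        return a ^ b ^ FALLBACK
    return (r & MASK64) ^ (r >> 64)

def mix_guarded(state, w0, w1):
    """Toy mixing step using the guarded multiply (assumed structure)."""
    return guarded_mum(w0 ^ CONSTANT, w1 ^ state)

state = 0x0123456789ABCDEF

# The attacker still zeroes the first operand by choosing w0 == CONSTANT,
# but the second word now continues to influence the result.
guarded_outputs = {mix_guarded(state, CONSTANT, w1) for w1 in range(1000)}

print(len(guarded_outputs))
```

With the guard in place, the 1000 inputs that zero the first operand produce 1000 distinct mixed values in this toy model, so the multi-collision from the unguarded version disappears.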