
Issue #19875 has been updated by Freaky (Thomas Hurst).

nobu (Nobuyoshi Nakada) wrote in #note-11:
> Freaky (Thomas Hurst) wrote in #note-9:
> > Oh. And you know what this reminds me of? That [one time](https://github.com/Freaky/fast-bytecount) I ported Rust's bytecount crate to C.
>
> Thank you, tried with it. https://github.com/nobu/ruby/tree/mm_bytecount

Thanks for trying it!

> But it seems no significant difference on x86_64-darwin.

I see a difference if I configure with `cflags=-msse4.2` - it's off by default. We probably want some runtime CPU feature detection if people are actually going to use it.

There's also an AVX version of the algorithm that operates in 32-byte chunks instead of SSE's 16. I measure it as twice as fast again on my Ryzen 5700X - I might port that too if there's interest.

----------------------------------------
Bug #19875: Ruby 3.0 -> 3.1 Performance regression in String#count
https://bugs.ruby-lang.org/issues/19875#change-104647

* Author: iz (Illia Zub)
* Status: Open
* Priority: Normal
* ruby -v: 3.2.2
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
`String#count` became slower since Ruby 3.1. Originally found by `@Freaky`: https://github.com/ruby/ruby/pull/4001#issuecomment-1714779781

Compared using the [`benchmark-driver` gem](https://github.com/benchmark-driver/benchmark-driver).

```
$ benchmark-driver tmp/string_count_benchmark_driver.yml --rbenv '3.1.1;3.1.4;2.7.2;3.2.2;3.0.6'
Calculating -------------------------------------
                          3.1.1       3.1.4       2.7.2       3.2.2       3.0.6
               count    465.804     463.741     865.783     462.711     857.395 i/s -     10.000k times in 21.468251s 21.563768s 11.550239s 21.611783s 11.663235s

Comparison:
                            count
               2.7.2:       865.8 i/s
               3.0.6:       857.4 i/s - 1.01x slower
               3.1.1:       465.8 i/s - 1.86x slower
               3.1.4:       463.7 i/s - 1.87x slower
               3.2.2:       462.7 i/s - 1.87x slower
```

Benchmark:

```yml
$ cat ./tmp/string_count_benchmark_driver.yml
loop_count: 10_000

prelude: |
  html = "\nruby\n" * 1024 * 1024

benchmark:
  count: html.count($/)
```

---

*Initially, I noticed the difference between `str.count($/)` and `str.lines.size` when working on the performance improvement: https://serpapi.com/blog/lines-count-failed-deployments/*

---Files--------------------------------
rb_str_len.fast (31.9 KB)
rb_str_len.slow (34 KB)
revert-4001.patch (1.71 KB)
rb_str_count.S (11.8 KB)

-- 
https://bugs.ruby-lang.org/
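
For anyone who wants to try the `str.count($/)` versus `str.lines.size` comparison from the issue description without installing `benchmark-driver`, here is a rough sketch using only Ruby's standard `benchmark` library. It is not the benchmark from the report: the string is much smaller and the `iterations` value is an arbitrary choice made here so the script finishes quickly, so the timings are only indicative and will not match the i/s figures quoted in the issue.

```ruby
require "benchmark"

# Same shape as the string in the issue's prelude, but smaller (assumption:
# 1024 * 16 repetitions instead of 1024 * 1024) to keep the run short.
html = "\nruby\n" * 1024 * 16

# "\n" is the default value of $/, which the original benchmark passes to count.
newline = "\n"

# Sanity check: both approaches should report the same number of lines.
by_count = html.count(newline)
by_lines = html.lines.size
raise "counts differ: #{by_count} vs #{by_lines}" unless by_count == by_lines

# Hypothetical iteration count, far below the issue's loop_count of 10_000.
iterations = 50

count_time = Benchmark.realtime { iterations.times { html.count(newline) } }
lines_time = Benchmark.realtime { iterations.times { html.lines.size } }

puts "ruby #{RUBY_VERSION}"
puts format("String#count:      %.4fs for %d iterations", count_time, iterations)
puts format("String#lines.size: %.4fs for %d iterations", lines_time, iterations)
```

Running the same script under several Ruby versions (for example through a version manager) should give a coarse idea of whether `String#count` is affected on a given build, though `benchmark-driver` remains the better tool for a careful comparison.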