[ruby-core:111526] [Ruby master Bug#19288] Ractor JSON parsing significantly slower than linear parsing

Issue #19288 has been reported by maciej.mensfeld (Maciej Mensfeld). ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by Eregon (Benoit Daloze). It would be more fair to `Ractor.make_shareable(data)` first. But even with that Ractor is slower: ``` no Ractor: 2.748311 0.003002 2.751313 ( 2.763541) Ractor 9.939530 5.816431 15.755961 ( 4.289792) ``` This high `system` time seems strange. Probably lock contention for allocations? ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-100901 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by Eregon (Benoit Daloze). Also that script creates Ractors even in "linear" mode. With the fixed script below: ``` 2.040496 0.002988 2.043484 ( 2.048731) ``` i.e. it's also quite a bit slower if any Ractor is created. Script: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = ARGV.first == "ractor" ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end if RACTORS Ractor.make_shareable(data) ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-100902 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber). I just took a look at this and it looks like the culprit is the c dtoa function that's called in the json parser, specifically a helper function `Balloc`. It uses a lock for some reason (shrug). ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101304 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by maciej.mensfeld (Maciej Mensfeld). I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language. I run an OSS project called Karafka (https://github.com/karafka/karafka) that allows for processing Kafka messages using multiple threads in parallel. For non-IO bound cases, the majority of the time of users whom use-cases I know is spent on data deserialization (> 80%). JSON is by far the most popular format that is also conveniently supported natively by Ruby. While providing true parallelism around the whole processing may not be easy due to a ton of synchronization around the whole process, the atomicity of messages deserialization makes it an ideal case of using Ractors. - Data can be sent there, and results can be transferred without interdependencies. - Each message is atomic; hence their deserialization can run in parallel. - All message deserialization requests can be sent to a generic queue from which Ractors could consume. I am not an expert in the Ruby code, but if there is anything I could help with to move this forward, please just ping me. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101424 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber). I've notified the flori/json people (https://github.com/flori/json/issues/511) If anyone else can confirm using a performance tool that the issue is the `Balloc` function of the `dtoa` function, that would be helpful. If that is the case, it would be good to not call `dtoa` when parsing integers. I'll look into the ragel code but I'm not so familiar with ragel. The flori/json people should be able to help here. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101429 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber). ```ruby RACTORS = ARGV.first == "ractor" r = rand J = { rand => rand }.to_json Ractor.make_shareable(J) if RACTORS rs = [] 10.times.each do rs << Ractor.new(r) do |num| data = receive() i = 0 while i < 10_000 JSON.parse(data) i+=1 end end end rs.each do |r| r.send(J) end rs.each(&:take) else 1_000_000.times do JSON.parse(J) end end ``` $ time ./ruby json_test.rb real 0m5.622s user 0m5.396s $ time ./ruby json_test.rb ractor real 0m0.890s user 0m0.798s This is with debug build of ruby too, so should be faster, but this shows that the issue is with the large data being sent with Ractor#send. Chunk the data into smaller pieces (more calls send and receive in a loop) and it will be faster. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101434 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber). Here's a simple reproduction showing that the problem is not send/receive: ```ruby RACTORS = ARGV.first == "ractor" J = { rand => rand }.to_json Ractor.make_shareable(J) if RACTORS rs = [] 10.times.each do rs << Ractor.new do i = 0 while i < 100_000 JSON.parse(J) i+=1 end end end rs.each(&:take) else 1_000_000.times do JSON.parse(J) end end ``` The ractor example should take less time, but it doesn't. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101435 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by Eregon (Benoit Daloze). maciej.mensfeld (Maciej Mensfeld) wrote in #note-4:
I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language.
Note that Ractor is far from production-ready. It has many issues as can be found on this bug tracker and when using it and as the warning says (`Also there are many implementation issues.`). Also the fact that the main Ruby test suites don't run any Ractor test in the same process also seems an indication of instability. And then of course there is the issue that Ractor is incompatible with most gems/code out there. While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well. Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101436 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by maciej.mensfeld (Maciej Mensfeld).
Note that Ractor is far from production-ready.
I am well aware. I just provide a justification and since my case seems to fit the limited scope of this functionality, I wanted to raise the attention.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
This is exactly why I want to get a limited functionality that anyhow would allow me to parallelize the processing.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.
I am also aware of this :) ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101437 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber).
It has many issues as can be found on this bug tracker and when using it and as the warning says (Also there are many implementation issues.).
I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air. IMO if are allowed to have an API, for example, of Ractor.disable_isolation_checks! { ... } for use around thread-safe code, that would be a big win in my book. Also about the test-suite, I do want to add in-process ractor tests. I hope the ruby core team isn't against it. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101445 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by maciej.mensfeld (Maciej Mensfeld).
I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air.
The first step to adoption is to have a case for it that could be used. I believe the case I presented is viable and should be considered. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101447 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by luke-gru (Luke Gruber). This PR I made to JSON repository is related: https://github.com/flori/json/pull/512 ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101541 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by duerst (Martin Dürst). Eregon (Benoit Daloze) wrote in #note-7:
And then of course there is the issue that Ractor is incompatible with most gems/code out there. While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well. Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.
But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-101546 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by maciej.mensfeld (Maciej Mensfeld).
And then of course there is the issue that Ractor is incompatible with most gems/code out there. While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
We need to start somewhere. Even if trivial/isolated cases work, if they work well, they can act as the first milestone to usage of this API for commercial benefits and I am willing to take the risk ;) ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-102496 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by Eregon (Benoit Daloze). duerst (Martin Dürst) wrote in #note-12:
But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model.
We're getting a bit off-topic, but I believe not necessarily. And the GIL doesn't prevent most Ruby-level threading issues, so in that matter it's almost the same on CRuby. For example I would think many Ruby on Rails devs don't know well threading, and they don't need to, even though webservers like Puma use threads. Deep knowledge of multithreading is needed e.g. when creating concurrent data structures, but using them OTOH doesn't require much. I would think for most programmers, using threads is much easier and more intuitive than having the big limitations of Ractor which prevent sharing any state, especially in an imperative and stateful language like Ruby where almost everything is mutable. IMO Ractors are way more difficult to use than threads. They can also have some sorts of race conditions due to message order, so it's not that much safer either. And it's a lot less efficient for any communication between ractors vs threads (Ractor copy or move both need a object graph walk). ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-102498 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by maciej.mensfeld (Maciej Mensfeld). I want to revisit our discussion about leveraging Ruby Ractors for parallel JSON parsing. It appears there hasn't been much activity on this thread for a long time. I found it pertinent to mention that during the recent RubyKaigi conference, Koichi Sasada highlighted the need for real-life/commercial use-cases to showcase Ractors' potential. To that end, I wanted to bring forth that I do have a practical, commercial scenario. Karafka handles parsing of thousands or more of JSONs in parallel. Having Ractors support in such a context could substantially enhance performance, providing a tangible benefit to the end users. Given this real-life use case, are there any updates or plans to continue work on allowing Ractors to operate faster in the presented-by-me scenario? It would indeed be invaluable for many of users working with Kafka in Ruby. While the end-user processing of data still will have to happen in a single Ractor, parsing seems like a great example where immutable raw payload can be shipped to independent ractors and frozen deserialized payloads can be shipped back. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-104792 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * Priority: Normal * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by byroot (Jean Boussier). I profiled this repro out of curiosity, and ractors spend 32% of their time waiting for the VM lock (`vm_lock_enter` + the unlock) to be able to lookup in the `fstring` table: https://share.firefox.dev/4152X8a Currently this is done explicitly by the `json` gem, but even if `json` wasn't attempting to do it, Ruby would do the same thing once we're trying to insert string keys: https://share.firefox.dev/4hsVPbC I suppose there are various solutions to this with their own tradeoffs: - Don't intern hash keys when not on the main ractor. - Protect the `fstring` table with its own dedicated Read-write lock, so that concurrent Ractors can lookup string (but only one insert). And also reduce contention for other areas still protected by the remaining VM lock. - Somehow replace the fstring table by a truly lockfree hash table. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-111806 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by tenderlovemaking (Aaron Patterson). byroot (Jean Boussier) wrote in #note-16:
I profiled this repro out of curiosity, and ractors spend 32% of their time waiting for the VM lock (`vm_lock_enter` + the unlock) to be able to lookup in the `fstring` table: https://share.firefox.dev/4152X8a
Currently this is done explicitly by the `json` gem, but even if `json` wasn't attempting to do it, Ruby would do the same thing once we're trying to insert string keys: https://share.firefox.dev/4hsVPbC
I suppose there are various solutions to this with their own tradeoffs:
- Don't intern hash keys when not on the main ractor. - Protect the `fstring` table with its own dedicated Read-write lock, so that concurrent Ractors can lookup string (but only one insert). And also reduce contention for other areas still protected by the remaining VM lock.
I'm not sure if this one is possible. If some Ractor is updating the hash, it could be in an inconsistent state when another Ractor is trying to read. Maybe st table updates are atomic, but I don't know (and I kind of doubt it).
- Somehow replace the fstring table by a truly lockfree hash table.
This would be ideal IMO, but seems hard 😅 We could also add an fstring table to each Ractor. I know the purpose of the fstring table is to limit the number of instances of a string, but at least we would be limited to the number of Ractors (rather than no limit when there are multiple Ractors). ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-111810 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by byroot (Jean Boussier).
I'm not sure if this one is possible. If some Ractor is updating the hash, it could be in an inconsistent state when another Ractor is trying to read.
A RW-lock doesn't allow reads while the write lock is held.
We could also add an fstring table to each Ractor.
That's an interesting idea. ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-111811 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by Eregon (Benoit Daloze). tenderlovemaking (Aaron Patterson) wrote in #note-17:
We could also add an fstring table to each Ractor [...] but at least we would be limited to the number of Ractors
The number of Ractors can be high though since M-N threads. Also one might rely that the result of `String#-@` is always the same object for equivalent strings, even across Ractors. Concretely this means a string that is interned in one Ractor won't be interned in another Ractor, that seems surprising. (FWIW TruffleRuby solves this using a ConcurrentHashMap with weak values) ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-111823 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Open * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/

Issue #19288 has been updated by jhawthorn (John Hawthorn). Status changed from Open to Closed I've made a couple changes which should improve this. This is a bit of an odd benchmark as the keys are always random (which I think would be quite uncommon in real JSON parsing situations). But this makes it an interesting benchmark for some worst case behaviours. The two things this benchmark is exercising, as others have pointed out, is float parsing and insertion into the fstring table. The first change is https://github.com/ruby/ruby/pull/12991, which removes a spinlock in `dtoa.c` for bignum memory management (Balloc/Bfree). The second change is https://bugs.ruby-lang.org/issues/21268 / https://github.com/ruby/ruby/pull/12921, which I've just merged (NB. it **isn't** in preview1). With these two I believe the main bottlenecks of this benchmark have been removed. Though performance can always be improved, these are now faster under Ractors than serial. The main contention is now GC pressure caused by the JSON being parsed (in particular the random string keys, I would expect a more reasonable set of JSON to perform even better). --- **Original benchmark** ``` Serial - Ruby 3.4.2 0.995186 0.030131 1.025317 ( 1.025439) Serial - Ruby master@3a29e835e69187330bb3fee035f1289561de5dad 0.933650 0.014119 0.947769 ( 0.947780) Ractors - Ruby 3.4.2 10.875507 1.600797 12.476304 ( 3.380926) Ractors - Ruby master@3a29e835e69187330bb3fee035f1289561de5dad 1.490369 0.192624 1.682993 ( 0.749747) ``` **Eregon's improved benchmark - with shareable objects** ``` Serial - Ruby 3.4.2 0.982018 0.035003 1.017021 ( 1.017816) Serial - Ruby master@3a29e835e69187330bb3fee035f1289561de5dad 0.981966 0.026635 1.008601 ( 1.008593) Ractors - Ruby 3.4.2 12.232766 1.772653 14.005419 ( 3.578559) Ractors - Ruby master@3a29e835e69187330bb3fee035f1289561de5dad 1.301200 0.122725 1.423925 ( 0.470395) ``` **Luke-gru's Benchmark - with no send/recv** ``` Benchmark 1: ruby 3.4.2 serial Time (mean ± σ): 459.4 ms ± 10.5 ms [User: 449.8 ms, System: 8.8 ms] Range (min … max): 450.7 ms … 478.2 ms 10 runs Benchmark 2: ruby master serial Time (mean ± σ): 391.2 ms ± 60.4 ms [User: 381.7 ms, System: 8.5 ms] Range (min … max): 220.0 ms … 419.0 ms 10 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 3: ruby 3.4.2 ractors Time (mean ± σ): 2.258 s ± 0.685 s [User: 13.585 s, System: 0.367 s] Range (min … max): 0.310 s … 2.484 s 10 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 4: ruby master ractors Time (mean ± σ): 191.2 ms ± 13.8 ms [User: 547.1 ms, System: 110.3 ms] Range (min … max): 144.3 ms … 201.5 ms 14 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary ruby master ractors ran 2.05 ± 0.35 times faster than ruby master serial 2.40 ± 0.18 times faster than ruby 3.4.2 serial 11.81 ± 3.68 times faster than ruby 3.4.2 ractors ``` ---------------------------------------- Bug #19288: Ractor JSON parsing significantly slower than linear parsing https://bugs.ruby-lang.org/issues/19288#change-112734 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Closed * ruby -v: ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux] * Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN ---------------------------------------- a simple benchmark: ```ruby require 'json' require 'benchmark' CONCURRENT = 5 RACTORS = true ELEMENTS = 100_000 data = CONCURRENT.times.map do ELEMENTS.times.map do { rand => rand, rand => rand, rand => rand, rand => rand }.to_json end end ractors = CONCURRENT.times.map do Ractor.new do Ractor.receive.each { JSON.parse(_1) } end end result = Benchmark.measure do if RACTORS CONCURRENT.times do |i| ractors[i].send(data[i], move: false) end ractors.each(&:take) else # Linear without any threads data.each do |piece| piece.each { JSON.parse(_1) } end end end puts result ``` Gives following results on my 8 core machine: ```shell # without ractors: 2.731748 0.003993 2.735741 ( 2.736349) # with ractors 12.580452 5.089802 17.670254 ( 5.209755) ``` I would expect Ractors not to be two times slower on the CPU intense work. -- https://bugs.ruby-lang.org/
participants (9)
-
byroot (Jean Boussier)
-
duerst
-
Eregon (Benoit Daloze)
-
Eregon (Benoit Daloze)
-
jhawthorn (John Hawthorn)
-
luke-gru (Luke Gruber)
-
maciej.mensfeld (Maciej Mensfeld)
-
maciej.mensfeld (Maciej Mensfeld)
-
tenderlovemaking (Aaron Patterson)