[ruby-core:123136] [Ruby Feature#21557] Ractor.shareable_proc to make sharable Proc objects, safely and flexibly

Issue #21557 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Feature #21557: Ractor.shareable_proc to make sharable Proc objects, safely and flexibly
https://bugs.ruby-lang.org/issues/21557

* Author: Eregon (Benoit Daloze)
* Status: Open
----------------------------------------
Following #21039 and #21550, this is a complete proposal which does not require reading these previous proposals (since that caused some confusion). That way, it is hopefully as clear as possible. It also explains how it solves everything we discussed in the previous tickets.

To use Ractor effectively, one needs to create Procs which are shareable between Ractors. Of course, such Procs must not refer to any unshareable object (otherwise the Ractor invariant is broken and segfaults follow).

One key feature of blocks/Procs is to be able to capture outer variables, e.g.:
```ruby
data = ...
task = -> { do_work(data) }
```

Ractor shareable procs should be able to use captured variables, because this is one of the most elegant ways to pass data/input in Ruby.

But there is a fundamental conflict there: reassigning captured variables cannot be honored by shareable procs, otherwise it breaks the Ractor invariant. So creating a shareable proc internally makes a shallow copy of the environment, to not break the Ractor invariant.

We cannot prevent assigning local variables (i.e. raise an exception on `foo = value`), that would be way too weird. But we can raise an error when trying to create a shareable proc in an incompatible situation, which makes it safe by preventing the unsafe cases.

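To make the conflict concrete, here is a minimal sketch in plain Ruby, without Ractor. `snapshot_proc` is a hypothetical helper (not part of this proposal) which only emulates the environment copy by binding the current value to a new local in a separate scope:

```ruby
# Hypothetical helper, only emulating what an environment copy does:
# the value is bound to a new local (`value`) in a separate method scope.
def snapshot_proc(value)
  proc { value }
end

first = "v1".freeze
data = first
live = proc { data }          # regular block: shares the variable `data`
copied = snapshot_proc(data)  # emulated shareable proc: own copy of the binding

data = "v2".freeze            # reassignment after both procs were created

p live.call                   # => "v2"
p copied.call                 # => "v1"
p copied.call.equal?(first)   # => true (shallow: the object itself is not duplicated)
```

The regular block follows the reassignment, the emulated copy does not, and that difference is what the checks below are meant to rule out.
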
## Reassigning a captured variable inside the block

Concretely, it seems we all already agree that this should be a `Ractor::IsolationError`:
```ruby
def example
  a = 1
  b = proc { v = a; a += 1; v }
  r = Ractor.shareable_proc(&b) # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned inside the block
  [b, r]
end
example.map(&:call)
```

And that's because without the error the result would be `[1, 1]`, which is unexpected (it should be `[1, 2]`): `r.call` should have updated `a` to 2 but it only updated `a` in its environment copy. That basically breaks the lexical scoping of variables captured by blocks.

We can check this by static analysis, in fact we already use static analysis for `Ractor.new`: `a = 0; Ractor.new { a = 2 }` gives `can not isolate a Proc because it accesses outer variables (a). (ArgumentError)`.

## Reassigning a captured variable outside the block

The second problematic case is:
```ruby
# error: the code clearly assumes it can reassign `a` but the `shareable_proc` would not respect it, i.e. `shareable_proc` would break Ruby block semantics
# Also note the Ractor.shareable_proc call might be far away from the block, so one can't tell when looking at the block that it would be broken by `shareable_proc` (if no error for this case)
def example
  a = 1
  b = proc { a }
  Ractor.shareable_proc(&b) # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a = 2
end
```

This is very similar (it is the symmetric case): the `shareable_proc` cannot honor the `a = 2` assignment, so it should not allow creating a `shareable_proc` in that context and should raise `Ractor::IsolationError`.

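Both cases come down to a captured local being written more than once. As a rough illustration only (not how CRuby would implement the check), this sketch counts writes per local variable name with Ripper from the standard library. `local_write_counts` and `reassigned_locals` are hypothetical names, and the sketch ignores scopes, shadowing and whether the variable is actually captured by a block:

```ruby
require "ripper"

# Walks the Ripper S-expression and counts every write to a local variable.
# Plain, operator and multiple assignments all show up as :var_field nodes.
def local_write_counts(node, counts = Hash.new(0))
  return counts unless node.is_a?(Array)

  if node[0] == :var_field && node[1].is_a?(Array) && node[1][0] == :@ident
    counts[node[1][1]] += 1
  end
  node.each { |child| local_write_counts(child, counts) }
  counts
end

# Names written more than once, i.e. not "assigned exactly once".
def reassigned_locals(source)
  local_write_counts(Ripper.sexp(source)).select { |_name, n| n > 1 }.keys
end

inside  = "a = 1\nb = proc { v = a; a += 1; v }\n"
outside = "a = 1\nb = proc { a }\na = 2\n"
safe    = "a = 1\nb = proc { a }\n"

p reassigned_locals(inside)   # => ["a"]
p reassigned_locals(outside)  # => ["a"]
p reassigned_locals(safe)     # => []
```
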
If you don't see the issue in the small `example` method above, let's use this example:
```ruby
page_views = 0

background_jobs.schedule_every(5.seconds) {
  puts "#{page_views} page views so far"
}

threaded_webserver.on("/") do
  page_views += 1
  "Hello"
end
```

If `background_jobs` uses `Thread`, everything is fine. If it uses `Ractor`, it needs to make that `schedule_every` block shareable, and if we don't add this safety check then it will always incorrectly print `0 page views so far`. This is what I mean by breaking Ruby block semantics. In this proposal, we prevent this broken-semantics situation by raising `Ractor::IsolationError` when trying to make that `schedule_every` block shareable.

One more reason to forbid this case is that a block that is made shareable is never executed immediately on the current Ractor, because there is no need to make it shareable for that case. So the block will be executed later, by some other Ractor. And a block which expects to be executed later definitely expects to see up-to-date captured variables (as in, the author of the block expects that).

We would check this situation by static analysis. There are multiple ways to go about it, with trade-offs between precision and implementation complexity. I think we could simplify to: disallow `Ractor.shareable_proc` for any block which captures a variable which is potentially reassigned. In other words, only allow `Ractor.shareable_proc` if all the variables it captures are assigned (exactly) once. More on that later in section `Edge Cases`.

## Ractor.new

Note that everything about `Ractor.shareable_proc` should also apply to `Ractor.new`, that way it's convenient to pass data via captured variables for `Ractor.new` too, for example:
```ruby
x = ...
y = ...
Ractor.new { compute(x, y) }
```

Currently `Ractor.new` does not allow capturing outer variables at all and needs workarounds such as:
```ruby
x = ...
y = ...
Ractor.new(x, y) { |x, y| compute(x, y) }
```

## define_method

`define_method` (and of course `define_singleton_method` too) have been an issue since the beginning of Ractors, because methods defined by `define_method` just couldn't be called from a Ractor (because the block/Proc wouldn't be shareable and so can't be called from other Ractors).

A workaround is to make the block/Proc shareable, but this is inconvenient, verbose and shouldn't be necessary:
```ruby
def new_ostruct_member!(name) # :nodoc:
  unless @table.key?(name) || is_method_protected!(name)
    if defined?(::Ractor)
      getter_proc = nil.instance_eval{ Proc.new { @table[name] } }
      setter_proc = nil.instance_eval{ Proc.new {|x| @table[name] = x} }
      ::Ractor.make_shareable(getter_proc)
      ::Ractor.make_shareable(setter_proc)
    else
      getter_proc = Proc.new { @table[name] }
      setter_proc = Proc.new {|x| @table[name] = x}
    end
    define_singleton_method!(name, &getter_proc)
    define_singleton_method!("#{name}=", &setter_proc)
  end
end
```

Instead, this proposal brings the idea for `define_method` to automatically call `Ractor.shareable_proc` on the given block/Proc (and fall back to the original Proc if it would raise), as if it was defined like:
```ruby
def define_method(name, &body)
  body = Ractor.shareable_proc(self: nil, &body) rescue body
  Primitive.define_method(name, &body)
end
```
(note that `define_method` knows the `body` Proc's `self` won't be the original `self` anyway, so it's fine to change it to `nil`)

This way workarounds like the one above are no longer needed and the code can be as simple as it used to be:
```ruby
def new_ostruct_member!(name) # :nodoc:
  unless @table.key?(name) || is_method_protected!(name)
    define_singleton_method!(name) { @table[name] }
    define_singleton_method!("#{name}=") { |x| @table[name] = x }
  end
end
```

Much nicer, and it solves a longstanding issue with Ractor.

There should be no compatibility issue since the block is only made shareable when it's safe to do so.

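The fallback behavior can be approximated with a small wrapper. This is only a sketch: `shareable_or_original` and `Settings` are hypothetical, and it assumes that `Ractor.shareable_proc`, where it exists, accepts `self:` and a block as proposed here. On Rubies without that method it simply returns the original Proc:

```ruby
# Hypothetical helper: try to make `body` shareable, otherwise keep it unchanged.
def shareable_or_original(body)
  return body unless defined?(::Ractor) && ::Ractor.respond_to?(:shareable_proc)

  ::Ractor.shareable_proc(self: nil, &body)
rescue StandardError
  body # e.g. Ractor::IsolationError: fall back to the original Proc
end

class Settings
  def initialize(table)
    @table = table
  end

  # Defines a reader the same way whether or not the block could be made shareable.
  def add_reader(name)
    define_singleton_method(name, &shareable_or_original(proc { @table[name] }))
    self
  end
end

settings = Settings.new({ theme: "dark" }).add_reader(:theme)
p settings.theme
```

Either way the method gets defined and behaves the same when called from the main Ractor, which is the compatibility property described above.
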
The `define_method` case is another argument for making `Ractor.shareable_proc` safe.

## Ractor.shareable_proc and Ractor.shareable_lambda

I believe we don't need `Ractor.shareable_lambda` (mentioned in other tickets). `Ractor.shareable_proc` should always preserve the lambda-ness (`Proc#lambda?`) of the given Proc. The role of `Ractor.shareable_proc` is to make the Proc shareable, not to change arguments handling. If one wants a shareable lambda they can just use `Ractor.shareable_proc(&-> { ... })`.

BTW, the added value of `Ractor.shareable_proc(self: nil, &proc)` vs just `Ractor.make_shareable(proc, copy: true)` is that it enables changing the receiver of the Proc without needing `nil.instance_eval { ... }` around, and it is much clearer.

`Ractor.make_shareable(proc)` should be an error [as mentioned here](https://bugs.ruby-lang.org/issues/21039#note-14), because it would mutate the proc in place and that's too surprising and unsafe (e.g. it would break `Proc#binding` on that Proc instance). `Ractor.make_shareable(proc, copy: true)` can be the same as `Ractor.shareable_proc(self: self, &proc)` (which only works if `self` is shareable), or an error.

## Edge Cases

For these examples I'll use `enqueue`, which defines a block to execute later, either in a Thread or Ractor. For the Ractor case, `enqueue` would make the block shareable and send it to a Ractor to execute it. This is a bit more realistic than using plain `Ractor.shareable_proc` instead of `enqueue`, since it makes it clearer that the block won't be executed right away on the main Ractor but later on some other Ractor.

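For reference, a Thread-only `enqueue` could look like the sketch below. `DeferredJobs` and `run_in_thread` are hypothetical names used just for illustration; a Ractor-based variant would additionally have to make each block shareable before sending it:

```ruby
# Hypothetical Thread-backed job list: blocks are stored now and run later.
class DeferredJobs
  def initialize
    @jobs = []
  end

  def enqueue(&job)
    @jobs << job
    self
  end

  # Runs every stored block on a new Thread and returns their results in order.
  def run_in_thread
    Thread.new { @jobs.map(&:call) }.value
  end
end

jobs = DeferredJobs.new
a = 1
jobs.enqueue { a }
a = 2                 # reassigned before the block actually runs

p jobs.run_in_thread  # => [2]
```

With threads the block sees the up-to-date value of `a`, which is the behavior an environment copy cannot reproduce.
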
### Nested Block Cases

If the assignment is in a nested block, it's an error (this case is already detected for `Ractor.new` BTW):
```ruby
a = 1
enqueue { proc { a = 1 } } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned inside the block
```

Similarly, if the assignment is in some other block outside, it's the same as if it was assigned directly outside:
```ruby
a = 1
p = proc { a = 2 }
enqueue { a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
```

### Loop Cases

This would be a `Ractor::IsolationError`, because `a` is reassigned. Without the `Ractor::IsolationError`, the block would read a stale value and silently ignore reassignments.
```ruby
a = 0
while condition
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a += 1
end
```

This is the same case, using a rescue-retry loop:
```ruby
begin
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
  a += 1
  raise
rescue
  retry if condition
end
```

A `for` loop is like a `while` loop because the LHS variable (`a`) and all variables in the loop body are actually declared outside (weird, but that's how it is).
```ruby
for a in enum
  b = rand
  enqueue { p a } # Ractor::IsolationError: cannot isolate a block because it accesses outer variables (a) which are reassigned outside the block
end
binding.local_variables # => [:a, :b]
```

Any assignment inside one of these loops can potentially happen multiple times, so any variable assigned inside one of these loops cannot be captured by a shareable block (i.e., `Ractor::IsolationError` when trying to make a shareable block in such a case).

We will need the static analysis to detect such loops.

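A rough sketch of such a loop check, again with Ripper and only as an illustration: `loop_assigned_locals` and `collect_loop_writes` are hypothetical names, the node lists are my assumption of what is needed, and rescue-retry loops are not handled here:

```ruby
require "ripper"

LOOP_NODES     = %i[while until while_mod until_mod for].freeze
SCOPE_BARRIERS = %i[def defs class sclass module].freeze

# Collects names of locals written anywhere inside a while/until/for loop.
# Entering a def/class/module resets the "inside a loop" state.
def collect_loop_writes(node, in_loop, found)
  return found unless node.is_a?(Array)

  in_loop = true if LOOP_NODES.include?(node[0])
  in_loop = false if SCOPE_BARRIERS.include?(node[0])
  if in_loop && node[0] == :var_field && node[1].is_a?(Array) && node[1][0] == :@ident
    found << node[1][1]
  end
  node.each { |child| collect_loop_writes(child, in_loop, found) }
  found
end

def loop_assigned_locals(source)
  collect_loop_writes(Ripper.sexp(source), false, []).uniq
end

source = <<~RUBY
  a = 0
  limit = 3
  while a < limit
    enqueue { p a }
    a += 1
  end
  for item in [1, 2]
    seen = item
  end
RUBY

p loop_assigned_locals(source).sort  # => ["a", "item", "seen"]
```

A block capturing any of the reported names would then be rejected, while `limit`, which is only assigned before the loop, stays allowed.
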
The static analysis probably doesn't need a full Control Flow Graph, we just need to determine if an assignment is inside a `while`/`for`/`retry` loop (up to a scope barrier like `def`/`class`/`module`).

Regular "loops" using blocks are fine though, because they create a new environment/frame for each iteration. These 2 blocks will always see `[0, 1]` and `[0, 2]`, whether shareable or not:
```ruby
a = 0
[1, 2].each do |e|
  enqueue { p [a, e] } # OK, each of these variables is assigned only once
end
```

### eval and binding

Static analysis cannot detect `eval` or `binding`. In such an extreme and very rare case, the fact that `shareable_proc` makes a copy of the environment is visible:
```ruby
a = 1
b = proc { a }
s = Ractor.shareable_proc(&b)
eval("a = 2") # or binding.local_variable_set(:a, 2), or b.binding.local_variable_set(:a, 2)
b.call # => 2
s.call # => 1
```

This seems unavoidable, unless we prevent shareable procs from using captured variables at all (quite restrictive).

BTW, `Proc#binding` is already not supported for a `shareable_proc`:
```
$ ruby -e 'nil.instance_exec { a = 1; b = proc { a }; b2 = Ractor.make_shareable(b); p b2.binding }'
-e:1:in `binding': Can't create Binding from isolated Proc (ArgumentError)
```

So `binding`/`eval` is in general already not fully respected with Ractor anyway (and cannot be).

### Multiple Assignments Before

This simple example assigns `a` twice. It would be safe because `a` is always assigned before (in execution, not necessarily in source order) creating the block instance/Proc instance, but it is not so easy to detect. Depending on how precise the static analysis is, it might allow this case. We can always allow more later and start with something simple.
```ruby
a = 1
a = 2
Ractor.shareable_proc { a } # Ractor::IsolationError if using the single-assignment-only static analysis, seems OK because not so common
```

### Error Message

Differentiating `... which are reassigned inside/outside the block` might be needlessly complicated; in such a case I think it's fine to simplify the error message and omit the part after `... which are reassigned`. The important part is that the outer variable is reassigned, not whether it's inside or outside.

## Alternatives

### Relaxing the checks for literal blocks

`Kernel#lambda` for example has behavior which depends on whether it's given a literal block or not:
```ruby
lambda(&proc {}) # => the lambda method requires a literal block (ArgumentError)
```

We could have such a difference, but I don't think it's very useful: if a variable is reassigned, it seems a bad idea to capture and shallow-copy it with a shareable proc (unclear, unsafe, etc). The semantics are also simpler if they are the same whether the block is literal or not.

### Removing the checks for reassigning a captured variable outside the block

That's basically option 1 of #21550. This would allow known unsafe behavior and break the examples shown above (i.e. it would break code in a nasty way: some assignments are silently ignored, good luck debugging that). It would be hard to then forbid such cases later, as doing so could then be considered incompatible.

In my opinion we would commit a big language design mistake if we just give up and allow known unsafe cases like that: people wouldn't be able to trust that local variable assignments are respected (pretty fundamental, isn't it?) and that Ruby blocks behave as they always have (with lexical scoping for local variables).

It would also do nothing to help with `define_method`.

`Ractor.shareable_proc(Proc)` is currently unsafe (the main point of #21039), let's address it, not ignore known problems, especially after a lot of discussion and thoughts on how to solve it properly.

-- 
https://bugs.ruby-lang.org/