[ruby-core:125782] [Ruby Feature#22119] Thread: inherit storage on child threads
Issue #22119 has been reported by chucke (Tiago Cardoso). ---------------------------------------- Feature #22119: Thread: inherit storage on child threads https://bugs.ruby-lang.org/issues/22119 * Author: chucke (Tiago Cardoso) * Status: Open ---------------------------------------- ## Motivation At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the `Thread#thread_variable_get/set` family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like: ``` ruby def new_thread do logging_context = Logging.current_context.deep_dup database_context = DB.current_context.deep_dup # etc, etc Thread.new do Logging.set_context(logging_context) DB.set_context/database_context) yield end end ``` alongside a plethora of checks and rubocops for forbid direct usage of `Thread.new`. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of `Thread.new` without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context. In order to fix that, we've resorted to monkeypatching `Thread.new` (and `Fiber.new`), which feels necessary for this use case, but not right. This seems like a feature missing from ruby. ## Research `ruby` does have prior art for "inheritable" storage: `Fiber#storage`; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless `Fiber.new(storage: nil)`). This is exactly what's missing here. Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's [ThreadLocalVar](https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadL...), which seems to be inspired in Java's own [InheritableThreadLocal](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Inher...). ## Proposal `ruby` should have an OTTB solution for inheritable thread storage. Some options to consider: ### 1: :storage kwarg An option would be to align with `Fiber.new` and support a `storage` kwarg: ``` ruby Thread.current.thread_variable_set(:foo, "bar") Thread.new(storage: nil) do # this has to be default Thread.current.thread_variable_get(:foo) #=> nil end Thread.new(storage: Thread.current.storage) do # would be a new method Thread.current.thread_variable_get(:foo) #=> nil end ``` as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported). ### 2. Thread::Variable A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`: ``` ruby VAL = Thread::Variable.new(2) Thread.new do puts VAL.value #=> 2 VAL.value = 3 puts VAL.value #=> 3 end.join puts VAL.value #=> 2 ``` I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness. -- https://bugs.ruby-lang.org/
Issue #22119 has been updated by nobu (Nobuyoshi Nakada). chucke (Tiago Cardoso) wrote:
### 1: :storage kwarg
An option would be to align with `Fiber.new` and support a `storage` kwarg:
as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported).
I don't think this feature itself bad, but I'm afraid that changing the behavior by an environment variable could be too intrusive and might break other libraries. chucke (Tiago Cardoso) wrote:
### 2. Thread::Variable
A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`:
I seem to recall there being a similar proposal before, but I don't remember exactly. ---------------------------------------- Feature #22119: Thread: inherit storage on child threads https://bugs.ruby-lang.org/issues/22119#change-117696 * Author: chucke (Tiago Cardoso) * Status: Open ---------------------------------------- ## Motivation At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the `Thread#thread_variable_get/set` family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like: ``` ruby def new_thread do logging_context = Logging.current_context.deep_dup database_context = DB.current_context.deep_dup # etc, etc Thread.new do Logging.set_context(logging_context) DB.set_context/database_context) yield end end ``` alongside a plethora of checks and rubocops for forbid direct usage of `Thread.new`. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of `Thread.new` without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context. In order to fix that, we've resorted to monkeypatching `Thread.new` (and `Fiber.new`), which feels necessary for this use case, but not right. This seems like a feature missing from ruby. ## Research `ruby` does have prior art for "inheritable" storage: `Fiber#storage`; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless `Fiber.new(storage: nil)`). This is exactly what's missing here. Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's [ThreadLocalVar](https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadL...), which seems to be inspired in Java's own [InheritableThreadLocal](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Inher...). ## Proposal `ruby` should have an OTTB solution for inheritable thread storage. Some options to consider: ### 1: :storage kwarg An option would be to align with `Fiber.new` and support a `storage` kwarg: ``` ruby Thread.current.thread_variable_set(:foo, "bar") Thread.new(storage: nil) do # this has to be default Thread.current.thread_variable_get(:foo) #=> nil end Thread.new(storage: Thread.current.storage) do # would be a new method Thread.current.thread_variable_get(:foo) #=> nil end ``` as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported). ### 2. Thread::Variable A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`: ``` ruby VAL = Thread::Variable.new(2) Thread.new do puts VAL.value #=> 2 VAL.value = 3 puts VAL.value #=> 3 end.join puts VAL.value #=> 2 ``` I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness. -- https://bugs.ruby-lang.org/
Issue #22119 has been updated by nobu (Nobuyoshi Nakada). Fiber-local storages are inherited across Threads. Wouldn't this be sufficient for your use case? ```ruby Fiber[:a] = 10 Thread.new { Fiber[:a] }.value #=> 10 ``` ---------------------------------------- Feature #22119: Thread: inherit storage on child threads https://bugs.ruby-lang.org/issues/22119#change-117697 * Author: chucke (Tiago Cardoso) * Status: Open ---------------------------------------- ## Motivation At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the `Thread#thread_variable_get/set` family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like: ``` ruby def new_thread do logging_context = Logging.current_context.deep_dup database_context = DB.current_context.deep_dup # etc, etc Thread.new do Logging.set_context(logging_context) DB.set_context/database_context) yield end end ``` alongside a plethora of checks and rubocops for forbid direct usage of `Thread.new`. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of `Thread.new` without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context. In order to fix that, we've resorted to monkeypatching `Thread.new` (and `Fiber.new`), which feels necessary for this use case, but not right. This seems like a feature missing from ruby. ## Research `ruby` does have prior art for "inheritable" storage: `Fiber#storage`; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless `Fiber.new(storage: nil)`). This is exactly what's missing here. Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's [ThreadLocalVar](https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadL...), which seems to be inspired in Java's own [InheritableThreadLocal](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Inher...). ## Proposal `ruby` should have an OTTB solution for inheritable thread storage. Some options to consider: ### 1: :storage kwarg An option would be to align with `Fiber.new` and support a `storage` kwarg: ``` ruby Thread.current.thread_variable_set(:foo, "bar") Thread.new(storage: nil) do # this has to be default Thread.current.thread_variable_get(:foo) #=> nil end Thread.new(storage: Thread.current.storage) do # would be a new method Thread.current.thread_variable_get(:foo) #=> nil end ``` as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported). ### 2. Thread::Variable A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`: ``` ruby VAL = Thread::Variable.new(2) Thread.new do puts VAL.value #=> 2 VAL.value = 3 puts VAL.value #=> 3 end.join puts VAL.value #=> 2 ``` I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness. -- https://bugs.ruby-lang.org/
Issue #22119 has been updated by jhawthorn (John Hawthorn). It was also initially a surprise to me that Fiber storage is inherited across threads as well. In my experience, inheritable Fiber storage has mostly been a mistake and cause of bugs, I don't think we should spread that to threads. New threads/fibers spawned implicitly inherit Fiber storage in the state it was in at that time. If these Threads/Fibers are long lived or spawn additional Threads/Fibers, they will retain whatever values were set in that storage at the time for their lifetime (which for ex. a thread pool is the life of the process) Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such. Anything mutable in Fiber storage should require a Mutex guard, and whatever else one would otherwise have used for mutable, shared, global state. I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context) instead of pushing this into the language (which would also be problematic, but at least contained and not incorrect by default). ---------------------------------------- Feature #22119: Thread: inherit storage on child threads https://bugs.ruby-lang.org/issues/22119#change-117701 * Author: chucke (Tiago Cardoso) * Status: Open ---------------------------------------- ## Motivation At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the `Thread#thread_variable_get/set` family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like: ``` ruby def new_thread do logging_context = Logging.current_context.deep_dup database_context = DB.current_context.deep_dup # etc, etc Thread.new do Logging.set_context(logging_context) DB.set_context/database_context) yield end end ``` alongside a plethora of checks and rubocops for forbid direct usage of `Thread.new`. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of `Thread.new` without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context. In order to fix that, we've resorted to monkeypatching `Thread.new` (and `Fiber.new`), which feels necessary for this use case, but not right. This seems like a feature missing from ruby. ## Research `ruby` does have prior art for "inheritable" storage: `Fiber#storage`; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless `Fiber.new(storage: nil)`). This is exactly what's missing here. Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's [ThreadLocalVar](https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadL...), which seems to be inspired in Java's own [InheritableThreadLocal](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Inher...). ## Proposal `ruby` should have an OTTB solution for inheritable thread storage. Some options to consider: ### 1: :storage kwarg An option would be to align with `Fiber.new` and support a `storage` kwarg: ``` ruby Thread.current.thread_variable_set(:foo, "bar") Thread.new(storage: nil) do # this has to be default Thread.current.thread_variable_get(:foo) #=> nil end Thread.new(storage: Thread.current.storage) do # would be a new method Thread.current.thread_variable_get(:foo) #=> nil end ``` as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported). ### 2. Thread::Variable A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`: ``` ruby VAL = Thread::Variable.new(2) Thread.new do puts VAL.value #=> 2 VAL.value = 3 puts VAL.value #=> 3 end.join puts VAL.value #=> 2 ``` I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness. -- https://bugs.ruby-lang.org/
Issue #22119 has been updated by chucke (Tiago Cardoso).
Fiber-local storages are inherited across Threads.
Wow, that was really unexpected, I didn't know about that! I guess that would work for my use case. Is this specified or accidental behaviour?
Worse, anything in Fiber[:whatever] is potentially shared mutable state, however it is rarely treated as such.
Doesn't the same apply to `Thread.current` and thread variables? Nothing stops anyone from setting a mutable hash and changing it across threads.
I would have preferred we'd never introduced inheritable Fiber storage into the language and instead developed a framework-level convention (say, some sort of helper method to dup and copy :context or :request_context)
I understand where the sentiment is coming from, as I'm dealing with a similar situation in [this PR](https://github.com/ruby/logger/pull/132). My two cents is that ruby needs a #deep_dup (and a #deep_freeze, but that's a different discussion / feature request :) ) as a core feature. FWIW we're doing something of the kind internally, which we may open-source at some point. ---------------------------------------- Feature #22119: Thread: inherit storage on child threads https://bugs.ruby-lang.org/issues/22119#change-117702 * Author: chucke (Tiago Cardoso) * Status: Open ---------------------------------------- ## Motivation At work, we've been dealing with the need of propagating context across threads. The use cases are several, ranging from propagating logging / observability context, region / database shards, etc, on worker threads for specific workloads which require that implicit knowledge. We've historically used thread variables for it, via the `Thread#thread_variable_get/set` family of functions, and have been enforcing / gating usage of threads via a wrapper function which does the context propagation heavy-lifting. Something like: ``` ruby def new_thread do logging_context = Logging.current_context.deep_dup database_context = DB.current_context.deep_dup # etc, etc Thread.new do Logging.set_context(logging_context) DB.set_context/database_context) yield end end ``` alongside a plethora of checks and rubocops for forbid direct usage of `Thread.new`. However, we recently found out that that's not enough, since 3rd party libraries that we use will make their own use of `Thread.new` without our custom context propagation, which broke our logic in cases where we'd require p.ex. correcy database context. In order to fix that, we've resorted to monkeypatching `Thread.new` (and `Fiber.new`), which feels necessary for this use case, but not right. This seems like a feature missing from ruby. ## Research `ruby` does have prior art for "inheritable" storage: `Fiber#storage`; when a new fiber is created, by default, it'll have the same storage from the fiber it was created from (unless `Fiber.new(storage: nil)`). This is exactly what's missing here. Another example, with a little more ceremony, of the same problem solved in ruby is concurrent-ruby's [ThreadLocalVar](https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadL...), which seems to be inspired in Java's own [InheritableThreadLocal](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Inher...). ## Proposal `ruby` should have an OTTB solution for inheritable thread storage. Some options to consider: ### 1: :storage kwarg An option would be to align with `Fiber.new` and support a `storage` kwarg: ``` ruby Thread.current.thread_variable_set(:foo, "bar") Thread.new(storage: nil) do # this has to be default Thread.current.thread_variable_get(:foo) #=> nil end Thread.new(storage: Thread.current.storage) do # would be a new method Thread.current.thread_variable_get(:foo) #=> nil end ``` as mentioned above, for backwards compatibility reasons, by default storage wouldn't be inherited, which wouldn't fit our use-case that well, unless the default could be changed viat env var, i.e. `RUBY_THREAD_STORAGE=inherit` or something of the kind (which ruby historically hasn't supported). ### 2. Thread::Variable A new primitive, `Thread::Variable`, could be integrated, that would have the same semantics as concurrent-ruby's `ThreadLocalVar`: ``` ruby VAL = Thread::Variable.new(2) Thread.new do puts VAL.value #=> 2 VAL.value = 3 puts VAL.value #=> 3 end.join puts VAL.value #=> 2 ``` I feel like 1. is more aligned with how ruby has dealt with the problem space, but the backwards compatibility issues may be a problem, so 2 is probably more feasible, despite its Java'iness. -- https://bugs.ruby-lang.org/
participants (3)
-
chucke (Tiago Cardoso) -
jhawthorn (John Hawthorn) -
nobu (Nobuyoshi Nakada)