Issue #19842 has been updated by ioquatix (Samuel Williams).
This is an interesting proposal, thanks for writing it up. I have some thoughts and
questions, in no particular order.
Can we use MaNy without `Ractor`? Last time I tried `Ractor`, I ran into significant
problems, so if this depends on creating `Ractor`s for application code, I'm
concerned it may be difficult to use in practice. When I first started working on fiber
scheduling about 6 years ago, multi-process was the only logical way to achieve true
parallelism, so for the fiber scheduler, the simplest model always felt like M processes :
N fibers. The main advantage of `Ractor` in this case is memory usage. But what is the
advantage of MaNy, since we already have the fiber scheduler?
When `Ractor` was merged, due to the TLS of Ruby state, there was a performance hit in
some cases, especially to Fiber context switching. This was quickly fixed, and I hope that
MaNy does not introduce other performance regressions to existing code. It sounds like
you've thought about compatibility w.r.t. making it a feature of `Ractor`. However, in
my experience, there are other significant blockers to high performance concurrency,
namely, garbage collection. Are you confident you can introduce this feature without
performance regressions? Are there any areas where we can improve performance?
Why not use the fiber scheduler interface for "managed blocking" operations?
This would bring you several mature schedulers without essentially replicating all the
work done adding hooks in the right places. In your managed blocking operations, there are
actually a lot of operations which the fiber scheduler handles in addition to your list:
`Process#wait`, `Addrinfo.getaddrinfo` (and all related methods which do name resolution).
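To illustrate the breadth of the existing interface, here is a non-functional skeleton listing the fiber scheduler hook names (these hooks exist in Ruby 3.1+; the empty bodies are placeholders, not a working scheduler):

```ruby
# Skeleton of the fiber scheduler interface. A real scheduler implements
# these hooks with an event loop; the bodies here are placeholders.
class SkeletonScheduler
  # Called for managed I/O blocking (read/write readiness).
  def io_wait(io, events, timeout); end

  # Called for Process::Status.wait and related process waiting.
  def process_wait(pid, flags); end

  # Called for name resolution (Addrinfo.getaddrinfo and friends).
  def address_resolve(hostname); end

  # Called for Kernel#sleep.
  def kernel_sleep(duration = nil); end

  # Called around Mutex/Queue blocking and wakeup.
  def block(blocker, timeout = nil); end
  def unblock(blocker, fiber); end

  # Called when the scheduler is closed; a real scheduler runs its
  # event loop here until all fibers complete.
  def close; end
end
```

A scheduler object like this is installed per-thread with `Fiber.set_scheduler`, which is how the mature schedulers mentioned above plug in their hooks.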
Also, it's worth noting that `io_uring` provides asynchronous open/read/write, so
things like "I/O (can not detect block-able or not by multiplexing API), open on
FIFO, close on NFS, ..." are not a problem going forward. Regarding scheduling threads,
the `io-event` gem takes a "fiber" argument, but in fact, you can pass any
object that implements `#transfer` for re-scheduling.
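As a minimal sketch of that duck-typing (the `Callback` class is hypothetical, not part of `io-event`):

```ruby
# Sketch: the io-event selector only requires that the "fiber" argument
# respond to #transfer, so a plain callback object works too.
class Callback
  def initialize(&block)
    @block = block
  end

  # Invoked by the selector when the event it waited for is ready.
  def transfer(*args)
    @block.call(*args)
  end
end
```

A selector could then reschedule `Callback.new { ... }` exactly as it would a `Fiber`.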
Regarding "flock and other locking mechanism" and similar (e.g. `fallocate`) - I
think going forward, we will see many of these operations supported by `io_uring`.
However, it can be tricky in practice - an uncontested operation can proceed faster than
an asynchronous call, so it's often better to do something like: `read -> EAGAIN
-> io_uring` than directly schedule the read into the `io_uring`. I think we can
continue to expand the lexicon of "managed blocking" operations as need arises.
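A minimal sketch of that "try first, then register" pattern (the `scheduler` object here is hypothetical; it only needs a `#wait_readable` method):

```ruby
# Optimistic read: attempt the uncontested fast path first, and only pay
# for event registration (epoll/io_uring in a real scheduler) on EAGAIN.
def optimistic_read(io, size, scheduler)
  io.read_nonblock(size)
rescue IO::WaitReadable
  # Fast path failed; defer to the scheduler's readiness notification.
  scheduler.wait_readable(io)
  retry
end
```

When the descriptor is usually ready, this avoids a round trip through the event loop entirely.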
I've been referring to this as progressive concurrency - depending on the platform and
supported features, more or fewer operations may be executed concurrently - but it
doesn't affect user code.
I'm also a little concerned about the messaging/public image. Threads are primarily
considered a tool for parallelism. But this proposal introduces significant changes to how
they work. When Matz talked about Threads in the past, he was not positive: "I regret
introducing Thread". TruffleRuby and JRuby have both implemented threads that run
with true parallelism. It also looks like there was a proposal for Python regarding
removing the "GVL" and allowing free threading. Since the Fiber Scheduler
already provides a similar "green threading" implementation, and is used in
production today, what are the main advantages of MaNy?
More specifically, do you want to encourage people to write e.g.
`Thread.new{Net::HTTP.get(...)}.join`? How will we deal with concurrency vs parallel
execution and thread safety? The fiber scheduler deliberately chooses to deal with
**concurrency** to avoid issues relating to **parallelism**. e.g. `Async{}` was designed
to be safe while allowing practical levels of concurrency. I do appreciate there is an
overlap between concurrency problems and parallel problems, but parallel execution is far
more tricky in real world programs. I don't think we can say `Thread` is safe, in
general, especially if you consider all major implementations of Ruby. I have personally
experimented with JRuby and TruffleRuby and at least at the time, neither of them were
totally thread safe (it was possible to corrupt internal data). One of the reasons
`Ractor` is appealing is because it isolates parallel execution and presents a safe
interface. Can we do the same for `Thread`?
----------------------------------------
Feature #19842: Introduce M:N threads
https://bugs.ruby-lang.org/issues/19842#change-104393
* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.
## Background
Ruby threads (RT in short) are implemented from old Ruby versions and they have the
following features:
* Can be created with simple notation `Thread.new{}`
* Can be switched to another ready Ruby thread by:
* Time-slice.
* I/O blocking.
* Synchronization such as Mutex features.
* And other blocking reasons.
* Can be interruptible by:
* OS-deliver signals (only for the main thread).
* `Thread#kill`.
* Can be terminated by:
* the end of each Ruby thread.
* the end of the main thread (and other Ruby threads are killed).
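The features above can be seen in a few lines (a small demonstration, not from the proposal):

```ruby
# Simple creation, switching on a blocking operation, and Thread#kill.
results = Queue.new
worker  = Thread.new { sleep 0.01; results << :done }  # switches on sleep
spinner = Thread.new { loop { sleep } }                # blocks forever
spinner.kill                                           # interruption
worker.join                                            # normal termination
```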
Ruby 1.8 and earlier versions used M:1 threads (green threads, user-level threads, ...; the
term "1:N threads" is more popular, but to keep this explanation consistent I use the
"M:1" term here), which manage multiple Ruby threads on 1 native thread.
(Native threads are provided by C interfaces such as Pthreads. In many cases, native
threads are OS threads, but there are also user-level implementations, such as user-level
pthread libraries, in theory. Therefore, they are referred to as native threads in this
article, NT in short.)
If a Ruby thread RT1 blocks because of an I/O operation, the Ruby interpreter switches to the
next ready Ruby thread RT2. The I/O operation is monitored by `select()` (or
similar) functionality, and when the I/O is ready, RT1 is marked as a ready thread and
will be resumed soon. However, when a Ruby thread issues some other blocking operation
such as `gethostbyname()`, the Ruby interpreter can not switch to any other Ruby thread until
`gethostbyname()` finishes.
We name two types of blocking operations:
* Managed blocking operations
* I/O (most of read/write)
* managed by an I/O multiplexing API (select, poll, epoll, kqueue, IOCP, io_uring, ...)
* Sleeping
* Synchronization (Mutex, Queue, ...)
* Unmanaged operations
* All other blocking operations not listed above, written in C
* Huge numeric calculations like `Bignum#*`
* DNS lookup
* I/O (readiness can not be detected by a multiplexing API)
* open on FIFO, close on NFS, ...
* flock and other locking mechanisms
* library calls which use blocking operations
* `libfoo` has `foo_func()`, and `foo_func()` waits for a DNS lookup. A Ruby extension
`foo-ruby` can call `foo_func()`.
With these terms we can say that M:1 threads can support managed blocking operations, but
can not support unmanaged operations (other Ruby threads can not make progress) without
further tricks.
Note that even if a `select()`-like system call says an `fd` is ready, the I/O operation
on `fd` can still block because of contention (a read by another thread or process, for
example).
M:1 threads have another disadvantage: they can not run in parallel because only one native
thread is used.
From Ruby 1.9 we implemented 1:1 threads, which means each Ruby thread has a corresponding
native thread. To make the implementation easy, we also introduced a GVL: only the Ruby
thread that acquires the GVL can run. With the 1:1 model, we can support managed blocking
operations and unmanaged blocking operations by releasing the GVL. When a Ruby thread wants
to issue a blocking operation, it releases the GVL and another ready Ruby thread continues
to run. We don't care whether the blocking operation is managed or unmanaged.
(We can not make some unmanaged blocking operations interruptible (stopped by Ctrl-C, for
example).)
Advantages of 1:1 threads over M:1 threads:
* Easy to handle blocking operations by releasing GVL.
* We can utilize parallelism with multiple native threads by releasing GVL.
Disadvantages of 1:1 threads compared to M:1 threads:
* Overhead of making many native threads for many Ruby threads
* We can not make a huge number of Ruby threads and Ractors on 1:1 threads.
* Thread switching overhead from the GVL, because inter-core communication is needed.
From Ruby 3.0 we introduced the fiber scheduler mechanism to manage multiple fibers.
Differences from Ruby 1.8 M:1 threads are:
* No timeslice (fibers are only switched by managed blocking operations)
* Ruby users can write their own schedulers for their apps with their favorite underlying mechanism
The disadvantages are similar to those of M:1 threads. Another disadvantage is that we
need to consider Fiber's behavior.
From Ruby 3.0 we also introduced Ractors. Ractors can run in parallel because most
objects are separated between them. 1 Ractor creates 1 Ruby thread, so Ractors have the same
disadvantages as 1:1 threads. For example, we can not make a huge number of Ractors.
## Goal
Our goal is making lightweight Ractors on lightweight Ruby threads. To achieve this goal we
propose to implement M:N threads in MRI.
M:N threads manage M Ruby threads on N native threads, with a limited N (~= the number of
CPU cores, for example).
Advantages of M:N threads are:
1. We can run M Ractors on N native threads simultaneously if the machine has N cores.
2. We can make a huge number of Ruby threads or Ractors because we don't need a huge
number of native threads.
3. We can support unmanaged blocking operations by locking a native thread to the Ruby
thread which issues the unmanaged blocking operation.
4. We can use our own Ruby thread/Ractor scheduler instead of the native thread (OS)
scheduler.
Disadvantages of M:N threads are:
1. The implementation is complicated and can be hard.
2. It can introduce incompatibility, especially around TLS (thread-local storage).
3. We need to maintain our own scheduler.
Without multiple Ractors, this is similar to Ruby 1.8 M:1 threads. The difference from
M:1 threads is the NT-locking mechanism to support unmanaged blocking operations. Another
advantage is that it is easy to fall back to 1:1 threads by locking all Ruby threads to
their corresponding native threads.
## Proposed design
### User facing changes
If a program only has a main Ractor (i.e., most Ruby programs), the user will not face any
changes by default.
On the main Ractor, all threads are 1:1 threads by default, so there is no compatibility
issue.
If the `RUBY_MN_THREADS=1` environment variable is given, the main Ractor enables M:N threads.
Note that the main thread locks its NT by default because the initial NT is special in some
cases. I'm not sure we can relax this limitation.
With multiple Ractors, N (+ alpha) native threads run M Ractors. Currently there is no way
to disable M:N threads with multiple Ractors, because there are only a few multi-Ractor
programs and thus no compatibility issues.
The maximum value of N can be specified by `RUBY_MAX_PROC=N`. It is 8 by default, but this
value should be set according to the number of CPU processors (cores).
### TLS issue
With M:N threads, a Ruby thread (RT1) migrates from a native thread (NT1) to NT2, and so
on, so TLS in native code can be a problem.
For example, RT1 calls a library function `foo()`, which sets TLS1 on NT1. After RT1
migrates to NT2, RT1 calls `foo()` again, but there is no TLS1 record because TLS1 was
recorded only on NT1.
In this case, RT1 should run on NT1 while using the native library foo. To avoid such
problems, we need the following features:
* 1:1 threads on main Ractor by default
* functionality to lock the NT for RT, maybe `Thread#lock_native_thread` and
`Thread#unlock_native_thread` API is needed. For example, Go language has
`runtime.LockOSThread()` and `runtime.UnlockOSThread()` for this purpose.
* Or C-API only for this purpose? (not fixed yet)
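Under the proposed (not yet fixed) Ruby API, usage might look like the following hypothetical sketch; `lock_native_thread`, `unlock_native_thread`, and `foo_with_native_tls` do not exist today and are only illustrative:

```ruby
# Hypothetical usage of the proposed API (names not final), mirroring
# Go's runtime.LockOSThread / runtime.UnlockOSThread:
Thread.new do
  Thread.current.lock_native_thread    # pin this RT to its current NT
  begin
    foo_with_native_tls                # library that caches state in NT TLS
  ensure
    Thread.current.unlock_native_thread
  end
end
```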
Note that the same problem can occur with the fiber scheduler (and of course Ruby 1.8 M:1
threads), but I have not heard of it being much of a problem, so I expect that TLS will
not be much of an issue.
### Unmanaged blocking operations
Since Ruby 1.9 (1:1 threads), the `nogvl(func)` API has been used for most blocking
operations to keep the threading system healthy. In other words, `nogvl(func)` indicates
that the given function is a blocking operation. To support unmanaged blocking operations,
we lock a native thread to the Ruby thread which issues the blocking operation.
If the blocking operation doesn't finish soon, other Ruby threads can not run because
the RT locks the NT. In this case, another system monitoring thread, named the "Timer
thread" (a historical name; TT in short), creates another NT to run other ready Ruby
threads.
This TT behavior is the same as that of "sysmon" in the Go language.
We call a locked NT a dedicated native thread (DNT) and other NTs shared native threads
(SNT). The upper bound set by `RUBY_MAX_PROC` affects the number of SNTs. In other words,
the number of DNTs is not limited (just as the number of NTs in 1:1 threads is not
limited).
### Managed blocking operations
Managed blocking operations are multiplexed by `select()`-like functions on the Timer
thread. Currently only `epoll()` is supported.
I/O operation flow (read on fd1) for Ruby thread RT1:
1. Check the readiness of fd1 with `poll(timeout = 0)`; if ready, go to step 4.
2. Register fd1 with the Timer thread (TT) epoll and resume another ready Ruby thread.
3. When TT detects that fd1 is ready, it marks RT1 as a ready thread.
4. When RT1 is resumed, do the `read()` while locking the corresponding NT1.
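The four steps above can be restated as pseudocode from the scheduler's perspective (all helper names here are illustrative, not real internals):

```ruby
# Pseudocode for the read flow above; helper names are made up.
def read_on(rt1, fd1)
  unless ready_now?(fd1)              # step 1: poll(timeout = 0)
    timer_thread.register(fd1, rt1)   # step 2: hand fd1 to TT's epoll
    yield_to_next_ready_thread        # step 2: run another RT meanwhile
    # step 3: TT marks rt1 ready once fd1 becomes readable
  end
  with_locked_nt(rt1) { read(fd1) }   # step 4: perform the actual read
end
```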
`sleep(n)` operation flow for Ruby thread RT1:
1. Register the timeout of RT1 with the TT epoll.
2. When TT detects the timeout of RT1 (n seconds), TT marks RT1 as a ready Ruby thread.
### Internal design
* 2 level scheduling
* Ruby threads within a Ractor are managed as M:1 threads
* Ruby threads of different Ractors are managed as M:N threads
* Timer thread has several duties
1. Monitoring I/O (or other event) ready-ness
2. Monitoring timeout
3. Produce timeslice signals
4. Help OS signal delivering
(In pthread environments) recent Ruby doesn't create a timer thread, but the MaNy
implementation always creates a TT; this can be improved.
## Implementation
The code name is the MaNy project, derived from "M:N" threads.
https://github.com/ko1/ruby/tree/many2
The implementation is not yet mature (currently debugging).
## Measurements
See RubyKaigi 2023 slides:
https://atdot.net/~ko1/activities/2023_rubykaigi2023.pdf
## Discussion
* Enable/disable
* default behavior
* how to switch the behavior
* Should we always lock the NT for the main thread?
* Ruby/C API to lock the native threads
## Misc
This description will be improved later.
--
https://bugs.ruby-lang.org/