Issue #19842 has been updated by ioquatix (Samuel Williams).
This is an interesting proposal, thanks for writing it up. I have some thoughts and
questions, in no particular order.
Can we use MaNy without `Ractor`? Last time I tried `Ractor`, I ran into significant
problems, so if this depends on creating `Ractor`s for application code, I'm
concerned it may be difficult to use in practice. When I first started working on fiber
scheduling about 6 years ago, multi-process was the only logical way to achieve true
parallelism, so for the fiber scheduler, the simplest model always felt like M processes :
N fibers. The main advantage of `Ractor` in this case is memory usage. But what is the
advantage of MaNy, since we already have the fiber scheduler?
When `Ractor` was merged, due to the TLS of Ruby state, there was a performance hit in
some cases, especially to Fiber context switching. This was quickly fixed, and I hope that
MaNy does not introduce other performance regressions to existing code. It sounds like
you've thought about compatibility w.r.t. making it a feature of `Ractor`. However, in
my experience, there are other significant blockers to high performance concurrency,
namely, garbage collection. Are you confident you can introduce this feature without
performance regressions? Are there any areas where we can improve performance?
Why not use the fiber scheduler interface for "managed blocking" operations?
This would bring you several mature schedulers without essentially replicating all the
work done adding hooks in the right places. In your managed blocking operations, there are
actually a lot of operations which the fiber scheduler handles in addition to your list:
`Process#wait`, `Addrinfo.getaddrinfo` (and all related methods which do name resolution).
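To illustrate the breadth of the existing interface, here is a non-functional skeleton listing the fiber scheduler hook names (these hooks exist in Ruby 3.1+; the empty bodies are placeholders, not a working scheduler):

```ruby
# Skeleton of the fiber scheduler interface. A real scheduler implements
# these hooks with an event loop; the bodies here are placeholders.
class SkeletonScheduler
  # Called for managed I/O blocking (read/write readiness).
  def io_wait(io, events, timeout); end

  # Called for Process::Status.wait and related process waiting.
  def process_wait(pid, flags); end

  # Called for name resolution (Addrinfo.getaddrinfo and friends).
  def address_resolve(hostname); end

  # Called for Kernel#sleep.
  def kernel_sleep(duration = nil); end

  # Called around Mutex/Queue blocking and wakeup.
  def block(blocker, timeout = nil); end
  def unblock(blocker, fiber); end

  # Called when the scheduler is closed; a real scheduler runs its
  # event loop here until all fibers complete.
  def close; end
end
```

A scheduler object like this is installed per-thread with `Fiber.set_scheduler`, which is how the mature schedulers mentioned above plug in their hooks.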
Also, it's worth noting that `io_uring` provides asynchronous open/read/write, so
things like "I/O (can not detect block-able or not by multiplexing API), open on
FIFO, close on NFS, ..." are not a problem going forward. Regarding scheduling threads,
the `io-event` gem takes a "fiber" argument, but in fact, you can pass any
object that implements `#transfer` for re-scheduling.
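As a minimal sketch of that duck-typing (the `Callback` class is hypothetical, not part of `io-event`):

```ruby
# Sketch: the io-event selector only requires that the "fiber" argument
# respond to #transfer, so a plain callback object works too.
class Callback
  def initialize(&block)
    @block = block
  end

  # Invoked by the selector when the event it waited for is ready.
  def transfer(*args)
    @block.call(*args)
  end
end
```

A selector could then reschedule `Callback.new { ... }` exactly as it would a `Fiber`.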
Regarding "flock and other locking mechanism" and similar (e.g. `fallocate`) - I
think going forward, we will see many of these operations supported by `io_uring`.
However, it can be tricky in practice - an uncontested operation can proceed faster than
an asynchronous call, so it's often better to do something like: `read -> EAGAIN
-> io_uring` than directly schedule the read into the `io_uring`. I think we can
continue to expand the lexicon of "managed blocking" operations as need arises.
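A minimal sketch of that "try first, then register" pattern (the `scheduler` object here is hypothetical; it only needs a `#wait_readable` method):

```ruby
# Optimistic read: attempt the uncontested fast path first, and only pay
# for event registration (epoll/io_uring in a real scheduler) on EAGAIN.
def optimistic_read(io, size, scheduler)
  io.read_nonblock(size)
rescue IO::WaitReadable
  # Fast path failed; defer to the scheduler's readiness notification.
  scheduler.wait_readable(io)
  retry
end
```

When the descriptor is usually ready, this avoids a round trip through the event loop entirely.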
I've been referring to this as progressive concurrency - depending on the platform and
supported features, more or fewer operations may be executed concurrently - but it
doesn't affect user code.
I'm also a little concerned about the messaging/public image. Threads are primarily
considered a tool for parallelism. But this proposal introduces significant changes to how
they work. When Matz talked about Threads in the past, he was not positive: "I regret
introducing Thread". TruffleRuby and JRuby have both implemented threads that run
with true parallelism. It also looks like there was a proposal for Python regarding
removing the "GVL" and allowing free threading. Since the Fiber Scheduler
already provides a similar "green threading" implementation, and is used in
production today, what are the main advantages of MaNy?
More specifically, do you want to encourage people to write e.g.
`Thread.new{Net::HTTP.get(...)}.join`? How will we deal with concurrency vs parallel
execution and thread safety? The fiber scheduler deliberately chooses to deal with
**concurrency** to avoid issues relating to **parallelism**. e.g. `Async{}` was designed
to be safe while allowing practical levels of concurrency. I do appreciate there is an
overlap between concurrency problems and parallel problems, but parallel execution is far
more tricky in real world programs. I don't think we can say `Thread` is safe, in
general, especially if you consider all major implementations of Ruby. I have personally
experimented with JRuby and TruffleRuby and at least at the time, neither of them were
totally thread safe (it was possible to corrupt internal data). One of the reasons
`Ractor` is appealing is because it isolates parallel execution and presents a safe
interface. Can we do the same for `Thread`?
----------------------------------------
Feature #19842: Introduce M:N threads
https://bugs.ruby-lang.org/issues/19842#change-104393
* Author: ko1 (Koichi Sasada)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.
## Background
Ruby threads (RT in short) are implemented from old Ruby versions and they have the
following features:
* Can be created with simple notation `Thread.new{}`
* Can be switched to another ready Ruby thread by:
* Time-slice.
* I/O blocking.
* Synchronization such as Mutex features.
* And other blocking reasons.
* Can be interruptible by:
* OS-deliver signals (only for the main thread).
* `Thread#kill`.
* Can be terminated by:
* the end of each Ruby thread.
* the end of the main thread (and other Ruby threads are killed).
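The features above can be seen in a few lines (a small demonstration, not from the proposal):

```ruby
# Simple creation, switching on a blocking operation, and Thread#kill.
results = Queue.new
worker  = Thread.new { sleep 0.01; results << :done }  # switches on sleep
spinner = Thread.new { loop { sleep } }                # blocks forever
spinner.kill                                           # interruption
worker.join                                            # normal termination
```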
Ruby 1.8 and earlier versions used M:1 threads (green threads, user-level threads, ...; the
term "1:N threads" is more popular, but to keep this explanation consistent I use the
"M:1" term here), which manage multiple Ruby threads on 1 native thread.
(Native threads are provided by C interfaces such as Pthreads. In many cases, native
threads are OS threads, but there are also user-level implementations, such as user-level
pthread libraries, in theory. Therefore, they are referred to as native threads in this
article, NT in short.)
If a Ruby thread RT1 blocks because of an I/O operation, the Ruby interpreter switches to the
next ready Ruby thread RT2. The I/O operation is monitored by `select()` (or
similar) functionality, and when the I/O is ready, RT1 is marked as a ready thread and
will be resumed soon. However, when a Ruby thread issues some other blocking operation
such as `gethostbyname()`, the Ruby interpreter can not switch to any other Ruby thread until
`gethostbyname()` finishes.
We name two types of blocking operations:
* Managed blocking operations
* I/O (most of read/write)
* managed by an I/O multiplexing API (select, poll, epoll, kqueue, IOCP, io_uring, ...)
* Sleeping
* Synchronization (Mutex, Queue, ...)
* Unmanaged operations
* All other blocking operations not listed above, written in C
* Huge numeric calculations like `Bignum#*`
* DNS lookup
* I/O (readiness can not be detected by a multiplexing API)
* open on FIFO, close on NFS, ...
* flock and other locking mechanisms
* library calls which use blocking operations
* `libfoo` has `foo_func()`, and `foo_func()` waits for a DNS lookup. A Ruby extension
`foo-ruby` can call `foo_func()`.
With these terms we can say that M:1 threads can support managed blocking operations, but
can not support unmanaged operations (other Ruby threads can not make progress) without
further tricks.
Note that even if a `select()`-like system call says an `fd` is ready, the I/O operation
on `fd` can still block because of contention (a read by another thread or process, for
example).
M:1 threads have another disadvantage: they can not run in parallel because only one native
thread is used.
From Ruby 1.9 we implemented 1:1 threads, which means each Ruby thread has a corresponding
native thread. To make the implementation easy, we also introduced a GVL: only the Ruby
thread that acquires the GVL can run. With the 1:1 model, we can support managed blocking
operations and unmanaged blocking operations by releasing the GVL. When a Ruby thread wants
to issue a blocking operation, it releases the GVL and another ready Ruby thread continues
to run. We don't care whether the blocking operation is managed or unmanaged.
(We can not make some unmanaged blocking operations interruptible (stopped by Ctrl-C, for
example).)
Advantages of 1:1 threads over M:1 threads:
* Easy to handle blocking operations by releasing GVL.
* We can utilize parallelism with multiple native threads by releasing GVL.
Disadvantages of 1:1 threads compared to M:1 threads:
* Overhead of making many native threads for many Ruby threads
* We can not make a huge number of Ruby threads and Ractors on 1:1 threads.
* Thread switching overhead from the GVL, because inter-core communication is needed.
From Ruby 3.0 we introduced the fiber scheduler mechanism to manage multiple fibers.
Differences from Ruby 1.8 M:1 threads are:
* No timeslice (fibers are only switched by managed blocking operations)
* Ruby users can write their own schedulers for their apps with their favorite underlying mechanism
The disadvantages are similar to those of M:1 threads. Another disadvantage is that we
need to consider Fiber's behavior.
From Ruby 3.0 we also introduced Ractors. Ractors can run in parallel because most
objects are separated between them. 1 Ractor creates 1 Ruby thread, so Ractors have the same
disadvantages as 1:1 threads. For example, we can not make a huge number of Ractors.
## Goal
Our goal is making lightweight Ractors on lightweight Ruby threads. To achieve this goal we
propose to implement M:N threads in MRI.
M:N threads manage M Ruby threads on N native threads, with a limited N (~= the number of
CPU cores, for example).
Advantages of M:N threads are:
1. We can run M Ractors on N native threads simultaneously if the machine has N cores.
2. We can make a huge number of Ruby threads or Ractors because we don't need a huge
number of native threads.
3. We can support unmanaged blocking operations by locking a native thread to the Ruby
thread which issues the unmanaged blocking operation.
4. We can use our own Ruby thread/Ractor scheduler instead of the native thread (OS)
scheduler.
Disadvantages of M:N threads are:
1. The implementation is complicated and can be hard.
2. It can introduce incompatibility, especially around TLS (thread-local storage).
3. We need to maintain our own scheduler.
Without multiple Ractors, this is similar to Ruby 1.8 M:1 threads. The difference from
M:1 threads is the NT-locking mechanism to support unmanaged blocking operations. Another
advantage is that it is easy to fall back to 1:1 threads by locking all Ruby threads to
their corresponding native threads.
## Proposed design
### User facing changes
If a program only has a main Ractor (i.e., most Ruby programs), the user will not face any
changes by default.
On the main Ractor, all threads are 1:1 threads by default, so there is no compatibility
issue.
If the `RUBY_MN_THREADS=1` environment variable is given, the main Ractor enables M:N threads.
Note that the main thread locks its NT by default because the initial NT is special in some
cases. I'm not sure we can relax this limitation.
With multiple Ractors, N (+ alpha) native threads run M Ractors. Currently there is no way
to disable M:N threads with multiple Ractors, because there are only a few multi-Ractor
programs and thus no compatibility issues.
The maximum value of N can be specified by `RUBY_MAX_PROC=N`. It is 8 by default, but this
value should be set according to the number of CPU processors (cores).
### TLS issue
With M:N threads, a Ruby thread (RT1) migrates from a native thread (NT1) to NT2, and so
on, so TLS in native code can be a problem.
For example, RT1 calls a library function `foo()`, which sets TLS1 on NT1. After RT1
migrates to NT2, RT1 calls `foo()` again, but there is no TLS1 record because TLS1 was
recorded only on NT1.
In this case, RT1 should run on NT1 while using the native library foo. To avoid such
problems, we need the following features:
* 1:1 threads on main Ractor by default
* functionality to lock the NT for RT, maybe `Thread#lock_native_thread` and
`Thread#unlock_native_thread` API is needed. For example, Go language has
`runtime.LockOSThread()` and `runtime.UnlockOSThread()` for this purpose.
* Or C-API only for this purpose? (not fixed yet)
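Under the proposed (not yet fixed) Ruby API, usage might look like the following hypothetical sketch; `lock_native_thread`, `unlock_native_thread`, and `foo_with_native_tls` do not exist today and are only illustrative:

```ruby
# Hypothetical usage of the proposed API (names not final), mirroring
# Go's runtime.LockOSThread / runtime.UnlockOSThread:
Thread.new do
  Thread.current.lock_native_thread    # pin this RT to its current NT
  begin
    foo_with_native_tls                # library that caches state in NT TLS
  ensure
    Thread.current.unlock_native_thread
  end
end
```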
Note that the same problem can occur with the fiber scheduler (and of course Ruby 1.8 M:1
threads), but I have not heard of it being much of a problem, so I expect that TLS will
not be much of an issue.
### Unmanaged blocking operations
Since Ruby 1.9 (1:1 threads), the `nogvl(func)` API has been used for most blocking
operations to keep the threading system healthy. In other words, `nogvl(func)` indicates
that the given function is a blocking operation. To support unmanaged blocking operations,
we lock a native thread to the Ruby thread which issues the blocking operation.
If the blocking operation doesn't finish soon, other Ruby threads can not run because
the RT locks the NT. In this case, another system monitoring thread, named the "Timer
thread" (a historical name; TT in short), creates another NT to run other ready Ruby
threads.
This TT behavior is the same as that of "sysmon" in the Go language.
We call a locked NT a dedicated native thread (DNT) and other NTs shared native threads
(SNT). The upper bound set by `RUBY_MAX_PROC` affects the number of SNTs. In other words,
the number of DNTs is not limited (just as the number of NTs in 1:1 threads is not
limited).
### Managed blocking operations
Managed blocking operations are multiplexed by `select()`-like functions on the Timer
thread. Currently only `epoll()` is supported.
I/O operation flow (read on fd1) for Ruby thread RT1:
1. Check the readiness of fd1 with `poll(timeout = 0)`; if ready, go to step 4.
2. Register fd1 with the Timer thread (TT) epoll and resume another ready Ruby thread.
3. When TT detects that fd1 is ready, it marks RT1 as a ready thread.
4. When RT1 is resumed, do the `read()` while locking the corresponding NT1.
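The four steps above can be restated as pseudocode from the scheduler's perspective (all helper names here are illustrative, not real internals):

```ruby
# Pseudocode for the read flow above; helper names are made up.
def read_on(rt1, fd1)
  unless ready_now?(fd1)              # step 1: poll(timeout = 0)
    timer_thread.register(fd1, rt1)   # step 2: hand fd1 to TT's epoll
    yield_to_next_ready_thread        # step 2: run another RT meanwhile
    # step 3: TT marks rt1 ready once fd1 becomes readable
  end
  with_locked_nt(rt1) { read(fd1) }   # step 4: perform the actual read
end
```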
`sleep(n)` operation flow for Ruby thread RT1:
1. Register the timeout of RT1 with the TT epoll.
2. When TT detects the timeout of RT1 (n seconds), TT marks RT1 as a ready Ruby thread.
### Internal design
* 2 level scheduling
* Ruby threads within a Ractor are managed as M:1 threads
* Ruby threads of different Ractors are managed as M:N threads
* Timer thread has several duties
1. Monitoring I/O (or other event) ready-ness
2. Monitoring timeout
3. Produce timeslice signals
4. Help OS signal delivering
(In pthread environments) recent Ruby doesn't create a timer thread, but the MaNy
implementation always creates a TT; this can be improved.
## Implementation
The code name is the MaNy project, derived from "M:N" threads.
https://github.com/ko1/ruby/tree/many2
The implementation is not yet mature (currently debugging).
## Measurements
See RubyKaigi 2023 slides:
https://atdot.net/~ko1/activities/2023_rubykaigi2023.pdf
## Discussion
* Enable/disable
* default behavior
* how to switch the behavior
* Should we always lock the NT for the main thread?
* Ruby/C API to lock the native threads
## Misc
This description will be improved later.
--
https://bugs.ruby-lang.org/