[ruby-core:122031] [Ruby Feature#21311] Namespace on read (revised)

13 May 2025

      Issue #21311 has been updated by jhawthorn (John Hawthorn).

Sorry, I didn't get a chance to review this before it was merged. I really don't think adding this level of indirection to RCLASS_EXT access and similar is a good idea.

Let's look at `rb_class_get_superclass` as an example. This should be approximately the same code inserted every time `RCLASS_SUPER` is used. Here's how it looks before namespaces were merged:

```
❯ llvm-objdump -d --symbolize-operands --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr --disassemble-symbols=rb_class_get_superclass -S ./miniruby

./miniruby:     file format elf64-x86-64

Disassembly of section .text:

<rb_class_get_superclass>:
;     return RCLASS(klass)->super;
                mov     rax, qword ptr [rdi + 0x10]
; }
                ret
```

This is how something this common in the VM should ideally look: just one instruction (a couple would be fine) to read the data we need and return it. No branching, no locking, easily inlineable, and easy to teach to the JIT compiler.

Here's how `rb_class_get_superclass` looks now:

```
❯ llvm-objdump -d --symbolize-operands --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr --disassemble-symbols=rb_class_get_superclass -S ./miniruby

./miniruby:     file format elf64-x86-64

Disassembly of section .text:

<rb_class_get_superclass>:
; {
                push    rbx
                sub     rsp, 0x10
                mov     rbx, qword ptr fs:[0x28]
                mov     qword ptr [rsp + 0x8], rbx
                mov     rbx, rdi
;     if (RCLASS_PRIME_CLASSEXT_READABLE_P(obj)) {
                cmp     qword ptr [rdi + 0x10], 0x0
                je       <L0>
;     ns = rb_current_namespace();
                call     <rb_current_namespace>
;     if (!ns || NAMESPACE_BUILTIN_P(ns)) {
                test    rax, rax
                je       <L0>
                cmp     byte ptr [rax + 0x78], 0x0
                je       <L1>
<L0>:
;     return RCLASS_EXT_PRIME(obj);
                lea     rax, [rbx + 0x18]
<L3>:
;     return RCLASS_SUPER(klass);
                mov     rax, qword ptr [rax + 0x8]
; }
                mov     rdx, qword ptr [rsp + 0x8]
                sub     rdx, qword ptr fs:[0x28]
                jne      <L2>
                add     rsp, 0x10
                pop     rbx
                ret
                nop     dword ptr [rax + rax]
<L1>:
;     st_table *classext_tbl = RCLASS_CLASSEXT_TBL(obj);
                mov     rdi, qword ptr [rbx + 0x10]
;     if (classext_tbl) {
                test    rdi, rdi
                je       <L0>
;         if (rb_st_lookup(classext_tbl, (st_data_t)ns->ns_object, &classext_ptr)) {
                mov     rsi, qword ptr [rax]
                mov     rdx, rsp
                call     <rb_st_lookup>
                test    eax, eax
                je       <L0>
;             return (rb_classext_t *)classext_ptr;
                mov     rax, qword ptr [rsp]
;     if (ext)
                test    rax, rax
                jne      <L3>
                jmp      <L0>
<L2>:
; }
                call     <__stack_chk_fail@plt>
```

It's tempting to assume that because we're not taking the branches with the more expensive operations, there isn't much cost. However, because we might be calling all of these other functions, the C compiler has to be pessimistic and generate code assuming the calls are taken each time. So not only is our one instruction now 14 instructions with two conditional branches, but in each of the hundreds of locations where this is inlined, the surrounding code the C compiler generates will also be worse.
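To make the control flow in the second listing easier to follow, here is a rough Ruby restatement of it. This is only a toy model: `ToyNamespace`, `ToyClass`, and all their methods are hypothetical names invented for illustration, and plain hashes stand in for `rb_classext_t` and the `st_table`. It shows the shape of the branches, not their cost.

```ruby
# Toy model only: ToyNamespace and ToyClass are hypothetical, not Ruby internals.
ToyNamespace = Struct.new(:name, :builtin)

class ToyClass
  attr_reader :lookups

  def initialize(superclass)
    @prime_ext = { super: superclass } # stands in for the inline (prime) classext
    @ext_tbl = nil                     # per-namespace table, allocated on first write
    @lookups = 0                       # counts how often the table lookup runs
  end

  # Before: a single field read with no branches.
  def super_direct
    @prime_ext[:super]
  end

  # After: the read goes through the namespace-dependent lookup.
  def super_for(ns)
    ext_for(ns)[:super]
  end

  # Gives one namespace its own copy of the class data.
  def set_ext(ns, superclass)
    (@ext_tbl ||= {})[ns.name] = { super: superclass }
  end

  private

  def ext_for(ns)
    return @prime_ext if @ext_tbl.nil?         # no table: the prime ext is readable
    return @prime_ext if ns.nil? || ns.builtin # no namespace, or a builtin one
    @lookups += 1
    @ext_tbl.fetch(ns.name, @prime_ext)        # hash lookup, falling back to prime
  end
end

root = ToyNamespace.new(:root, true)
user = ToyNamespace.new(:user, false)
klass = ToyClass.new(:Object)
klass.set_ext(user, :Patched)
```

Every caller of `super_for` pays for the two early-return checks even when the table lookup is never reached, which is the part the fast path cannot avoid.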

I don't share the optimism that this can be made efficient with more work. In fact I expect this to become even slower as bugs are fixed. For example even this huge disassembly is not correct because it [does not hold the VM lock where it should](https://github.com/ruby/ruby/pull/13226#discussion_r2085329412) and will almost certainly segfault under Ractors.

I really think we need an implementation where T_CLASS again refers to a unique implementation (rather than needing to look up the implementation for the current namespace), and RCLASS_EXT is once again just pointer arithmetic. Otherwise I don't see how this implementation can support an efficient JIT compiler (unless it disables itself when there are namespaces), an efficient implementation of Ractors (we can't write any lock-free access to class-related contents, because classes cannot be accessed without a lock and hash lookup), or even a fast interpreter (the 5% slowdown we already see here).

----------------------------------------
Feature #21311: Namespace on read (revised)
https://bugs.ruby-lang.org/issues/21311#change-113181

* Author: tagomoris (Satoshi Tagomori)
* Status: Assigned
* Assignee: tagomoris (Satoshi Tagomori)
* Target version: 3.5
----------------------------------------
This replaces #19744 

## Concept

This proposes a new feature to define virtual top-level namespaces in Ruby. Those namespaces can require/load libraries (either .rb or native extension) separately from other namespaces. Dependencies of required/loaded libraries are also required/loaded in the namespace.

This feature will be disabled by default at first, and will be enabled by an env variable `RUBY_NAMESPACE=1` as an experimental feature.
(It could be enabled by default in the future possibly.)

### "on read" approach

The "on write" approach defines namespaces on the loaded side. For example, Java packages are declared in the .java files themselves, and that declaration is required to separate namespaces from each other. It can be implemented very easily, but it requires all libraries to be updated with the package declaration. (In my opinion, that is almost impossible in the Ruby ecosystem.)

The "on read" approach is to create namespaces and then require/load applications and libraries in them. Programmers can control namespace separation at the "read" time. So, we can introduce the namespace separation incrementally.

## Motivation

"Namespace on read" can solve the first two problems below, and can open a path to solving the third:

* Avoiding name conflicts between libraries
  * Applications can require two different libraries safely which use the same module name.
* Avoiding unexpected globally shared modules/objects
  * Applications can make an independent/unshared module instance.
* Multiple versions of gems can be required
  * Application developers will have fewer version conflicts between gem dependencies if RubyGems/Bundler supports namespace on read. (Support from RubyGems/Bundler and/or other packaging systems will be needed.)

For the motivation details, see [Feature #19744].

## How we can use Namespace

```ruby
# app1.rb
PORT = 2048
class App
  def self.port = ::PORT
  def val = PORT.to_s
end

p App.port # 2048

# app2.rb
class Integer
  def double = self * 2
end

PORT = 2048.double
class App
  def self.port = ::PORT
  def val = PORT.double.to_s
end

p App.port # 4096

# main.rb - executed as `ruby main.rb`
ns1 = Namespace.new
ns1.require('./app1') # 2048
ns2 = Namespace.new
ns2.require('./app2') # 4096

PORT = 8080
class App
  def self.port = ::PORT
  def val = PORT.to_s
end

p App.port # 8080
p App.new.val # "8080"

p ns1::App.port # 2048
p ns1::App.new.val # "2048"

p ns2::App.port # 4096
p ns2::App.new.val # "8192"

1.double # NoMethodError
```

## Namespace specification

### Types of namespaces

There are two namespace types: "root" and "user". Exactly one root namespace exists in a Ruby process, while Ruby programmers can create as many user namespaces as they want.

### Root namespace

Root namespace is a unique namespace to be defined when a Ruby process starts. It only contains built-in classes/modules/constants, which are available without any `require` calls, including RubyGems itself (when `--disable-gems` is not specified).

Here, "built-in" classes/modules are the classes/modules accessible when the user's script evaluation starts, without any require/load calls.

### User namespace

A user namespace is a namespace that runs users' Ruby scripts. The "main" namespace is the one that runs the user's `.rb` script specified by the `ruby` command-line argument. Other user namespaces ("optional" namespaces) can be created by calling `Namespace.new`.

In a user namespace (both main and optional namespaces), built-in class/module definitions are copied from the root namespace, and other new classes/modules are defined in the namespace, separately from other (root/user) namespaces.
In the main namespace, newly defined classes/modules are top-level classes/modules, like `App`. In optional namespaces, they are defined under the namespace object (an instance of a subclass of Module), like `ns::App`.

In that namespace `ns`, `ns::App` is accessible as `App` (or `::App`). There is no way to access `App` in the main namespace from the code in the different namespace `ns`.

### Constants, class variables and global variables

Constants, class variables of built-in classes, and global variables are also separated by namespace. Values assigned to class/global variables in one namespace are invisible in other namespaces.

### Methods and procs

Methods defined in a namespace run in the namespace where they were defined, even when called from other namespaces.
Procs created in a namespace also run in the namespace where they were created.

### Dynamic link libraries

Dynamic link libraries (typically .so files) are loaded per namespace, just like .rb files.

### Open class (Changes on built-in classes)

In user namespaces, built-in class definitions can be modified. But those operations are processed as copy-on-write of class definition from the root namespace, and the changed definitions are visible only in the (user) namespace.

Definitions in the root namespace are not modifiable from other namespaces. Methods defined in the root namespace run only with root-namespace definitions.
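The copy-on-write behavior described above can be pictured with a small pure-Ruby model. This is a sketch under loud assumptions: `ToyBuiltin` and its methods are hypothetical names, namespaces are represented by plain symbols, and a hash of lambdas stands in for a class definition. The real feature copies class definitions inside the VM.

```ruby
# Toy model only: ToyBuiltin is hypothetical and does not use the Namespace API.
class ToyBuiltin
  def initialize(root_methods)
    @root = root_methods.freeze # root definitions are never modified
    @copies = {}                # namespace name => private copy of the method table
  end

  # The first write from a namespace copies the root table; later writes reuse the copy.
  def define(ns, name, body)
    (@copies[ns] ||= @root.dup)[name] = body
  end

  # Reads see the namespace's own copy if one exists, otherwise the root table.
  def lookup(ns, name)
    (@copies[ns] || @root)[name]
  end
end

integer = ToyBuiltin.new({ to_s: ->(n) { n.to_s } })
integer.define(:ns2, :double, ->(n) { n * 2 })

integer.lookup(:ns2, :double) # defined, but only inside ns2
integer.lookup(:ns1, :double) # nil: other namespaces never see the change
integer.lookup(:ns2, :to_s)   # root definitions remain visible in the copy
```

In this model a namespace that never modifies a built-in class keeps reading the shared root table, and only the modifying namespace pays for a copy.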

## Enabling Namespace

Set the `RUBY_NAMESPACE=1` environment variable when starting a Ruby process. `1` is the only valid value.

The namespace feature can be enabled only at process startup. Setting `RUBY_NAMESPACE=1` after a Ruby script has started has no effect.
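As a small illustration of the "only `1` is valid" rule, a script could check how it was launched with a helper like the one below. `namespace_requested?` is a hypothetical helper written for this example, not part of Ruby, and it only inspects the environment variable. It does not tell you whether the running interpreter supports namespaces.

```ruby
# Hypothetical helper, not part of Ruby: reports whether the variable was set
# to the single accepted value. The env parameter defaults to the real ENV.
def namespace_requested?(env = ENV)
  env["RUBY_NAMESPACE"] == "1"
end

namespace_requested?({ "RUBY_NAMESPACE" => "1" })    # the one accepted value
namespace_requested?({ "RUBY_NAMESPACE" => "true" }) # any other value is rejected
namespace_requested?({})                             # unset
```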

## Pull-request

https://github.com/ruby/ruby/pull/13226

-- 
https://bugs.ruby-lang.org/