[ruby-core:125192] [Ruby Feature#21982] Add `Decimal` as a core numeric class

4 Apr 2026

      Issue #21982 has been reported by shan (Shannon Skipper).

----------------------------------------
Feature #21982: Add `Decimal` as a core numeric class
https://bugs.ruby-lang.org/issues/21982

* Author: shan (Shannon Skipper)
* Status: Open
----------------------------------------
# Feature: Add `Decimal` as a core numeric class

## Abstract

Add `Decimal < Numeric` to Ruby core: exact base-10 arithmetic using a tagged immediate VALUE (like Fixnum) for small values, promoting to a 128-bit heap object for larger values.

## Background

Ruby apps that handle money, tax rates or measurements often use Integers with extra business logic since Float can't represent most base-10 fractions exactly:

```ruby
0.1 + 0.2 == 0.3     #=> false
0.1 + 0.2            #=> 0.30000000000000004

0.1d + 0.2d == 0.3d  #=> true
0.1d + 0.2d          #=> 0.3d
```

Alternatives have tradeoffs:

- **BigDecimal**: correct but about 8x slower than Float on the compound-interest benchmark below (with YJIT).
- **Rational**: correct but about as slow as BigDecimal, and `Rational("19.99").to_s` gives `"1999/100"`.
- **Integer cents**: correct and fast but pushes formatting and decimal-point tracking into application code.
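
The Rational and integer-cents tradeoffs can be seen in stock Ruby today. The following is a minimal sketch; `format_cents` is a hypothetical helper written only for illustration and is not part of any library:

```ruby
# Rational is exact but prints as a fraction.
price = Rational("19.99")
price.to_s          #=> "1999/100"

# Integer cents are exact and fast, but the application has to
# remember where the decimal point goes.
# `format_cents` is a hypothetical helper, not part of any library.
def format_cents(cents)
  sign = cents.negative? ? "-" : ""
  dollars, rest = cents.abs.divmod(100)
  format("%s%d.%02d", sign, dollars, rest)
end

format_cents(1999)  #=> "19.99"
format_cents(-5)    #=> "-0.05"
```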

## Proposal

```ruby
# Literal syntax
price = 19.99d
tax_rate = 0.0875d
total = (price * (1d + tax_rate)).round(2)   #=> 21.74d

# Kernel converter (like Integer(), Float())
Decimal("29.99")    #=> 29.99d
Decimal(42)         #=> 42.0d

# Value semantics: frozen, Ractor-shareable
19.99d.frozen?      #=> true

# Full numeric protocol
19.99d + 1          #=> 20.99d
19.99d <=> 20.0d    #=> -1
19.99d.round(1)     #=> 20.0d

# Human-focused string interpolation
"$#{19.99d}"                         #=> "$19.99"
"$#{BigDecimal("19.99").to_s("F")}"  #=> "$19.99"
```

### Features

- **`d` literal suffix**: `42d`, `3.14d`, `0.1d` (matching `r` for Rational, `i` for Complex)
- **Frozen and Ractor-shareable**: value semantics like Rational
- **18 decimal places**: fixed precision, full signed 128-bit range
- **`Kernel#Decimal()` converter**: with `exception: false` support
- **Full numeric protocol**: arithmetic, comparison, coercion, rounding, `to_i`/`to_f`/`to_r`/`to_s`, pattern matching

## Performance

Apple M4 with YJIT. All values pre-allocated outside the measurement loop.

### Compound interest: 360 monthly iterations

`balance = (balance * (1 + rate)).round(2)` repeated 360 times. A tight loop of multiply, add, round. Both Decimal and Float produce $60,225.61.

| Type | YJIT | No JIT |
|------|------|--------|
| Decimal (BID) | 93K i/s | 55K i/s |
| Float | 83K i/s | 60K i/s |
| Rational | 10.7K i/s | 9.4K i/s |
| BigDecimal | 10.1K i/s | 9.3K i/s |

With YJIT, Decimal is 1.12x faster than Float on compound interest. Without YJIT, Float is 1.1x faster. YJIT helps Decimal more (1.7x speedup vs Float's 1.4x) because BOPs (the VM's basic operations) and unchecked entry points skip the per-call type checks. Rational and BigDecimal are ~9x slower than Decimal with YJIT and ~6x slower without.
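
For readers who want to see the shape of the benchmarked loop, here is a minimal sketch using Rational, which runs on current Ruby. The principal and monthly rate are assumed for illustration and are not taken from the benchmark; with the proposed type, the same loop would operate on Decimal values instead.

```ruby
# Assumed inputs (not from the benchmark): 10,000 principal, 0.5% monthly rate.
def compound(balance, rate, months)
  months.times do
    # multiply, add, round: the three operations the benchmark exercises
    balance = (balance * (1 + rate)).round(2)
  end
  balance
end

final = compound(Rational(10_000), Rational(5, 1000), 360)
final.class                    #=> Rational
(final * 100).denominator      #=> 1 (the balance is always a whole number of cents)
```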

### Per-operation (benchmark-driver, YJIT)

| Operation | Decimal (BID) | Float | Ratio |
|-----------|---------------|-------|-------|
| add | 147M i/s | 160M i/s | 1.09x slower |
| mul | 159M i/s | 158M i/s | ~parity |
| round(2) | 118M i/s | 78M i/s | 1.5x faster |
| div (inexact) | 49M i/s | 140M i/s | 2.9x slower |
| parse | 34M i/s | 34M i/s | parity |
| to_s | 32M i/s | 9M i/s | 3.4x faster |
| sum(1000) | 1.27M i/s | 794K i/s | 1.6x faster |

Add and mul are near Float parity. Round is 1.5x faster, to_s 3.4x faster. Division is 2.9x slower (inexact results need wide arithmetic).

## Design

Two-tier storage, mirroring Fixnum/Bignum:

```
Significand <= 2^51 - 1: 8 bytes, no allocation

  63    62                    12 11  8 7     0
  +---+------------------------+-----+-------+
  | 0 |         1999           |  2  | 0x84  |
  +---+------------------------+-----+-------+
  sign    significand (51 bits)  scale   tag

  64 bits encode sign, significand, decimal position and type tag.
  The value IS the VALUE, like Fixnum. All 15-digit significands fit.
  Some 16-digit significands fit (up to 2,251,799,813,685,247).

Significand > 2^51 - 1: heap allocated

  +--------+     +----------------+----------------+--------------------------------+
  |  ptr   | --> | flags + klass                   | value * 10**18                 |
  +--------+     |         16 bytes                |         16 bytes               |
   VALUE         +----------------+----------------+--------------------------------+
   8 bytes            object header                 full i128 range, 18 decimal places

Standard Ruby object header with embedded i128 payload.
```

`Decimal("12.34")` is an immediate. No object, no allocation, no GC.
`Decimal("9_999_999_999_999_999.99")` promotes to heap (significand exceeds 51 bits).
`Decimal("123_456_789_012_345_678_901_234_567_890_123_456.78")` raises RangeError (exceeds 128-bit range).
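
To make the immediate layout concrete, the following is a minimal sketch in plain Ruby that packs and unpacks the fields shown in the diagram. The bit positions, the 4-bit scale field and the `0x84` tag are read off the diagram; the module and method names are hypothetical, and the prototype itself does this in C.

```ruby
# Hypothetical model of the 64-bit immediate layout from the diagram:
# bit 63 sign | bits 62..12 significand | bits 11..8 scale | bits 7..0 tag
module ImmediateSketch
  TAG     = 0x84            # tag byte as drawn in the diagram
  MAX_SIG = (1 << 51) - 1   # 2,251,799,813,685,247

  module_function

  # True when the significand fits the 51-bit field.
  def fits?(significand)
    significand.abs <= MAX_SIG
  end

  def pack(significand, scale)
    raise RangeError, "significand needs the heap tier" unless fits?(significand)
    raise RangeError, "scale must fit in 4 bits" unless (0..15).cover?(scale)
    sign = significand.negative? ? 1 : 0
    (sign << 63) | (significand.abs << 12) | (scale << 8) | TAG
  end

  def unpack(word)
    sig = (word >> 12) & MAX_SIG
    sig = -sig if word[63] == 1
    [sig, (word >> 8) & 0xF]
  end
end

word = ImmediateSketch.pack(1999, 2)             # models 19.99
ImmediateSketch.unpack(word)                     #=> [1999, 2]
ImmediateSketch.fits?(999_999_999_999_999)       #=> true  (15 digits)
ImmediateSketch.fits?(999_999_999_999_999_999)   #=> false (18 digits, heap tier)
```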

### Optimization layers

The prototype implements optimization layers analogous to those for Float and Integer:

- 13 BOPs with `DECIMAL_REDEFINED_OP_FLAG`
- Interpreter fast paths in `vm_opt_plus/minus/mult/div/mod`, `vm_opt_lt/le/gt/ge`, `opt_equality_specialized`
- YJIT `Type::Decimal` with inline BID add/sub and BOP guard paths
- ZJIT `types::Decimal` with profiler support and method annotations
- Unchecked `_dd` entry points for YJIT and interpreter
- Reciprocal lookup tables for division-free scale reduction

### Heap arithmetic

Heap multiply and divide use 256-bit widening (same algorithm as Roc). Optional fast paths exploit the fact that SCALE (10^18) fits in u64: schoolbook two-division `wide_div`, single-operand `wide_mul_64` and Barrett reduction for the u128 case. These improve heap multiply by ~25% and heap division by ~50%. All are removable without affecting correctness or BID performance.
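
The widening multiply and divide can be modeled in plain Ruby, where Integer is arbitrary precision and so stands in for the 256-bit intermediate. This is a sketch of the general fixed-point technique under the stated 10^18 scale, not the prototype's C code; the module and method names are hypothetical, and the optional fast paths are omitted.

```ruby
module FixedPointSketch
  SCALE    = 10**18
  I128_MAX = (1 << 127) - 1
  I128_MIN = -(1 << 127)

  module_function

  # Divide truncating toward zero (Ruby's Integer#/ floors instead).
  def trunc_div(n, d)
    q = n.abs / d.abs
    (n.negative? ^ d.negative?) ? -q : q
  end

  # a and b are scaled integers (value * 10**18).
  def fixed_mul(a, b)
    wide = a * b                       # may need up to 256 bits
    check_range(trunc_div(wide, SCALE))
  end

  def fixed_div(a, b)
    raise ZeroDivisionError, "divided by 0" if b.zero?
    check_range(trunc_div(a * SCALE, b)) # widen the dividend before dividing
  end

  def check_range(v)
    raise RangeError, "exceeds 128-bit range" unless (I128_MIN..I128_MAX).cover?(v)
    v
  end
end

one = FixedPointSketch::SCALE
a = 1_500_000_000_000_000_000          # models 1.5
b = 2_250_000_000_000_000_000          # models 2.25
FixedPointSketch.fixed_mul(a, b)       #=> 3375000000000000000 (3.375)
FixedPointSketch.fixed_div(one, 3 * one)   #=> 333333333333333333 (truncated)
FixedPointSketch.fixed_div(-one, 3 * one)  #=> -333333333333333333 (toward zero)
```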

## Type coercion

When Decimal interacts with other numeric types:

```ruby
1.5d + 1      #=> 2.5d   (Integer promotes to Decimal)
1.5d + 0.5    #=> 2.0    (Decimal demotes to Float)
1.5d + 1/4r   #=> 1.75d  (Rational promotes to Decimal)
1.5d + 1/3r   # ArgumentError (1/3 exceeds 18 decimal places)
1.5d == 1.5   #=> true   (compared via Rational)
1.5d == 3/2r  #=> true   (Rational comparison via <=>)
```

Decimal + Integer returns Decimal (lossless). Decimal + Float returns Float (the caller chose approximate arithmetic). Decimal + Rational returns Decimal when the Rational is exactly representable in 18 decimal places and raises ArgumentError otherwise. Conversions are exact; only arithmetic results (`*`, `/`) truncate.
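
The Rational rule amounts to checking whether the value has a terminating expansion within 18 places. The following is a minimal sketch in current Ruby with a hypothetical helper name; it mirrors the rule as described here, not the prototype's implementation:

```ruby
# Returns the scaled integer (value * 10**places) when `r` is exactly
# representable in `places` decimal places, otherwise raises ArgumentError.
# `rational_to_scaled` is a hypothetical name used only for this sketch.
def rational_to_scaled(r, places: 18)
  scaled = r * 10**places
  unless scaled.denominator == 1
    raise ArgumentError, "#{r} is not representable in #{places} decimal places"
  end
  scaled.numerator
end

rational_to_scaled(1/4r)   #=> 250000000000000000 (0.25)

begin
  rational_to_scaled(1/3r)
rescue ArgumentError => e
  e.message                #=> "1/3 is not representable in 18 decimal places"
end
```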

## Relationship to BigDecimal

Decimal and BigDecimal serve different needs:

- **Decimal**: fixed precision (18 places), core type, immediate encoding, JIT-optimized. For the common case: prices, percentages, measurements.
- **BigDecimal**: arbitrary precision, bundled gem, heap-allocated. For when you need more than 18 decimal places or unbounded digit counts.

They can coexist. The Decimal conversion method is `to_dec` to avoid a conflict with `bigdecimal/util`, which defines `to_d`. If `to_d` can be shared, or if BigDecimal's `to_d` is deprecated, `to_d` would be the more natural name.

## Why two Decimal tiers

Intel's BID64 gives a 64-bit immediate with 16 digits. Roc's Dec gives a 128-bit fixed-point value with 39 digits. Both are proven designs.

This proposal combines them, the same way Integer combines Fixnum and Bignum. Small values are immediates and large ones promote to the heap, transparently to the programmer.

Two simpler alternatives are also viable:

- **BID-only**: 15-16 digits (51-bit significand), zero allocation. Operations exceeding the BID range would raise. Half the code.
- **i128-only**: 39 digits, one allocation per decimal. No dual paths. Simpler but slower with GC churn.

## Design details

- **Fixed 18 decimal places**: 10^18 fits in a 64-bit integer, keeping the SCALE factor cheap for multiplication and division. 18 places cover all ISO 4217 currency subdivisions.
- **Truncation toward zero for `*` and `/`**: consistent with C integer division. Floored division for `%`, `div`, `divmod` (matching Ruby's Integer).
- **Exact input conversion**: `Decimal("1e-19")` raises `ArgumentError` because the value cannot be represented in 18 decimal places. `Decimal("1e-19", exception: false)` returns `nil`. Trailing zeros beyond 18 places are accepted: `Decimal("1.10000000000000000000")` is `1.1d`. Arithmetic truncation is separate and expected.
- **Float conversion via `Float#to_s` then parse**: `Decimal(0.1)` gives `0.1d`, not `0.1000000000000000055...d`.
- **`0d` is a Decimal literal**: `0d` produces `Decimal(0)`. `0d42` remains `Integer(42)` (the existing decimal-integer prefix). `0D42` also remains `Integer(42)` (only lowercase `d` produces Decimal).
- **Frozen and Ractor-shareable**: like Rational. No mutable state.
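
Several of these rules can be checked against current Ruby or modeled with plain Integers. The sketch below is illustrative only: `parse_scaled` is a hypothetical helper that mimics the described exact-conversion rule on a simplified input format (plain digits, no exponents or underscores), and it is not the prototype's parser.

```ruby
# Existing behavior that the proposal builds on:
0d42                      #=> 42 (decimal-integer prefix, unchanged)
0.1.to_s                  #=> "0.1" (shortest round-trip form)
-7 / 2                    #=> -4 (Integer#/ floors, like the proposed `div`)
(-7).fdiv(2).truncate     #=> -3 (truncation toward zero, like the proposed `/`)

# Hypothetical model of exact input conversion to value * 10**18.
def parse_scaled(str, exception: true)
  m = /\A(-)?(\d+)(?:\.(\d+))?\z/.match(str)
  unless m
    return nil unless exception
    raise ArgumentError, "invalid value: #{str.inspect}"
  end
  frac = (m[3] || "").sub(/0+\z/, "")   # trailing zeros carry no information
  if frac.length > 18
    return nil unless exception
    raise ArgumentError, "more than 18 decimal places: #{str.inspect}"
  end
  scaled = (m[2] + frac.ljust(18, "0")).to_i
  m[1] ? -scaled : scaled
end

tiny = "0." + "0" * 18 + "1"                 # 1e-19 written out in full
parse_scaled("1.10000000000000000000")       #=> 1100000000000000000 (1.1)
parse_scaled(tiny, exception: false)         #=> nil
```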

## Portability

The prototype requires `__int128` (GCC and Clang). For the heap variant, MSVC would need a two-word i128 emulation or a pure-C fallback using `int64_t hi, lo` fields along with appropriate operations. The BID immediate tier (64-bit only) works everywhere.
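
The two-word fallback can be pictured as splitting each signed 128-bit value into a signed high word and an unsigned low word. The following is a minimal sketch in Ruby with hypothetical helper names; it models only the representation, not the arithmetic a C fallback would need:

```ruby
LOW_MASK = 0xFFFF_FFFF_FFFF_FFFF

# Split a signed 128-bit integer into [hi (signed 64-bit), lo (unsigned 64-bit)].
def split_i128(value)
  [value >> 64, value & LOW_MASK]
end

# Rebuild the signed 128-bit integer from its two words.
def join_i128(hi, lo)
  (hi << 64) | lo
end

hi, lo = split_i128(-1)
hi                   #=> -1
lo == LOW_MASK       #=> true
join_i128(hi, lo)    #=> -1

# 19.99 scaled by 10**18 no longer fits in 64 bits, so it spills into hi.
scaled = 19_990_000_000_000_000_000
split_i128(scaled).first                 #=> 1
join_i128(*split_i128(scaled)) == scaled #=> true
```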

## Scope

Implementation (`decimal.c`, `decimal.rb`), VM fast paths, YJIT and ZJIT type tracking and codegen, serialization, Kernel converters and `prism_compile.c`.

The `d` literal suffix requires a small Prism upstream change (~60 lines in `prism.c` plus regenerated sources). Psych would need a separate patch for YAML serialization. Both would be submitted as upstream PRs if this proposal is accepted.

## Gem

A gem version provides the same semantics as a C extension with pure Ruby fallback. It gets 14.2K i/s on compound interest with YJIT, versus core Decimal's 93K (6.5x slower). A gem cannot add VALUE tag bytes, register BOPs or teach YJIT new types, so it must heap-allocate every result and go through full method dispatch.

## Related work

| Language / library | Type | Encoding | Precision | Normalized |
|----------|------|----------|-----------|------------|
| Intel libbid | BID64 | 1+13+50 combination field | 16 digits | no (cohorts) |
| Intel libbid | BID128 | 1+17+110 floating-point | 34 digits | no (cohorts) |
| Roc | Dec | i128 fixed-point (* 10^18) | 39 digits | n/a (fixed scale) |
| C# | System.Decimal | 96-bit sig + 5-bit scale | 28-29 digits | no |
| Ruby | Decimal (this) | 51-bit immediate + i128 heap | 15-16 digits (BID), full i128 (heap) | yes (canonical) |

**Immediate tier vs Intel BID64**: Intel's combination-field encoding gets 16 digits from 64 bits (vs our 15-16) by implicitly encoding the leading significand digit. The cost is decoder complexity and unnormalized cohorts. `1.0` and `1.00` have different bit patterns, requiring rescaling for equality. Our BID encoding strips trailing zeros for a canonical form: equal immediates are always identical bit patterns, so equality is a single-word comparison. Heap decimals use i128 value comparison.
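
The canonical form can be illustrated with a small sketch that strips trailing zeros from a (significand, scale) pair. The helper name is hypothetical, and the code models only the normalization idea described above:

```ruby
# Reduce (significand, scale) so that equal values share one representation.
# `canonicalize` is a hypothetical name used only for this sketch.
def canonicalize(significand, scale)
  while scale > 0 && (significand % 10).zero?
    significand /= 10
    scale -= 1
  end
  [significand, scale]
end

canonicalize(100, 2)   #=> [1, 0]   (1.00)
canonicalize(10, 1)    #=> [1, 0]   (1.0)
canonicalize(1990, 3)  #=> [199, 2] (1.990 becomes 1.99)
```

Once both operands are in this form, comparing the pairs (or the packed words) is enough to decide equality, with no rescaling step.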

**Heap tier vs Roc Dec**: nearly identical design. Same i128 scaled by 10^18, same 256-bit widening for multiply and divide. Our additions: reciprocal lookup tables for division-free scale reduction, Barrett reduction for the SCALE division and promotion to the immediate tier when results fit.

**Both tiers vs C# System.Decimal**: C# uses a 96-bit significand with variable scale 0-28 in a single 128-bit value type. More precision (28-29 digits) than our immediate but no fast path. All arithmetic operates on three 32-bit words. Not normalized. Ruby doesn't have value types, so C#'s stack-allocation advantage doesn't apply. The tagged immediate achieves the same effect.

Like Integer, the two tiers are invisible in Rubyland. A Decimal is a Decimal.

https://github.com/ruby/ruby/pull/16659
