[ruby-core:125192] [Ruby Feature#21982] Add `Decimal` as a core numeric class
Issue #21982 has been reported by shan (Shannon Skipper).

----------------------------------------
Feature #21982: Add `Decimal` as a core numeric class
https://bugs.ruby-lang.org/issues/21982

* Author: shan (Shannon Skipper)
* Status: Open
----------------------------------------
# Feature: Add `Decimal` as a core numeric class

## Abstract

Add `Decimal < Numeric` to Ruby core: exact base-10 arithmetic using a tagged immediate VALUE (like Fixnum) for small values, promoting to a 128-bit heap object for larger values.

## Background

Ruby apps that handle money, tax rates or measurements often use Integers with extra business logic since Float can't represent most base-10 fractions exactly:

```ruby
0.1 + 0.2 == 0.3     #=> false
0.1 + 0.2            #=> 0.30000000000000004

0.1d + 0.2d == 0.3d  #=> true
0.1d + 0.2d          #=> 0.3d
```

Alternatives have tradeoffs:

- **BigDecimal**: correct but 8x slower than Float on compound interest.
- **Rational**: correct but equally slow, and `Rational("19.99").to_s` gives `"1999/100"`.
- **Integer cents**: correct and fast but pushes formatting and decimal-point tracking into application code.

## Proposal

```ruby
# Literal syntax
price = 19.99d
tax_rate = 0.0875d
total = (price * (1d + tax_rate)).round(2)  #=> 21.74d

# Kernel converter (like Integer(), Float())
Decimal("29.99")  #=> 29.99d
Decimal(42)       #=> 42.0d

# Value semantics: frozen, Ractor-shareable
19.99d.frozen?  #=> true

# Full numeric protocol
19.99d + 1        #=> 20.99d
19.99d <=> 20.0d  #=> -1
19.99d.round(1)   #=> 20.0d

# Human-focused string interpolation
"$#{19.99d}"                         #=> "$19.99"
"$#{BigDecimal("19.99").to_s("F")}"  #=> "$19.99"
```

### Features

- **`d` literal suffix**: `42d`, `3.14d`, `0.1d` (matching `r` for Rational, `i` for Complex)
- **Frozen and Ractor-shareable**: value semantics like Rational
- **18 decimal places**: fixed precision, full signed 128-bit range
- **`Kernel#Decimal()` converter**: with `exception: false` support
- **Full numeric protocol**: arithmetic, comparison, coercion, rounding, `to_i`/`to_f`/`to_r`/`to_s`, pattern matching

## Performance

Apple M4 with YJIT. All values pre-allocated outside the measurement loop.

### Compound interest: 360 monthly iterations

`balance = (balance * (1 + rate)).round(2)` repeated 360 times. A tight loop of multiply, add, round. Both Decimal and Float produce $60,225.61.

| Type | YJIT | No JIT |
|------|------|--------|
| Decimal (BID) | 93K i/s | 55K i/s |
| Float | 83K i/s | 60K i/s |
| Rational | 10.7K i/s | 9.4K i/s |
| BigDecimal | 10.1K i/s | 9.3K i/s |

With YJIT, Decimal is 1.12x faster than Float on compound interest. Without YJIT, Float is 1.1x faster. YJIT helps Decimal more (1.7x speedup vs Float's 1.4x) because BOPs and unchecked entry points skip the per-call type checks. Rational and BigDecimal are about 9x slower than Decimal with YJIT and about 6x slower without it.

### Per-operation (benchmark-driver, YJIT)

| Operation | Decimal (BID) | Float | Ratio |
|-----------|---------------|-------|-------|
| add | 147M i/s | 160M i/s | 1.09x slower |
| mul | 159M i/s | 158M i/s | ~parity |
| round(2) | 118M i/s | 78M i/s | 1.5x faster |
| div (inexact) | 49M i/s | 140M i/s | 2.9x slower |
| parse | 34M i/s | 34M i/s | parity |
| to_s | 32M i/s | 9M i/s | 3.4x faster |
| sum(1000) | 1.27M i/s | 794K i/s | 1.6x faster |

Add and mul are near Float parity. Round is 1.5x faster, to_s 3.4x faster. Division is 2.9x slower (inexact results need wide arithmetic).

## Design

Two-tier storage, mirroring Fixnum/Bignum:

```
Significand <= 2^51 - 1: 8 bytes, no allocation

 63  62                    12 11   8 7     0
+---+------------------------+-----+-------+
| 0 |          1999          |  2  | 0x84  |
+---+------------------------+-----+-------+
sign  significand (51 bits)   scale   tag

64 bits encode sign, significand, decimal position and type tag.
The value IS the VALUE, like Fixnum. All 15-digit significands fit.
Some 16-digit significands fit (up to 2,251,799,813,685,247).

Significand > 2^51 - 1: heap allocated

+--------+     +----------------+----------------+--------------------------------+
|  ptr   | --> |          flags + klass          |         value * 10**18         |
+--------+     |            16 bytes             |            16 bytes            |
  VALUE        +----------------+----------------+--------------------------------+
 8 bytes                  object header            full i128 range, 18 decimal places

Standard Ruby object header with embedded i128 payload.
```

`Decimal("12.34")` is an immediate. No object, no allocation, no GC.
`Decimal("9_999_999_999_999_999.99")` promotes to heap (significand exceeds 51 bits).
`Decimal("123_456_789_012_345_678_901_234_567_890_123_456.78")` raises RangeError (exceeds 128-bit range).

### Optimization layers

The prototype implements layers analogous to those for Float and Integer:

- 13 BOPs with `DECIMAL_REDEFINED_OP_FLAG`
- Interpreter fast paths in `vm_opt_plus/minus/mult/div/mod`, `vm_opt_lt/le/gt/ge`, `opt_equality_specialized`
- YJIT `Type::Decimal` with inline BID add/sub and BOP guard paths
- ZJIT `types::Decimal` with profiler support and method annotations
- Unchecked `_dd` entry points for YJIT and interpreter
- Reciprocal lookup tables for division-free scale reduction

### Heap arithmetic

Heap multiply and divide use 256-bit widening (same algorithm as Roc). Optional fast paths exploit the fact that SCALE (10^18) fits in u64: schoolbook two-division `wide_div`, single-operand `wide_mul_64` and Barrett reduction for the u128 case. These improve heap multiply by ~25% and heap division by ~50%. All are removable without affecting correctness or BID performance.

## Type coercion

When Decimal interacts with other numeric types:

```ruby
1.5d + 1      #=> 2.5d   (Integer promotes to Decimal)
1.5d + 0.5    #=> 2.0    (Decimal demotes to Float)
1.5d + 1/4r   #=> 1.75d  (Rational promotes to Decimal)
1.5d + 1/3r   # ArgumentError (1/3 exceeds 18 decimal places)
1.5d == 1.5   #=> true   (compared via Rational)
1.5d == 3/2r  #=> true   (Rational comparison via <=>)
```

Decimal + Integer returns Decimal (lossless). Decimal + Float returns Float (caller chose approximate arithmetic). Decimal + Rational returns Decimal when the Rational is exactly representable in 18 decimal places, and raises ArgumentError otherwise. Conversion is exact. Only arithmetic results (`*`, `/`) truncate.

## Relationship to BigDecimal

Decimal and BigDecimal serve different needs:

- **Decimal**: fixed precision (18 places), core type, immediate encoding, JIT-optimized. For the common case: prices, percentages, measurements.
- **BigDecimal**: arbitrary precision, bundled gem, heap-allocated. For when you need more than 18 decimal places or unbounded digit counts.

They can coexist. The Decimal conversion method is `to_dec` to avoid conflict with `bigdecimal/util`, which defines `to_d`. If `to_d` can be shared, or BigDecimal's `to_d` is deprecated, `to_d` would be the more natural name.

## Why two Decimal tiers

Intel's BID64 gives a 64-bit immediate with 16 digits. Roc's Dec gives a 128-bit fixed-point value with 39 digits. Both are proven designs. Ruby's approach combines them, the same way Integer combines Fixnum and Bignum. Small values are immediates, large ones promote to heap. Transparent to the programmer.

Two simpler alternatives are also viable:

- **BID-only**: 15-16 digits (51-bit significand), zero allocation. Operations exceeding the BID range would raise. Half the code.
- **i128-only**: 39 digits, one allocation per decimal. No dual paths. Simpler but slower with GC churn.

## Design details

- **Fixed 18 decimal places**: 10^18 fits in a 64-bit integer, keeping the SCALE factor cheap for multiplication and division. 18 places cover all ISO 4217 currency subdivisions.
- **Truncation toward zero for `*` and `/`**: consistent with C integer division. Floored division for `%`, `div`, `divmod` (matching Ruby's Integer).
- **Exact input conversion**: `Decimal("1e-19")` raises `ArgumentError` because the value cannot be represented in 18 decimal places. `Decimal("1e-19", exception: false)` returns `nil`. Trailing zeros beyond 18 places are accepted: `Decimal("1.10000000000000000000")` is `1.1d`. Arithmetic truncation is separate and expected.
- **Float conversion via `Float#to_s` then parse**: `Decimal(0.1)` gives `0.1d`, not `0.1000000000000000055...d`.
- **`0d` is a Decimal literal**: `0d` produces `Decimal(0)`. `0d42` remains `Integer(42)` (the existing decimal-integer prefix). `0D42` also remains `Integer(42)` (only lowercase `d` produces Decimal).
- **Frozen and Ractor-shareable**: like Rational. No mutable state.

## Portability

The prototype requires `__int128` (GCC and Clang). For the heap variant, MSVC would need a two-word i128 emulation or a pure-C fallback using `int64_t hi, lo` fields along with appropriate operations. The BID immediate tier (64-bit only) works everywhere.

## Scope

Implementation (`decimal.c`, `decimal.rb`), VM fast paths, YJIT and ZJIT type tracking and codegen, serialization, Kernel converters and `prism_compile.c`.

The `d` literal suffix requires a small Prism upstream change (~60 lines in `prism.c` plus regenerated sources). Psych would need a separate patch for YAML serialization. Both would be submitted as upstream PRs if this proposal is accepted.

## Gem

A gem version provides the same semantics as a C extension with pure Ruby fallback. It gets 14.2K i/s on compound interest with YJIT, versus core Decimal's 93K (6.5x slower). A gem cannot add VALUE tag bytes, register BOPs or teach YJIT new types, so it must heap-allocate every result and go through full method dispatch.

## Related work

| Language | Type | Encoding | Precision | Normalized |
|----------|------|----------|-----------|------------|
| Intel libbid | BID64 | 1+13+50 combination field | 16 digits | no (cohorts) |
| Intel libbid | BID128 | 1+17+110 floating-point | 34 digits | no (cohorts) |
| Roc | Dec | i128 fixed-point (* 10^18) | 39 digits | n/a (fixed scale) |
| C# | System.Decimal | 96-bit sig + 5-bit scale | 28-29 digits | no |
| Ruby | Decimal (this) | 51-bit immediate + i128 heap | 15-16 digits (BID), full i128 (heap) | yes (canonical) |

**Immediate tier vs Intel BID64**: Intel's combination-field encoding gets 16 digits from 64 bits (vs our 15-16) by implicitly encoding the leading significand digit. The cost is decoder complexity and unnormalized cohorts. `1.0` and `1.00` have different bit patterns, requiring rescaling for equality. Our BID encoding strips trailing zeros for a canonical form: equal immediates are always identical bit patterns, so equality is a single-word comparison. Heap decimals use i128 value comparison.

**Heap tier vs Roc Dec**: nearly identical design. Same i128 scaled by 10^18, same 256-bit widening for multiply and divide. Our additions: reciprocal lookup tables for division-free scale reduction, Barrett reduction for the SCALE division and promotion to the immediate tier when results fit.

**Both tiers vs C# System.Decimal**: C# uses a 96-bit significand with variable scale 0-28 in a single 128-bit value type. More precision (28-29 digits) than our immediate but no fast path. All arithmetic operates on three 32-bit words. Not normalized. Ruby doesn't have value types, so C#'s stack-allocation advantage doesn't apply. The tagged immediate achieves the same effect.

Like Integer, the two tiers are invisible in Rubyland. A Decimal is a Decimal.
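
For readers who want to poke at the immediate layout from the Design section, here is a minimal sketch in plain Ruby that packs and unpacks a 64-bit word using ordinary Integers. The field positions (sign at bit 63, 51-bit significand at bits 12-62, 4-bit scale at bits 8-11, tag `0x84` in the low byte) are read off the diagram above. The module and method names are hypothetical and are not taken from the prototype, and returning `nil` stands in for "would be promoted to the heap tier".

```ruby
# Illustrative only: field positions follow the diagram in the proposal.
# ImmediateDecimalSketch, pack and unpack are hypothetical names.
module ImmediateDecimalSketch
  TAG        = 0x84                   # low 8 bits: type tag from the diagram
  SCALE_BITS = 4                      # bits 8..11
  SIG_BITS   = 51                     # bits 12..62
  SIG_MAX    = (1 << SIG_BITS) - 1    # 2_251_799_813_685_247
  SCALE_MAX  = (1 << SCALE_BITS) - 1

  module_function

  # Pack a signed significand and a decimal scale into one 64-bit word.
  # Returns nil when the value does not fit the immediate fields.
  def pack(significand, scale)
    negative = significand.negative?
    sig = significand.abs
    # Canonical form: strip trailing zeros so equal values share one bit pattern.
    while scale > 0 && sig % 10 == 0
      sig /= 10
      scale -= 1
    end
    return nil if sig > SIG_MAX || scale > SCALE_MAX || scale < 0
    ((negative ? 1 : 0) << 63) | (sig << 12) | (scale << 8) | TAG
  end

  # Recover [signed significand, scale] from a packed word.
  def unpack(word)
    sig   = (word >> 12) & SIG_MAX
    scale = (word >> 8) & SCALE_MAX
    [word[63] == 1 ? -sig : sig, scale]
  end
end

word = ImmediateDecimalSketch.pack(1999, 2)   # 19.99
printf("0x%X\n", word)                        # prints 0x7CF284
```

Because trailing zeros are stripped before packing, `pack(19990, 3)` and `pack(1999, 2)` yield the same word, which is the property that lets equality on immediates be a single-word comparison.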

https://github.com/ruby/ruby/pull/16659

-- 
https://bugs.ruby-lang.org/

Issue #21982 has been updated by Eregon (Benoit Daloze).

For context, how much of this is AI-generated? Is it necessary to be faster than BigDecimal, for which realistic use cases? Could BigDecimal be improved instead?

I think the bar for a new numeric core type is very high, and this would add a lot of complexity. Ruby's design is few core types but versatile and many methods, this doesn't really fit that.

----------------------------------------
Feature #21982: Add `Decimal` as a core numeric class
https://bugs.ruby-lang.org/issues/21982#change-116927
Issue #21982 has been updated by naruse (Yui NARUSE). Combining two different precision decimals doesn't sound reasonable. It should use a single precision to ease to understand how it works. We long discussed about decimal including introducing based on C _Decimal128, but I don't understand what is the actual use case of it. ---------------------------------------- Feature #21982: Add `Decimal` as a core numeric class https://bugs.ruby-lang.org/issues/21982#change-116928 * Author: shan (Shannon Skipper) * Status: Open ---------------------------------------- # Feature: Add `Decimal` as a core numeric class ## Abstract Add `Decimal < Numeric` to Ruby core: exact base-10 arithmetic using a tagged immediate VALUE (like Fixnum) for small values, promoting to a 128-bit heap object for larger values. ## Background Ruby apps that handle money, tax rates or measurements often use Integers with extra business logic since Float can't represent most base-10 fractions exactly: ```ruby 0.1 + 0.2 == 0.3 #=> false 0.1 + 0.2 #=> 0.30000000000000004 0.1d + 0.2d == 0.3d #=> true 0.1d + 0.2d #=> 0.3d ``` Alternatives have tradeoffs: - **BigDecimal**: correct but 8x slower than Float on compound interest. - **Rational**: correct but equally slow, and `Rational("19.99").to_s` gives `"1999/100"`. - **Integer cents**: correct and fast but pushes formatting and decimal-point tracking into application code. ## Proposal ```ruby # Literal syntax price = 19.99d tax_rate = 0.0875d total = (price * (1d + tax_rate)).round(2) #=> 21.74d # Kernel converter (like Integer(), Float()) Decimal("29.99") #=> 29.99d Decimal(42) #=> 42.0d # Value semantics: frozen, Ractor-shareable 19.99d.frozen? 
#=> true # Full numeric protocol 19.99d + 1 #=> 20.99d 19.99d <=> 20.0d #=> -1 19.99d.round(1) #=> 20.0d # Human-focused string interpolation "$#{19.99d}" #=> "$19.99" "$#{BigDecimal("19.99").to_s("F")}" #=> "$19.99" ``` ### Features - **`d` literal suffix**: `42d`, `3.14d`, `0.1d` (matching `r` for Rational, `i` for Complex) - **Frozen and Ractor-shareable**: value semantics like Rational - **18 decimal places**: fixed precision, full signed 128-bit range - **`Kernel#Decimal()` converter**: with `exception: false` support - **Full numeric protocol**: arithmetic, comparison, coercion, rounding, `to_i`/`to_f`/`to_r`/`to_s`, pattern matching ## Performance Apple M4 with YJIT. All values pre-allocated outside the measurement loop. ### Compound interest: 360 monthly iterations `balance = (balance * (1 + rate)).round(2)` repeated 360 times. A tight loop of multiply, add, round. Both Decimal and Float produce $60,225.61. | Type | YJIT | No JIT | |------|------|--------| | Decimal (BID) | 93K i/s | 55K i/s | | Float | 83K i/s | 60K i/s | | Rational | 10.7K i/s | 9.4K i/s | | BigDecimal | 10.1K i/s | 9.3K i/s | With YJIT, Decimal is 1.12x faster than Float on compound interest. Without YJIT, Float is 1.1x faster. YJIT helps Decimal more (1.7x speedup vs Float's 1.4x) because BOPs and unchecked entry points skip the per-call type checks. Rational and BigDecimal are ~9x slower than Decimal either way. ### Per-operation (benchmark-driver, YJIT) | Operation | Decimal (BID) | Float | Ratio | |-----------|---------------|-------|-------| | add | 147M i/s | 160M i/s | 1.09x slower | | mul | 159M i/s | 158M i/s | ~parity | | round(2) | 118M i/s | 78M i/s | 1.5x faster | | div (inexact) | 49M i/s | 140M i/s | 2.9x slower | | parse | 34M i/s | 34M i/s | parity | | to_s | 32M i/s | 9M i/s | 3.4x faster | | sum(1000) | 1.27M i/s | 794K i/s | 1.6x faster | Add and mul are near Float parity. Round is 1.5x faster, to_s 3.4x faster. 
Division is 2.9x slower (inexact results need wide arithmetic). ## Design Two-tier storage, mirroring Fixnum/Bignum: ``` Significand <= 2^51 - 1: 8 bytes, no allocation 63 62 12 11 8 7 0 +---+------------------------+-----+-------+ | 0 | 1999 | 2 | 0x84 | +---+------------------------+-----+-------+ sign significand (51 bits) scale tag 64 bits encode sign, significand, decimal position and type tag. The value IS the VALUE, like Fixnum. All 15-digit significands fit. Some 16-digit significands fit (up to 2,251,799,813,685,247). Significand > 2^51 - 1: heap allocated +--------+ +----------------+----------------+--------------------------------+ | ptr | --> | flags + klass | value * 10**18 | +--------+ | 16 bytes | 16 bytes | VALUE +----------------+----------------+--------------------------------+ 8 bytes object header full i128 range, 18 decimal places Standard Ruby object header with embedded i128 payload. ``` `Decimal("12.34")` is an immediate. No object, no allocation, no GC. `Decimal("9_999_999_999_999_999.99")` promotes to heap (significand exceeds 51 bits). `Decimal("123_456_789_012_345_678_901_234_567_890_123_456.78")` raises RangeError (exceeds 128-bit range). ### Optimization layers The prototype implements analogous layers to Float and Integer: - 13 BOPs with `DECIMAL_REDEFINED_OP_FLAG` - Interpreter fast paths in `vm_opt_plus/minus/mult/div/mod`, `vm_opt_lt/le/gt/ge`, `opt_equality_specialized` - YJIT `Type::Decimal` with inline BID add/sub and BOP guard paths - ZJIT `types::Decimal` with profiler support and method annotations - Unchecked `_dd` entry points for YJIT and interpreter - Reciprocal lookup tables for division-free scale reduction ### Heap arithmetic Heap multiply and divide use 256-bit widening (same algorithm as Roc). Optional fast paths exploit the fact that SCALE (10^18) fits in u64: schoolbook two-division `wide_div`, single-operand `wide_mul_64` and Barrett reduction for the u128 case. 
These improve heap multiply by ~25% and heap division by ~50%. All are removable without affecting correctness or BID performance.

## Type coercion

When Decimal interacts with other numeric types:

```ruby
1.5d + 1      #=> 2.5d   (Integer promotes to Decimal)
1.5d + 0.5    #=> 2.0    (Decimal demotes to Float)
1.5d + 1/4r   #=> 1.75d  (Rational promotes to Decimal)
1.5d + 1/3r   # ArgumentError (1/3 exceeds 18 decimal places)
1.5d == 1.5   #=> true   (compared via Rational)
1.5d == 3/2r  #=> true   (Rational comparison via <=>)
```

Decimal + Integer returns Decimal (lossless). Decimal + Float returns Float (caller chose approximate arithmetic). Decimal + Rational returns Decimal when the Rational is exactly representable in 18 decimal places, raises ArgumentError otherwise. Conversion is exact. Only arithmetic results (`*`, `/`) truncate.

## Relationship to BigDecimal

Decimal and BigDecimal serve different needs:

- **Decimal**: fixed precision (18 places), core type, immediate encoding, JIT-optimized. For the common case: prices, percentages, measurements.
- **BigDecimal**: arbitrary precision, bundled gem, heap-allocated. For when you need more than 18 decimal places or unbounded digit counts.

They can coexist. The Decimal conversion method is `to_dec` to avoid conflict with `bigdecimal/util`, which defines `to_d`. If `to_d` can be shared or BigDecimal's deprecated, `to_d` would be more natural.

## Why two Decimal tiers

Intel's BID64 gives a 64-bit immediate with 16 digits. Roc's Dec gives a 128-bit fixed-point value with 39 digits. Both are proven designs. Ruby's approach combines them, the same way Integer combines Fixnum and Bignum. Small values are immediates, large ones promote to heap. Transparent to the programmer.

Two simpler alternatives are also viable:

- **BID-only**: 15-16 digits (51-bit significand), zero allocation. Operations exceeding the BID range would raise. Half the code.
- **i128-only**: 39 digits, one allocation per decimal. No dual paths. Simpler but slower with GC churn.

## Design details

- **Fixed 18 decimal places**: 10^18 fits in a 64-bit integer, keeping the SCALE factor cheap for multiplication and division. 18 places cover all ISO 4217 currency subdivisions.
- **Truncation toward zero for `*` and `/`**: consistent with C integer division. Floored division for `%`, `div`, `divmod` (matching Ruby's Integer).
- **Exact input conversion**: `Decimal("1e-19")` raises `ArgumentError` because the value cannot be represented in 18 decimal places. `Decimal("1e-19", exception: false)` returns `nil`. Trailing zeros beyond 18 places are accepted: `Decimal("1.10000000000000000000")` is `1.1d`. Arithmetic truncation is separate and expected.
- **Float conversion via `Float#to_s` then parse**: `Decimal(0.1)` gives `0.1d`, not `0.1000000000000000055...d`.
- **`0d` is a Decimal literal**: `0d` produces `Decimal(0)`. `0d42` remains `Integer(42)` (the existing decimal-integer prefix). `0D42` also remains `Integer(42)` (only lowercase `d` produces Decimal).
- **Frozen and Ractor-shareable**: like Rational. No mutable state.

## Portability

The prototype requires `__int128` (GCC and Clang). For the heap variant, MSVC would need a two-word i128 emulation or a pure-C fallback using `int64_t hi, lo` fields along with appropriate operations. The BID immediate tier (64-bit only) works everywhere.

## Scope

Implementation (`decimal.c`, `decimal.rb`), VM fast paths, YJIT and ZJIT type tracking and codegen, serialization, Kernel converters and `prism_compile.c`.

The `d` literal suffix requires a small Prism upstream change (~60 lines in `prism.c` plus regenerated sources). Psych would need a separate patch for YAML serialization. Both would be submitted as upstream PRs if this proposal is accepted.

## Gem

A gem version provides the same semantics as a C extension with pure Ruby fallback. It gets 14.2K i/s on compound interest with YJIT, versus core Decimal's 93K (6.5x slower). A gem cannot add VALUE tag bytes, register BOPs or teach YJIT new types, so it must heap-allocate every result and go through full method dispatch.

## Related work

| Language | Type | Encoding | Precision | Normalized |
|----------|------|----------|-----------|------------|
| Intel libbid | BID64 | 1+13+50 combination field | 16 digits | no (cohorts) |
| Intel libbid | BID128 | 1+17+110 floating-point | 34 digits | no (cohorts) |
| Roc | Dec | i128 fixed-point (* 10^18) | 39 digits | n/a (fixed scale) |
| C# | System.Decimal | 96-bit sig + 5-bit scale | 28-29 digits | no |
| Ruby | Decimal (this) | 51-bit immediate + i128 heap | 15-16 digits (BID), full i128 (heap) | yes (canonical) |

**Immediate tier vs Intel BID64**: Intel's combination-field encoding gets 16 digits from 64 bits (vs our 15-16) by implicitly encoding the leading significand digit. The cost is decoder complexity and unnormalized cohorts. `1.0` and `1.00` have different bit patterns, requiring rescaling for equality. Our BID encoding strips trailing zeros for a canonical form: equal immediates are always identical bit patterns, so equality is a single-word comparison. Heap decimals use i128 value comparison.

**Heap tier vs Roc Dec**: nearly identical design. Same i128 scaled by 10^18, same 256-bit widening for multiply and divide. Our additions: reciprocal lookup tables for division-free scale reduction, Barrett reduction for the SCALE division and promotion to the immediate tier when results fit.

**Both tiers vs C# System.Decimal**: C# uses a 96-bit significand with variable scale 0-28 in a single 128-bit value type. More precision (28-29 digits) than our immediate but no fast path. All arithmetic operates on three 32-bit words. Not normalized. Ruby doesn't have value types, so C#'s stack-allocation advantage doesn't apply. The tagged immediate achieves the same effect.

Like Integer, the two tiers are invisible in Rubyland. A Decimal is a Decimal.
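To make the heap tier's arithmetic rules concrete, here is a minimal sketch of fixed-point values scaled by 10^18, written with plain Ruby Integers standing in for a C `__int128`. It is not the prototype's code: the method names (`fixed_from_string`, `fixed_mul`, `fixed_div`, `fixed_to_s`, `check_range`) are hypothetical, and the only behaviors taken from the proposal are the 10^18 scale, exact input conversion, truncation toward zero for `*` and `/`, and the RangeError on values outside the signed 128-bit range.

```ruby
# Hypothetical sketch of i128 fixed-point arithmetic scaled by 10**18.
# Ruby Integers are arbitrary precision, so the "256-bit widening" step
# is just an ordinary multiplication here.
SCALE    = 10**18
I128_MAX = 2**127 - 1
I128_MIN = -(2**127)

# Reject results that would not fit a signed 128-bit payload.
def check_range(v)
  raise RangeError, "exceeds 128-bit range" unless v.between?(I128_MIN, I128_MAX)
  v
end

# Exact input conversion: raise if the value needs more than 18 places.
def fixed_from_string(str)
  scaled = Rational(str) * SCALE
  raise ArgumentError, "more than 18 decimal places: #{str}" unless scaled.denominator == 1
  check_range(scaled.numerator)
end

# Multiply, then divide by SCALE, truncating toward zero.
def fixed_mul(a, b)
  product = a * b
  q = product.abs / SCALE
  check_range(product.negative? ? -q : q)
end

# Pre-scale the dividend, then divide, truncating toward zero.
def fixed_div(a, b)
  raise ZeroDivisionError, "divided by 0" if b.zero?
  num = a * SCALE
  q = num.abs / b.abs
  check_range(num.negative? ^ b.negative? ? -q : q)
end

# Render without trailing fractional zeros, keeping at least one digit.
def fixed_to_s(v)
  int, frac = v.abs.divmod(SCALE)
  frac_s = frac.to_s.rjust(18, "0").sub(/0+\z/, "")
  frac_s = "0" if frac_s.empty?
  "#{v.negative? ? '-' : ''}#{int}.#{frac_s}"
end

price = fixed_from_string("19.99")
rate  = fixed_from_string("1.0875")
puts fixed_to_s(fixed_mul(price, rate))
puts fixed_to_s(fixed_div(fixed_from_string("-1"), fixed_from_string("3")))
```

The second call shows the difference between truncation and flooring: the quotient ends in `...333`, whereas floored division would give `...334` for a negative result.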
https://github.com/ruby/ruby/pull/16659 -- https://bugs.ruby-lang.org/
Issue #21982 has been updated by shan (Shannon Skipper). Eregon (Benoit Daloze) wrote in #note-1:
> For context, how much of this is AI-generated?
I implemented the i128 fixed-point arithmetic by porting Roc's Dec, then used AI to explore Ruby-specific optimizations (interpreter fast paths, YJIT inline paths) for that heap path. Several of those optimizations caused stack canary failures, so I rolled them back in the latest commit. I also used AI to summarize related work and address CI failures. The architectural decisions (two-tier storage, tag byte choice, coercion rules) are mine.
> Is it necessary to be faster than BigDecimal, for which realistic use cases?
A realistic use case would be repetitive tight-loop math on everyday base-10 amounts and measurements, such as batch compound-interest calculations. Memory usage and GC churn from heap allocations are another concern. I do think the perception of BigDecimal using several times more memory and being 10-20x slower hurts adoption. In fintech we commonly use Integers with extra business logic as a viable workaround.
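For readers unfamiliar with the Integer workaround mentioned above, a small sketch follows. It runs on current Ruby with no `Decimal` type; the helper names `compound_cents` and `format_cents` are made up for illustration, and the rounding rule (round to the nearest cent each month) is an assumption rather than anything specified in this thread.

```ruby
# Float cannot represent most base-10 fractions exactly.
puts 0.1 + 0.2 == 0.3   # false

# Hypothetical "Integers with extra business logic" workaround:
# keep balances in whole cents and the rate as an exact Rational.
def compound_cents(principal_cents, monthly_rate, months)
  months.times.reduce(principal_cents) do |balance, _|
    # Rational#round returns an Integer, rounding to the nearest cent.
    (balance * (1 + monthly_rate)).round
  end
end

# The decimal point has to be reinserted by application code.
def format_cents(cents)
  dollars, rem = cents.abs.divmod(100)
  format('%s$%d.%02d', cents.negative? ? "-" : "", dollars, rem)
end

total = compound_cents(1999, Rational(875, 10_000), 1)
puts format_cents(total)
```

The arithmetic is exact and fast, but the scale (cents) and the formatting live in application code, which is the tradeoff the proposal's Background section describes.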
> Could BigDecimal be improved instead?
Yes, if it were made core. I don't think an i128 heap representation would be worth adding (that could be done with BigDecimal remaining a gem), since it's still multiple times slower than Fixnum/Flonum. On the other hand, adding immediate BID support would be a huge performance improvement for almost all numbers we commonly use in human amounts, but it requires that part to be in core. BigDecimal has some ergonomics issues, but it could have a `d` suffix literal if it were in core too. The same fallback logic could apply, going to the BigDecimal gem for numbers that require more than a 51-bit significand.
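As a rough illustration of what "fits in the immediate tier" means, the sketch below packs a significand and scale into a 64-bit word using the field positions from the diagram in the issue description (sign at bit 63, 51-bit significand at bits 12-62, 4-bit scale at bits 8-11, tag `0x84` in the low byte). The method names are hypothetical, and returning `nil` to signal "needs the fallback tier" is an assumption made for this example, not the prototype's behavior.

```ruby
# Hypothetical sketch of the immediate encoding; layout per the issue's diagram.
TAG       = 0x84
SIG_MAX   = (1 << 51) - 1   # 2_251_799_813_685_247
SCALE_MAX = 15              # largest value of a 4-bit scale field

# Returns a 64-bit Integer, or nil when the value needs the fallback tier.
def pack_immediate(significand, scale)
  sign = significand.negative? ? 1 : 0
  mag  = significand.abs
  # Canonical form: strip trailing zeros so equal values share one bit pattern.
  while scale > 0 && (mag % 10).zero?
    mag /= 10
    scale -= 1
  end
  return nil if mag > SIG_MAX || scale.negative? || scale > SCALE_MAX
  (sign << 63) | (mag << 12) | (scale << 8) | TAG
end

# Inverse of pack_immediate: returns [signed significand, scale].
def unpack_immediate(word)
  raise ArgumentError, "not a Decimal immediate" unless (word & 0xFF) == TAG
  sign  = (word >> 63) & 1
  scale = (word >> 8) & 0xF
  mag   = (word >> 12) & SIG_MAX
  [sign == 1 ? -mag : mag, scale]
end

p unpack_immediate(pack_immediate(1999, 2))              # 19.99
p pack_immediate(10, 1) == pack_immediate(100, 2)        # 1.0 vs 1.00
p pack_immediate(999_999_999_999_999_999, 2)             # too wide for 51 bits
```

Because trailing zeros are stripped before packing, comparing two immediates for equality reduces to comparing the two words.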
> I think the bar for a new numeric core type is very high, and this would add a lot of complexity. Ruby's design is few core types but versatile and many methods, this doesn't really fit that.
I hesitated to propose this since I agree it's complex and the bar is high, but it'd be awesome for Rubyists to be able to do fast base 10 math. I liked the flexibility of falling back from very fast immediate BID to a more flexible and still fairly fast heap. In retrospect it might have been better to propose a simple immediate-only Decimal. The downside there is only handling up to 16 digits, but that covers the vast majority of numbers we frequently use for measurements. ---------------------------------------- Feature #21982: Add `Decimal` as a core numeric class https://bugs.ruby-lang.org/issues/21982#change-116929 * Author: shan (Shannon Skipper) * Status: Open ----------------------------------------
https://github.com/ruby/ruby/pull/16659 -- https://bugs.ruby-lang.org/
Issue #21982 has been updated by shan (Shannon Skipper). naruse (Yui NARUSE) wrote in #note-2:
> Combining two different precision decimals doesn't sound reasonable. It should use a single precision to ease to understand how it works.
I think that's probably fair. I was thinking of Fixnum/Flonum and their heap counterparts, but those back the same precision.
> We long discussed about decimal including introducing based on C _Decimal128, but I don't understand what is the actual use case of it.
An immediate BID representation avoiding the heap seems more appealing if I had to choose one. Has there been discussion of a Flonum-like immediate BID Decimal? I didn't know about C _Decimal128 discussions, but I understand hesitation since it's not that much faster or smaller footprint than BigDecimal and can be done as a gem. ---------------------------------------- Feature #21982: Add `Decimal` as a core numeric class https://bugs.ruby-lang.org/issues/21982#change-116930 * Author: shan (Shannon Skipper) * Status: Open ----------------------------------------
https://github.com/ruby/ruby/pull/16659 -- https://bugs.ruby-lang.org/
Issue #21982 has been updated by mrkn (Kenta Murata). Assignee set to mrkn (Kenta Murata) What version of bigdecimal did you use for the benchmark? We've released [bigdecimal v4.1.1](https://github.com/ruby/bigdecimal/releases/tag/v4.1.1) 2 days ago. The calculation performance is improved in this version by supporting an embedded BigDecimal object (thanks @byroot). Could you please reevaluate with this new version? ---------------------------------------- Feature #21982: Add `Decimal` as a core numeric class https://bugs.ruby-lang.org/issues/21982#change-116935 * Author: shan (Shannon Skipper) * Status: Open * Assignee: mrkn (Kenta Murata) ---------------------------------------- # Feature: Add `Decimal` as a core numeric class ## Abstract Add `Decimal < Numeric` to Ruby core: exact base-10 arithmetic using a tagged immediate VALUE (like Fixnum) for small values, promoting to a 128-bit heap object for larger values. ## Background Ruby apps that handle money, tax rates or measurements often use Integers with extra business logic since Float can't represent most base-10 fractions exactly: ```ruby 0.1 + 0.2 == 0.3 #=> false 0.1 + 0.2 #=> 0.30000000000000004 0.1d + 0.2d == 0.3d #=> true 0.1d + 0.2d #=> 0.3d ``` Alternatives have tradeoffs: - **BigDecimal**: correct but 8x slower than Float on compound interest. - **Rational**: correct but equally slow, and `Rational("19.99").to_s` gives `"1999/100"`. - **Integer cents**: correct and fast but pushes formatting and decimal-point tracking into application code. ## Proposal ```ruby # Literal syntax price = 19.99d tax_rate = 0.0875d total = (price * (1d + tax_rate)).round(2) #=> 21.74d # Kernel converter (like Integer(), Float()) Decimal("29.99") #=> 29.99d Decimal(42) #=> 42.0d # Value semantics: frozen, Ractor-shareable 19.99d.frozen? 
#=> true # Full numeric protocol 19.99d + 1 #=> 20.99d 19.99d <=> 20.0d #=> -1 19.99d.round(1) #=> 20.0d # Human-focused string interpolation "$#{19.99d}" #=> "$19.99" "$#{BigDecimal("19.99").to_s("F")}" #=> "$19.99" ``` ### Features - **`d` literal suffix**: `42d`, `3.14d`, `0.1d` (matching `r` for Rational, `i` for Complex) - **Frozen and Ractor-shareable**: value semantics like Rational - **18 decimal places**: fixed precision, full signed 128-bit range - **`Kernel#Decimal()` converter**: with `exception: false` support - **Full numeric protocol**: arithmetic, comparison, coercion, rounding, `to_i`/`to_f`/`to_r`/`to_s`, pattern matching ## Performance Apple M4 with YJIT. All values pre-allocated outside the measurement loop. ### Compound interest: 360 monthly iterations `balance = (balance * (1 + rate)).round(2)` repeated 360 times. A tight loop of multiply, add, round. Both Decimal and Float produce $60,225.61. | Type | YJIT | No JIT | |------|------|--------| | Decimal (BID) | 93K i/s | 55K i/s | | Float | 83K i/s | 60K i/s | | Rational | 10.7K i/s | 9.4K i/s | | BigDecimal | 10.1K i/s | 9.3K i/s | With YJIT, Decimal is 1.12x faster than Float on compound interest. Without YJIT, Float is 1.1x faster. YJIT helps Decimal more (1.7x speedup vs Float's 1.4x) because BOPs and unchecked entry points skip the per-call type checks. Rational and BigDecimal are ~9x slower than Decimal either way. ### Per-operation (benchmark-driver, YJIT) | Operation | Decimal (BID) | Float | Ratio | |-----------|---------------|-------|-------| | add | 147M i/s | 160M i/s | 1.09x slower | | mul | 159M i/s | 158M i/s | ~parity | | round(2) | 118M i/s | 78M i/s | 1.5x faster | | div (inexact) | 49M i/s | 140M i/s | 2.9x slower | | parse | 34M i/s | 34M i/s | parity | | to_s | 32M i/s | 9M i/s | 3.4x faster | | sum(1000) | 1.27M i/s | 794K i/s | 1.6x faster | Add and mul are near Float parity. Round is 1.5x faster, to_s 3.4x faster. 
Division is 2.9x slower (inexact results need wide arithmetic). ## Design Two-tier storage, mirroring Fixnum/Bignum: ``` Significand <= 2^51 - 1: 8 bytes, no allocation 63 62 12 11 8 7 0 +---+------------------------+-----+-------+ | 0 | 1999 | 2 | 0x84 | +---+------------------------+-----+-------+ sign significand (51 bits) scale tag 64 bits encode sign, significand, decimal position and type tag. The value IS the VALUE, like Fixnum. All 15-digit significands fit. Some 16-digit significands fit (up to 2,251,799,813,685,247). Significand > 2^51 - 1: heap allocated +--------+ +----------------+----------------+--------------------------------+ | ptr | --> | flags + klass | value * 10**18 | +--------+ | 16 bytes | 16 bytes | VALUE +----------------+----------------+--------------------------------+ 8 bytes object header full i128 range, 18 decimal places Standard Ruby object header with embedded i128 payload. ``` `Decimal("12.34")` is an immediate. No object, no allocation, no GC. `Decimal("9_999_999_999_999_999.99")` promotes to heap (significand exceeds 51 bits). `Decimal("123_456_789_012_345_678_901_234_567_890_123_456.78")` raises RangeError (exceeds 128-bit range). ### Optimization layers The prototype implements analogous layers to Float and Integer: - 13 BOPs with `DECIMAL_REDEFINED_OP_FLAG` - Interpreter fast paths in `vm_opt_plus/minus/mult/div/mod`, `vm_opt_lt/le/gt/ge`, `opt_equality_specialized` - YJIT `Type::Decimal` with inline BID add/sub and BOP guard paths - ZJIT `types::Decimal` with profiler support and method annotations - Unchecked `_dd` entry points for YJIT and interpreter - Reciprocal lookup tables for division-free scale reduction ### Heap arithmetic Heap multiply and divide use 256-bit widening (same algorithm as Roc). Optional fast paths exploit the fact that SCALE (10^18) fits in u64: schoolbook two-division `wide_div`, single-operand `wide_mul_64` and Barrett reduction for the u128 case. 
These improve heap multiply by ~25% and heap division by ~50%. All are removable without affecting correctness or BID performance.

## Type coercion

When Decimal interacts with other numeric types:

```ruby
1.5d + 1      #=> 2.5d   (Integer promotes to Decimal)
1.5d + 0.5    #=> 2.0    (Decimal demotes to Float)
1.5d + 1/4r   #=> 1.75d  (Rational promotes to Decimal)
1.5d + 1/3r   # ArgumentError (1/3 exceeds 18 decimal places)
1.5d == 1.5   #=> true   (compared via Rational)
1.5d == 3/2r  #=> true   (Rational comparison via <=>)
```

Decimal + Integer returns Decimal (lossless). Decimal + Float returns Float (caller chose approximate arithmetic). Decimal + Rational returns Decimal when the Rational is exactly representable in 18 decimal places, raises ArgumentError otherwise. Conversion is exact. Only arithmetic results (`*`, `/`) truncate.

## Relationship to BigDecimal

Decimal and BigDecimal serve different needs:

- **Decimal**: fixed precision (18 places), core type, immediate encoding, JIT-optimized. For the common case: prices, percentages, measurements.
- **BigDecimal**: arbitrary precision, bundled gem, heap-allocated. For when you need more than 18 decimal places or unbounded digit counts.

They can coexist. The Decimal conversion method is `to_dec` to avoid conflict with `bigdecimal/util`, which defines `to_d`. If `to_d` can be shared or BigDecimal's deprecated, `to_d` would be more natural.

## Why two Decimal tiers

Intel's BID64 gives a 64-bit immediate with 16 digits. Roc's Dec gives a 128-bit fixed-point value with 39 digits. Both are proven designs. Ruby's approach combines them, the same way Integer combines Fixnum and Bignum. Small values are immediates, large ones promote to heap. Transparent to the programmer.

Two simpler alternatives are also viable:

- **BID-only**: 15-16 digits (51-bit significand), zero allocation. Operations exceeding the BID range would raise. Half the code.
- **i128-only**: 39 digits, one allocation per decimal. No dual paths.
  Simpler but slower with GC churn.

## Design details

- **Fixed 18 decimal places**: 10^18 fits in a 64-bit integer, keeping the SCALE factor cheap for multiplication and division. 18 places cover all ISO 4217 currency subdivisions.
- **Truncation toward zero for `*` and `/`**: consistent with C integer division. Floored division for `%`, `div`, `divmod` (matching Ruby's Integer).
- **Exact input conversion**: `Decimal("1e-19")` raises `ArgumentError` because the value cannot be represented in 18 decimal places. `Decimal("1e-19", exception: false)` returns `nil`. Trailing zeros beyond 18 places are accepted: `Decimal("1.10000000000000000000")` is `1.1d`. Arithmetic truncation is separate and expected.
- **Float conversion via `Float#to_s` then parse**: `Decimal(0.1)` gives `0.1d`, not `0.1000000000000000055...d`.
- **`0d` is a Decimal literal**: `0d` produces `Decimal(0)`. `0d42` remains `Integer(42)` (the existing decimal-integer prefix). `0D42` also remains `Integer(42)` (only lowercase `d` produces Decimal).
- **Frozen and Ractor-shareable**: like Rational. No mutable state.

## Portability

The prototype requires `__int128` (GCC and Clang). For the heap variant, MSVC would need a two-word i128 emulation or a pure-C fallback using `int64_t hi, lo` fields along with appropriate operations. The BID immediate tier (64-bit only) works everywhere.

## Scope

Implementation (`decimal.c`, `decimal.rb`), VM fast paths, YJIT and ZJIT type tracking and codegen, serialization, Kernel converters and `prism_compile.c`. The `d` literal suffix requires a small Prism upstream change (~60 lines in `prism.c` plus regenerated sources). Psych would need a separate patch for YAML serialization. Both would be submitted as upstream PRs if this proposal is accepted.

## Gem

A gem version provides the same semantics as a C extension with pure Ruby fallback. It gets 14.2K i/s on compound interest with YJIT, versus core Decimal's 93K (6.5x slower).
A gem cannot add VALUE tag bytes, register BOPs or teach YJIT new types, so it must heap-allocate every result and go through full method dispatch.

## Related work

| Language | Type | Encoding | Precision | Normalized |
|----------|------|----------|-----------|------------|
| Intel libbid | BID64 | 1+13+50 combination field | 16 digits | no (cohorts) |
| Intel libbid | BID128 | 1+17+110 floating-point | 34 digits | no (cohorts) |
| Roc | Dec | i128 fixed-point (* 10^18) | 39 digits | n/a (fixed scale) |
| C# | System.Decimal | 96-bit sig + 5-bit scale | 28-29 digits | no |
| Ruby | Decimal (this) | 51-bit immediate + i128 heap | 15-16 digits (BID), full i128 (heap) | yes (canonical) |

**Immediate tier vs Intel BID64**: Intel's combination-field encoding gets 16 digits from 64 bits (vs our 15-16) by implicitly encoding the leading significand digit. The cost is decoder complexity and unnormalized cohorts. `1.0` and `1.00` have different bit patterns, requiring rescaling for equality. Our BID encoding strips trailing zeros for a canonical form: equal immediates are always identical bit patterns, so equality is a single-word comparison. Heap decimals use i128 value comparison.

**Heap tier vs Roc Dec**: nearly identical design. Same i128 scaled by 10^18, same 256-bit widening for multiply and divide. Our additions: reciprocal lookup tables for division-free scale reduction, Barrett reduction for the SCALE division and promotion to the immediate tier when results fit.

**Both tiers vs C# System.Decimal**: C# uses a 96-bit significand with variable scale 0-28 in a single 128-bit value type. More precision (28-29 digits) than our immediate but no fast path. All arithmetic operates on three 32-bit words. Not normalized. Ruby doesn't have value types, so C#'s stack-allocation advantage doesn't apply. The tagged immediate achieves the same effect.

Like Integer, the two tiers are invisible in Rubyland. A Decimal is a Decimal.
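
To make the canonical-form point concrete, here is a minimal sketch of the immediate layout from the Design diagram, written with plain Ruby Integers. The names `TAG`, `MAX_SIG`, `MAX_SCALE`, `normalize`, `encode` and `decode` are hypothetical and not taken from the prototype. The 4-bit scale limit is read off the diagram (bits 11..8), and the sketch ignores how the word would coexist with Ruby's existing VALUE tagging.

```ruby
# Illustrative only: field positions follow the diagram in the proposal
# (bit 63 sign, bits 62..12 significand, bits 11..8 scale, bits 7..0 tag).
TAG       = 0x84
MAX_SIG   = (1 << 51) - 1   # largest significand that fits the immediate tier
MAX_SCALE = 0xF             # assumption: the scale field is 4 bits wide

# Strip trailing zeros so every value has exactly one (significand, scale) pair.
def normalize(significand, scale)
  while scale > 0 && (significand % 10).zero?
    significand /= 10
    scale -= 1
  end
  [significand, scale]
end

# Pack into a 64-bit word, or return nil when the value would need the heap tier.
def encode(significand, scale)
  sign = significand.negative? ? 1 : 0
  mag, scale = normalize(significand.abs, scale)
  return nil if mag > MAX_SIG || scale > MAX_SCALE
  (sign << 63) | (mag << 12) | (scale << 8) | TAG
end

# Unpack a word back into [signed significand, scale].
def decode(word)
  sign  = (word >> 63) & 1
  mag   = (word >> 12) & MAX_SIG
  scale = (word >> 8) & 0xF
  [sign.zero? ? mag : -mag, scale]
end

encode(1999, 2).to_s(16)               #=> "7cf284"  (19.99)
encode(19990, 3) == encode(1999, 2)    #=> true      (19.990 and 19.99 share one bit pattern)
decode(encode(-1999, 2))               #=> [-1999, 2]
encode(999_999_999_999_999_999, 2)     #=> nil       (significand exceeds 51 bits)
```

Because `normalize` runs before packing, two equal values can never differ in their encoded word, which is why equality on immediates can be a single integer comparison in this sketch.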
Attempt at POC spike showing performance viability: https://github.com/ruby/ruby/pull/16659

--
https://bugs.ruby-lang.org/
Issue #21982 has been updated by byroot (Jean Boussier).

I don't have a particular opinion on this feature, but if the main concern is performance, I think `bigdecimal` could likely be optimized first.

Of course, having to allocate an object will always be slower than an immediate, but as mentioned, now that BigDecimal is embedded, allocation and sweeping are much faster than they used to be.

Looking at the benchmark reported in this ticket prompted me to look at the `to_s` method, which apparently shows the biggest disparity, and I think the gap can be closed easily: https://github.com/ruby/bigdecimal/pull/519

----------------------------------------
Feature #21982: Add `Decimal` as a core numeric class
https://bugs.ruby-lang.org/issues/21982#change-116936

* Author: shan (Shannon Skipper)
* Status: Open
* Assignee: mrkn (Kenta Murata)
----------------------------------------
participants (5): byroot (Jean Boussier), Eregon (Benoit Daloze), mrkn (Kenta Murata), naruse (Yui NARUSE), shan (Shannon Skipper)