[ruby-core:122842] [Ruby Feature#21518] Statistical helpers to `Enumerable`

Issue #21518 has been reported by Amitleshed (Amit Leshed).

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518

* Author: Amitleshed (Amit Leshed)
* Status: Open
----------------------------------------
**Summary**

I'd like to add two statistical helpers to `Enumerable`:

- `Enumerable#average` (arithmetic mean)
- `Enumerable#median`

Both are small, well-defined operations that many Rubyists re-implement in apps and gems. Providing them in core avoids repeated, ad-hoc code and aligns with `Enumerable#sum`, which Ruby already ships.

**Motivation**

- These are among the most common “roll-your-own” helpers for arrays/ranges of numbers.
- They are conceptually simple, universally useful beyond web/Rails.
- Similar to `sum`, they’re primitives for quick data analysis, ETL scripts, CLI tooling, etc.
- Including them encourages consistent semantics (what to do with empty sets, mixed numerics, etc.).

## Proposed API & Semantics

```ruby
Enumerable#average -> Float or nil
Enumerable#median  -> Numeric or nil
```

```ruby
[1, 2, 3, 4].average   # => 2.5
(1..4).average         # => 2.5
[].average             # => nil

[1, 3, 2].median       # => 2
[1, 2, 3, 10].median   # => 2.5
(1..6).median          # => 3.5
[].median              # => nil
```

Ruby implementation

```ruby
module Enumerable
  def average
    count = 0
    total = 0.0
    each do |x|
      raise TypeError, "non-numeric value for average" unless x.is_a?(Numeric)
      total += x
      count += 1
    end
    count.zero? ? nil : total / count
  end

  def median
    arr = to_a
    return nil if arr.empty?
    arr.each { |x| raise TypeError, "non-numeric value for median" unless x.is_a?(Numeric) }
    arr.sort!
    mid = arr.length / 2
    arr.length.odd? ? arr[mid] : (arr[mid - 1] + arr[mid]) / 2.0
  end
end
```

**Upon approval I'm more than willing to implement spec and code in C.**

-- 
https://bugs.ruby-lang.org/

Issue #21518 has been updated by Dan0042 (Daniel DeLorme).

In favor, just careful about the bug in #median

```ruby
x = [1, 3, 2]
x.median #=> 2
x #=> [1, 2, 3] modified by #median
```

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114147

Issue #21518 has been updated by Amitleshed (Amit Leshed).

Dan0042 (Daniel DeLorme) wrote in #note-1:
> In favor, just careful about the bug in #median
> ```ruby
> x = [1, 3, 2]
> x.median #=> 2
> x #=> [1, 2, 3] modified by #median
> ```
> You'll want to use `arr = entries` rather than `arr = to_a`

Right. Great catch.

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114148
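The mutation reported in #note-1 comes from `Array#to_a` returning the receiver itself, so the later `sort!` reorders the caller's array. A short check of that behaviour follows; the variable names are illustrative only, and this is a sketch rather than the proposed implementation:

```ruby
x = [1, 3, 2]

same_object = x.to_a.equal?(x)     # true:  Array#to_a returns the receiver itself
new_object  = x.entries.equal?(x)  # false: Enumerable#entries builds a new Array

# Sorting the copy in place leaves the caller's array untouched.
sorted = x.entries.sort!

sorted  # => [1, 2, 3]
x       # => [1, 3, 2]
```

Calling the non-destructive `sort` on the receiver would avoid the mutation as well, since it also returns a new array.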

Issue #21518 has been updated by Amitleshed (Amit Leshed).

Thanks, great catch!

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114149

Issue #21518 has been updated by herwin (Herwin W).

Ranges might need their own specialised implementation: this implementation will time out on infinite ranges, and `(1..100000).average` (or `.median`) can be calculated without having to create an intermediate array.

(Why anyone would want to calculate these values from this kind of range is beyond me, but that's another issue.)

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114156
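For reference, the shortcut for ranges follows from the arithmetic-series sum. Assuming a finite integer range `a..b` with `n = b - a + 1` elements (the symbols are introduced here for illustration):

```latex
\[
\text{average}
  = \frac{1}{n}\sum_{k=a}^{b} k
  = \frac{1}{n}\cdot\frac{n\,(a+b)}{2}
  = \frac{a+b}{2}
\]
```

Because the elements are evenly spaced, the median is the same value, so `(1..100000)` gives (1 + 100000) / 2 = 50000.5 for both without iterating.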

Issue #21518 has been updated by Amitleshed (Amit Leshed).

*** Thanks for the engagement everyone ***

Here's a refactored version:

```ruby
module Enumerable
  def average
    return nil if none?
    return range_midpoint if numeric_range?

    total = 0.0
    count = 0
    each do |x|
      raise TypeError, "non-numeric value for average" unless x.is_a?(Numeric)
      total += x
      count += 1
    end
    total / count
  end

  def median
    return nil if none?
    return range_midpoint if numeric_range?

    arr = entries
    arr.each { |x| raise TypeError, "non-numeric value for median" unless x.is_a?(Numeric) }
    arr.sort!
    mid = arr.length / 2
    arr.length.odd? ? arr[mid] : (arr[mid - 1] + arr[mid]) / 2.0
  end

  private

  def numeric_range?
    is_a?(Range) && first.is_a?(Numeric) && last.is_a?(Numeric)
  end

  def range_midpoint
    max = exclude_end? ? (last - step) : last
    (first + max) / 2.0
  end
end
```

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114157
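One detail worth checking in the `range_midpoint` helper above: as far as I can tell, `Range#step` called with no argument and no block does not return a number (it yields an enumerator-like object), so `last - step` would likely raise for an exclusive range such as `(1...5)`. Below is a minimal sketch of one alternative, assuming Integer endpoints only. The name `integer_range_midpoint` is hypothetical and not part of the proposal:

```ruby
# Hypothetical helper: midpoint (both mean and median) of a finite Integer range.
# Returns nil for empty ranges and for anything that is not a finite Integer range.
def integer_range_midpoint(range)
  lo = range.begin
  hi = range.end
  # Beginless/endless ranges have a nil endpoint and are rejected here.
  return nil unless lo.is_a?(Integer) && hi.is_a?(Integer)

  hi -= 1 if range.exclude_end?  # (1...5) covers 1, 2, 3, 4
  return nil if hi < lo          # empty range such as (5..1)

  (lo + hi) / 2.0
end

integer_range_midpoint(1..4)   # => 2.5
integer_range_midpoint(1...5)  # => 2.5
integer_range_midpoint(1..6)   # => 3.5
integer_range_midpoint(1..)    # => nil (endless range)
```

Returning `nil` for endless ranges is only one possible choice; raising an error would be equally defensible and is left open here.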

Issue #21518 has been updated by mame (Yusuke Endoh).

Naturally, these methods have been desired by some people for a very long time, but Ruby has historically been very cautious about introducing them. Even the obviously useful `#sum` method was only added in 2016, which is relatively recent in Ruby's history.

One reason behind this caution is the reluctance to add methods to Array that assume all elements are Integer or Float. Since Array can contain Strings or other non-numeric objects, there's a question of whether it is appropriate to add methods that make no sense in such cases.

The reason why `#sum` was eventually added was the growing attention to an algorithm called the Kahan-Babuska Summation Algorithm. This is a clever algorithm that reduces floating-point error when summing, and it is actually implemented in `Array#sum`. Before this algorithm gained attention, I remember the prevailing opinion was that it should be written explicitly, like `ary.inject(0, &:+)`.

For now, you may want to try using https://github.com/red-data-tools/enumerable-statistics to get a better idea of what you actually need.

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114167
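To illustrate the kind of compensated summation mentioned above, here is a minimal Ruby sketch of the Kahan-Babuska variant usually attributed to Neumaier. It is not the C code behind `Array#sum`, and the method name `compensated_sum` is hypothetical:

```ruby
# Hypothetical helper: sums Floats while tracking the rounding error
# lost at each addition in a separate compensation term.
def compensated_sum(values)
  sum  = 0.0
  comp = 0.0  # running total of the low-order bits lost so far

  values.each do |x|
    t = sum + x
    if sum.abs >= x.abs
      comp += (sum - t) + x  # low-order bits of x were lost
    else
      comp += (x - t) + sum  # low-order bits of sum were lost
    end
    sum = t
  end

  sum + comp  # apply the correction once at the end
end

values = [0.1] * 10
values.inject(0.0, :+)   # => 0.9999999999999999
compensated_sum(values)  # => 1.0
```

An average built on a plain running total, as in the proposed `average`, would inherit the uncompensated error shown on the `inject` line.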

Issue #21518 has been updated by matheusrich (Matheus Richard).

I wonder if these helpers could be inside `Math::Statistics`:

```rb
Math::Statistics.average(some_enumerable)
```

I think it would be okay for this module to assume the arguments are numeric.

----------------------------------------
Feature #21518: Statistical helpers to `Enumerable`
https://bugs.ruby-lang.org/issues/21518#change-114191
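A rough sketch of what the module-function form suggested above might look like. `Math::Statistics` does not exist in Ruby today; the namespace, method names, and the choice to return `nil` for empty input are assumptions carried over from the proposal, and elements are assumed to be numeric:

```ruby
# Hypothetical namespace; Ruby does not ship Math::Statistics.
module Math
  module Statistics
    module_function

    # Arithmetic mean as a Float, or nil for an empty collection.
    def average(values)
      numbers = values.to_a
      return nil if numbers.empty?

      numbers.sum.fdiv(numbers.size)
    end

    # Median computed on a sorted copy, so the caller's collection is not mutated.
    def median(values)
      sorted = values.sort
      return nil if sorted.empty?

      mid = sorted.size / 2
      sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
    end
  end
end

Math::Statistics.average([1, 2, 3, 4])  # => 2.5
Math::Statistics.median([1, 2, 3, 10])  # => 2.5
Math::Statistics.median(1..6)           # => 3.5
```

Keeping the helpers in a separate module sidesteps the concern about adding numeric-only methods to every `Enumerable`, at the cost of a less fluent call style.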
participants (5)
- Amitleshed (Amit Leshed)
- Dan0042 (Daniel DeLorme)
- herwin (Herwin W)
- mame (Yusuke Endoh)
- matheusrich (Matheus Richard)