[ruby-core:124905] [Ruby Feature#21932] `MatchData#get_int`
Issue #21932 has been reported by nobu (Nobuyoshi Nakada).

----------------------------------------
Feature #21932: `MatchData#get_int`
https://bugs.ruby-lang.org/issues/21932

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
----------------------------------------
This is suggested by @akr today, `$~.get_int(1)` is equivalent to `$1.to_i` but does not create the intermediate string object.

--
https://bugs.ruby-lang.org/
Issue #21932 has been updated by zenspider (Ryan Davis).

Tried to add a comment to your commit but github is being very sketchy today.

In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?

----------------------------------------
Feature #21932: `MatchData#get_int`
https://bugs.ruby-lang.org/issues/21932#change-116577

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
----------------------------------------
This is suggested by @akr today, `$~.get_int(1)` is equivalent to `$1.to_i` but does not create the intermediate string object.

https://github.com/nobu/ruby/tree/match-get_int

--
https://bugs.ruby-lang.org/
Issue #21932 has been updated by nobu (Nobuyoshi Nakada).

zenspider (Ryan Davis) wrote in #note-2:
> In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?

I can't tell where that example comes from. Do you mean something like this?

```ruby
/\d+/.match("1/2/10").get_int(0)    # => 1
/\d+/.match("1/2/10").get_int(0, 1) # invalid radix 1 (ArgumentError)
```

https://bugs.ruby-lang.org/issues/21932#change-116578
Issue #21932 has been updated by kou (Kouhei Sutou).

FYI: strscan will use `integer_at` not `get_int`: https://github.com/ruby/strscan/pull/192#issuecomment-4002582149

https://bugs.ruby-lang.org/issues/21932#change-116603
Issue #21932 has been updated by matz (Yukihiro Matsumoto).

I agree with adding `integer_at(n)` to `MatchData`, and `StringScanner` too (#21943).

Matz.

https://bugs.ruby-lang.org/issues/21932#change-116746
Issue #21932 has been updated by mame (Yusuke Endoh).

Here is a supplement to Matz's decision.

This method will basically follow the behavior of `String#to_i`. The base can be specified as the second argument:

```ruby
"2024" =~ /(\d+)/
$~.integer_at(1)     # => 2024 (default: base 10)
$~.integer_at(1, 8)  # => 1044 (interprets "2024" as base 8)
$~.integer_at(1, 16) # => 8228 (interprets "2024" as base 16)
```

When it encounters non-numeric characters or an empty string, it behaves the same as `String#to_i`:

```ruby
# integer_at should behave as String#to_i
"foo" =~ /(...)/
$~.integer_at(1) # => 0 (== "foo".to_i)

"0xF" =~ /(...)/
$~.integer_at(1) # => 0 (== "0xF".to_i, not 15)

"" =~ /(\d*)/
$~.integer_at(1) # => 0 (== "".to_i)

"1_0_0" =~ /(\d+(?:_\d+)*)/
$~.integer_at(1) # => 100 (== "1_0_0".to_i)
```

If the base is set to 0, it respects prefixes like `0x` (the same as `String#to_i(0)`):

```ruby
"0xF" =~ /(...)/
$~.integer_at(1, 0) # => 15 (== "0xF".to_i(0))
```

If there is no match for the group, it returns `nil`:

```ruby
"b" =~ /(a)|(b)/
$~.integer_at(1) # => nil
```

https://bugs.ruby-lang.org/issues/21932#change-116768
Issue #21932 has been updated by Eregon (Benoit Daloze).

I think returning 0 when the group isn't parseable as a number seems bad behavior.

At least if I would use this method, I would expect two things of it:
* It returns the Integer value of that group, without needing `Integer($N)`
* It fails if the capture isn't a number, like Kernel#Integer

Does anyone have a use case for returning 0 when the group isn't a number? It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.

https://bugs.ruby-lang.org/issues/21932#change-116771
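The "wrong group number" concern can be illustrated with methods that exist today. This is only a sketch: it uses `String#to_i` and `Kernel#Integer` on the captured strings rather than the proposed method, and the pattern and input are made up for illustration.

```ruby
# Hypothetical input and pattern, chosen only for illustration.
m = /(\w+)=(\d+)/.match("port=8080")

# Intended group: the loose and strict conversions agree.
right_loose  = m[2].to_i      # => 8080
right_strict = Integer(m[2])  # => 8080

# Wrong group picked by mistake: String#to_i silently yields 0 ...
wrong_loose = m[1].to_i       # => 0

# ... while Kernel#Integer raises ArgumentError for the same text.
wrong_strict =
  begin
    Integer(m[1])
  rescue ArgumentError => e
    e.class
  end

p right_loose, right_strict, wrong_loose, wrong_strict
```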
Issue #21932 has been updated by naruse (Yui NARUSE).

Eregon (Benoit Daloze) wrote in #note-8:
> I think returning 0 when the group isn't parseable as a number seems bad behavior.
>
> At least if I would use this method, I would expect two things of it:
> * It returns the Integer value of that group, without needing `Integer($N)`
> * It fails if the capture isn't a number, like Kernel#Integer
>
> Does anyone have a use case for returning 0 when the group isn't a number? It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.

There are two reasons:

1. There are two major methods for parsing an integer in Ruby: `to_i` and `Integer()`.
   * `to_i` is loose, and its default base is 10
   * `Integer` is strict, and its default base is `0`; it interprets the "0o" and "0x" prefixes

   In this use case, interpreting the "0x" prefix is not useful. If the behavior is that of `to_i`, it is easy to explain. In other words, `match_data.get_int(n)` behaves as `match_data[n]&.to_i`.

2. Distinguishing the case where the group is not matched.

   Consider `/(a)|(\d+)/ =~ "a"; $~.get_int(2)`. The current proposal says it returns nil. Another option for this case is an exception, but I think that is not useful. With nil, I can distinguish this case from matching "0", because that returns 0.

Other minor reasons are:
* For an empty string, it will return 0.
* If you want to reject non-integers, you can write a strict regexp pattern.

https://bugs.ruby-lang.org/issues/21932#change-116772
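The stated equivalence to `match_data[n]&.to_i` can be tried out with existing methods. The helper below is a rough sketch, not the proposed implementation: `integer_at_sketch` is a hypothetical name, and unlike the proposal it still allocates the intermediate string.

```ruby
# Hypothetical helper (not part of Ruby): emulates the described behavior
# with MatchData#[] and String#to_i, so it still creates the String.
def integer_at_sketch(match, n, base = 10)
  match[n]&.to_i(base)
end

# Group 2 did not take part in the match: nil, not 0.
unmatched = integer_at_sketch(/(a)|(\d+)/.match("a"), 2)   # => nil

# Group 2 matched the text "0": a real 0.
zero = integer_at_sketch(/(a)|(\d+)/.match("0"), 2)        # => 0

# An empty capture follows "".to_i.
empty = integer_at_sketch(/(\d*)/.match(""), 1)            # => 0

# A "0x" prefix is ignored in base 10 and honored in base 0.
prefix_base10 = integer_at_sketch(/(...)/.match("0xF"), 1)     # => 0
prefix_base0  = integer_at_sketch(/(...)/.match("0xF"), 1, 0)  # => 15

p unmatched, zero, empty, prefix_base10, prefix_base0
```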
Issue #21932 has been updated by Eregon (Benoit Daloze).

Thanks for the explanations.

naruse (Yui NARUSE) wrote in #note-9:
> In this use case, interpreting "0x" prefix is not useful

It could be useful, but one could work around that with `/0x(\h+)/` instead of `/(0x\h+)/`. A leading 0 (octal) is likely more dangerous than `0x` though (`Integer("011")` => `9`).
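A short sketch of both points, using only existing methods (`Kernel#Integer` and `String#to_i`); the sample strings are made up for illustration.

```ruby
# Kernel#Integer defaults to base 0, so a prefix decides the radix.
octal_default = Integer("011")      # => 9   (leading 0 means octal)
explicit_base = Integer("011", 10)  # => 11  (explicit base 10)
loose_default = "011".to_i          # => 11  (String#to_i defaults to base 10)

# Workaround for hex: capture only the digits after "0x" and pass base 16.
m = /0x(\h+)/.match("0xFF")
hex_value = m[1].to_i(16)           # => 255

p octal_default, explicit_base, loose_default, hex_value
```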
> If this behavior is to_i, it is easy to explain the behavior.

It wouldn't be hard to explain it's the same as `Integer($N, 10)`.

> Distinguish with the group is not matched

Yes, agreed: returning `nil` for an unmatched group is good.

> for empty string, it will returns 0.

Could easily be handled as a special case, but yeah, not as simple as `Integer($N, 10)` then. Still fairly easy to explain/document.
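For comparison, a strict variant along the lines of `Integer($N, 10)` might look like the sketch below. The helper name is hypothetical, and treating an empty capture as 0 is an assumption about what the "special case" would be; neither is part of the proposal.

```ruby
# Hypothetical helper (not part of Ruby or the proposal).
def strict_integer_at_sketch(match, n)
  str = match[n]
  return nil if str.nil?   # the group did not take part in the match
  return 0 if str.empty?   # assumed special case; Integer("", 10) would raise
  Integer(str, 10)         # raises ArgumentError on non-numeric text
end

decimal   = strict_integer_at_sketch(/(\d+)/.match("011"), 1)    # => 11
unmatched = strict_integer_at_sketch(/(a)|(b)/.match("b"), 1)    # => nil
empty     = strict_integer_at_sketch(/(\d*)/.match(""), 1)       # => 0

non_numeric =
  begin
    strict_integer_at_sketch(/(...)/.match("foo"), 1)
  rescue ArgumentError => e
    e.class
  end

p decimal, unmatched, empty, non_numeric
```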
> if you want to reject non integers, you can write strict regexp pattern.

This reason convinces me: it's not bulletproof, but it should be enough of a guarantee in most cases that 0 is returned only for actual 0s in the input.

BTW, given the method name is `MatchData#integer_at(n)`, people might expect it uses `Integer()` as that's very similar to the method name.

https://bugs.ruby-lang.org/issues/21932#change-116774
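As a sketch of the strict-pattern approach: when the regexp itself admits only digits, every capture is safe to convert with the loose `to_i`. The date pattern and helper name below are hypothetical, and the code uses `String#to_i` on the captures rather than the proposed method.

```ruby
# Hypothetical pattern: anchors plus \d{n} reject anything but digits.
DATE_PATTERN_SKETCH = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Hypothetical helper: returns [year, month, day] or nil if the text
# does not match the strict pattern.
def parse_date_sketch(text)
  m = DATE_PATTERN_SKETCH.match(text)
  return nil unless m
  m.captures.map(&:to_i)
end

valid   = parse_date_sketch("2024-03-15")  # => [2024, 3, 15]
invalid = parse_date_sketch("2024-0x-15")  # => nil

p valid, invalid
```

The pattern does not protect against reading the wrong group number; it only ensures that whatever was captured consists of digits.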
participants (7)
- Eregon (Benoit Daloze)
- kou (Kouhei Sutou)
- mame (Yusuke Endoh)
- matz (Yukihiro Matsumoto)
- naruse (Yui NARUSE)
- nobu (Nobuyoshi Nakada)
- zenspider (Ryan Davis)