[ruby-core:114181] [Ruby master Bug#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex

Issue #19767 has been reported by rubyFeedback (robert heiler).

----------------------------------------
Bug #19767: [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
https://bugs.ruby-lang.org/issues/19767

* Author: rubyFeedback (robert heiler)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
To get my knowledge about ruby regexes up-to-date I have been going through this tutorial/book here at:

https://learnbyexample.github.io/Ruby_Regexp/unicode.html

One example they provide is this, with some odd characters:

'fox:αλεπού'.scan(/\w+/n)

This will match the found word ("fox"), but it also reports the following warning:

warning: historical binary regexp match /.../n against UTF-8 string

Now: this may be obvious to others, but to me personally I am not sure what a "historical" binary regexp match actually is. I assume it may have meant that this was more used in the past, and may be discouraged now? Or is something else meant? What does "historical" mean in this context?

I may not be the only one who does not fully understand the term historical. Most of ruby's warnings are fairly easy to understand, but this one seems odd. Right now I do not know whether we can use the "n" modifier in a regex - not that I really have a good use case for it (I am using UTF-8 these days, so I don't seem to need ASCII-8BIT anyway), but perhaps the warning could be changed a little.

I have no good alternative suggestion how it can be changed, largely because I do not know what it actually means, e.g. what is "historical" about it (but, even then, I'd actually recommend against using the word "historical" because I don't understand what it means; deprecated is easy to understand, historical does not tell me anything).

Perhaps it could be expressed somewhat differently and we could get rid of the word "historical" there? Either way, it's a tiny issue so I was not even sure whether to report it. But, from the point of view of other warnings, I believe the term "historical" does not tell the user enough about what the issue is here.

(irb):1: warning: historical binary regexp match /.../n against UTF-8 string
=> ["fox"]

--
https://bugs.ruby-lang.org/
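The report above can be reproduced outside irb with a short script. The following is only a sketch: `captured_warnings` is a hypothetical helper (not part of Ruby or of the report) that temporarily swaps `$stderr` for a `StringIO`, and the ASCII-only comparison string `'fox:alepou'` is an assumption added for contrast. On the Ruby version named in the report, the warning appears to fire only when the UTF-8 string actually contains non-ASCII characters.

```ruby
require 'stringio'

# Hypothetical helper: run the block and return everything that was
# written to $stderr while it ran (Ruby warnings go through $stderr).
def captured_warnings
  original_stderr, original_verbose = $stderr, $VERBOSE
  $stderr = StringIO.new
  $VERBOSE = false # non-nil, so warnings are not silenced
  yield
  $stderr.string
ensure
  $stderr, $VERBOSE = original_stderr, original_verbose
end

greek = 'fox:αλεπού' # UTF-8 string containing non-ASCII characters
ascii = 'fox:alepou' # UTF-8 string containing ASCII characters only

result  = nil
warning = captured_warnings { result = greek.scan(/\w+/n) }
quiet   = captured_warnings { ascii.scan(/\w+/n) }

p result        # => ["fox"], as in the report
puts warning    # the "historical binary regexp match" text, if emitted
p quiet.empty?  # whether the ASCII-only match stayed silent
```

Capturing the message this way only inspects the warning; it does not change how the match itself is performed.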

Issue #19767 has been updated by Dan0042 (Daniel DeLorme).

The "historical" and "binary" parts were added in 2017
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/d8c...
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/dbd...

The original warning was added in 2008
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/880...

It means that even though it may look like a binary regexp, it doesn't act like one. `"é"[/./n] == "é"`, not the first byte of "é"

TBH I don't know why it was done that way. It would be convenient if `/.../n =~ str` was equivalent to `/.../n =~ str.b` but without the intermediary string.

----------------------------------------
Misc #19767: [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
https://bugs.ruby-lang.org/issues/19767#change-104790

* Author: rubyFeedback (robert heiler)
* Status: Open
* Priority: Normal
----------------------------------------

--
https://bugs.ruby-lang.org/
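The difference described in this reply can be checked directly. This is a minimal sketch, assuming a UTF-8 source file (Ruby's default); the variable names are illustrative only. `String#b` returns a copy of the string tagged as ASCII-8BIT, which is the "intermediary string" the reply mentions.

```ruby
str = "é" # UTF-8; encoded as the two bytes 0xC3 0xA9

# Matching the /n regexp against the UTF-8 string emits the
# "historical binary regexp match" warning, and "." still
# consumes the whole character rather than a single byte.
char_match = str[/./n]

# Matching against a binary copy is byte-oriented, so "."
# consumes only the first byte.
byte_match = str.b[/./n]

p str.bytes           # => [195, 169]
p char_match == str   # => true
p byte_match.bytes    # => [195]
p byte_match.encoding # => #<Encoding:ASCII-8BIT> (named BINARY on newer Rubies)
```

If byte-wise matching is what is wanted, calling `.b` on the string before matching appears to be the explicit way to get it, at the cost of the extra string copy.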

😃

On Fri, 29 Sep 2023 at 18:57, Dan0042 (Daniel DeLorme) via ruby-core <ruby-core@ml.ruby-lang.org> wrote:
> Issue #19767 has been updated by Dan0042 (Daniel DeLorme).
> [...]
Participants (3):
- Dan0042 (Daniel DeLorme)
- rubyFeedback (robert heiler)
- Владислав Родин