[ruby-core:113895] [Ruby master Bug#19728] Automate (checking of) Regexp character property documentation

Issue #19728 has been reported by duerst (Martin Dürst).

----------------------------------------
Bug #19728: Automate (checking of) Regexp character property documentation
https://bugs.ruby-lang.org/issues/19728

* Author: duerst (Martin Dürst)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
This came up in a discussion at https://github.com/ruby/ruby/pull/7923.

The documentation at doc/regexp.rdoc currently contains a list of character properties that can be used in regular expressions. But there is no guarantee that this list is updated when the Unicode version is updated.

One idea is to create a ruby equivalent of https://github.com/k-takata/Onigmo/blob/master/tool/update-doc.py.

Another idea is to just write a test that checks enc/unicode/$UNICODE_VERSION/name2ctype.h against the relevant part of the documentation file. This might make it easier for the documentation to be rewritten while guaranteeing that no properties get forgotten.

-- 
https://bugs.ruby-lang.org/
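To make the second idea concrete, here is a minimal sketch of the comparison such a test could perform. It is only an illustration: the helper names (`normalize_property_name`, `documented_properties`, `compare_property_lists`) are hypothetical, the list of implemented names is passed in as a plain array rather than parsed from name2ctype.h (whose layout is not described here), and the sketch assumes that documented properties appear as `\p{Name}` and that names should be compared case-insensitively while ignoring spaces, hyphens and underscores.

```ruby
# Hypothetical helper: normalize a property name for comparison.
# Assumption: names match case-insensitively, ignoring " ", "_" and "-".
def normalize_property_name(name)
  name.downcase.delete(" _-")
end

# Hypothetical helper: collect every name written as \p{Name} in the doc text.
# Assumption: the documentation lists each property in that form.
def documented_properties(doc_text)
  doc_text.scan(/\\p\{([^}]+)\}/).flatten.uniq
end

# Compare the two lists and report what each side is missing.
def compare_property_lists(implemented, documented)
  impl = implemented.to_h { |name| [normalize_property_name(name), name] }
  docs = documented.to_h { |name| [normalize_property_name(name), name] }
  {
    undocumented: (impl.keys - docs.keys).map { |key| impl[key] }.sort,
    unknown: (docs.keys - impl.keys).map { |key| docs[key] }.sort,
  }
end

# Example with made-up data (not taken from the real files).
sample_doc = <<~'RDOC'
  - <tt>/\p{alpha}/</tt>
  - <tt>/\p{Lowercase_Letter}/</tt>
  - <tt>/\p{Old_Thing}/</tt>
RDOC
sample_implemented = ["Alpha", "Lowercase_Letter", "Ll"]

result = compare_property_lists(sample_implemented,
                                documented_properties(sample_doc))
puts "Undocumented: #{result[:undocumented].join(', ')}"
puts "Unknown in docs: #{result[:unknown].join(', ')}"
```

A real test would fail when either list is non-empty. It would not check that a property sits in the right section of the document.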

Issue #19728 has been updated by janosch-x (Janosch Müller).

How about doing it in [enc-unicode.rb](https://github.com/ruby/ruby/blob/master/tool/enc-unicode.rb)?

On the one hand, this script is a bit convoluted as it is, and does not need another responsibility.

On the other hand, it already passes a (quote) "human-friendly name for the group" to its `#make_const` method for every property that it creates, and the sections of the document could be based on that. It also has the abbreviations (e.g. LL for lowercase letter) available in its `aliases` variable.

Generating the doc here would ensure an exact match of docs and code, whereas a test would probably not ensure e.g. that properties are in the correct section of the doc.

----------------------------------------
Bug #19728: Automate (checking of) Regexp character property documentation
https://bugs.ruby-lang.org/issues/19728#change-103552
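As a rough illustration of the generation approach, the sketch below renders one documentation section per property group. It does not reflect how enc-unicode.rb is actually structured: the method `render_property_doc`, the shape of the `groups` and `aliases` arguments, and the output layout (a `===` heading followed by a `- <tt>/\p{Name}/</tt>` list) are all assumptions made for the example.

```ruby
# Hypothetical generator: one section per group, one list item per property.
# groups:  { "human-friendly group name" => ["Property_Name", ...] }  (assumed shape)
# aliases: { "short name" => "long name" }                            (assumed shape)
def render_property_doc(groups, aliases = {})
  # Invert the alias table: long name => sorted list of short names.
  short_names = aliases.group_by { |_short, long| long }
                       .transform_values { |pairs| pairs.map(&:first).sort }

  sections = groups.map do |title, names|
    items = names.sort.map do |name|
      shorts = short_names.fetch(name, [])
      suffix = shorts.empty? ? "" : " (alias: #{shorts.join(', ')})"
      "- <tt>/\\p{#{name}}/</tt>#{suffix}"
    end
    (["=== #{title}", ""] + items).join("\n")
  end

  sections.join("\n\n") + "\n"
end

# Example with made-up data.
sample_groups = { "General Category" => ["Lowercase_Letter", "Letter"] }
sample_aliases = { "Ll" => "Lowercase_Letter", "L" => "Letter" }
generated = render_property_doc(sample_groups, sample_aliases)
puts generated
```

Because the section heading comes from the same data as the property list, a property cannot end up documented under the wrong group, which is the advantage over a test that only compares names.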

Issue #19728 has been updated by janosch-x (Janosch Müller).

I found that `enc-unicode.rb` deals with some inconsistent unicode data (i.e. some data which uses short property names and some data which uses long names), so it doesn't provide much useful context.

I've made a PR to create documentation from the result instead: https://github.com/ruby/ruby/pull/7944

----------------------------------------
Bug #19728: Automate (checking of) Regexp character property documentation
https://bugs.ruby-lang.org/issues/19728#change-103559