
Issue #20148 has been updated by duerst (Martin Dürst).

Status changed from Open to Rejected

The characters involved (shown right-to-left in most environments) are:

U+0627 ا ARABIC LETTER ALEF
U+0628 ب ARABIC LETTER BEH
U+062A ت ARABIC LETTER TEH
U+0679 ٹ ARABIC LETTER TTEH
U+067E پ ARABIC LETTER PEH

The first three characters are widely used in most if not all languages written with the Arabic script. The last two are more specific; in the code charts (see https://www.unicode.org/charts/PDF/U0600.pdf), TTEH has an annotation of 'Urdu', and PEH has an annotation of 'Persian, Urdu,...'.

In the Urdu alphabet (see https://en.wikipedia.org/wiki/Urdu_alphabet), these are the first five letters, where PEH comes directly after BEH, and TTEH comes directly after TEH.

The Ruby `sort` method sorts these letters/strings in Unicode codepoint order, the same way it does for all characters/strings. It does not attempt anything more elaborate because sorting text is language-dependent. As an example, Swedish sorts 'ä' and 'ö' after 'z', whereas German sorts them with 'a' and 'o', respectively. It's impossible for `sort` to get it correct for both languages at the same time, and language-specific sorting would require a lot of data. I'm not sure how Arabic-speaking people would sort PEH or TTEH, if they recognize these letters at all.

This is also similar to expecting `['a', 'A', 'b', 'B'].sort` to produce `['A', 'a', 'B', 'b']`, when it actually produces `["A", "B", "a", "b"]`.

So I'm sorry to have to reject this because it works according to the specification.

A feature request to provide language-specific string comparisons (e.g. `string1.<=>(string2, 'ur')`, so that this can be used in a block with `sort`) may be appropriate, but it will take quite some time to implement this.

Alternatively, I suggest you define a hash for the Urdu alphabet order, e.g.
```
{"ا" => 1, "ب" => 2, "پ" => 3, "ت" => 4, "ٹ" => 5 }
```

(the code above will look strange because of the effects of the Unicode Bidirectional algorithm, but it should be correct), and use that with the `sort_by` method to sort Urdu strings.

----------------------------------------
Bug #20148: Sorting not working as expected on Urdu words.
https://bugs.ruby-lang.org/issues/20148#change-106018

* Author: zohaibnadeem13@gmail.com (Zohaib Nadeem)
* Status: Rejected
* Priority: Normal
* ruby -v: 3.1.4
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
I was trying to sort an array of Urdu characters and found out an ambiguity in the result. Here is the script that I am using.

['ا', 'پ', 'ب', 'ت', 'ٹ'].sort

Actual Result: ["ا", "ب", "ت", "ٹ", "پ"]
Expected Result: ["ا", "ب", 'پ', "ت", "ٹ"]

-- 
https://bugs.ruby-lang.org/
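
The hash-plus-`sort_by` workaround suggested in the comment above might look roughly like the sketch below. It is only an illustration: the names `URDU_ORDER` and `urdu_sort_key` are hypothetical, the table covers just the five letters discussed in this issue rather than the whole Urdu alphabet, and the handling of characters missing from the table (placed after all known letters, in codepoint order) is an assumption. The letters are written as `\u` escapes so that the Unicode Bidirectional algorithm does not scramble how the source displays.

```ruby
# Partial Urdu alphabet order (illustrative only): the five letters from
# this issue, ranked as in the Urdu alphabet.
URDU_ORDER = {
  "\u0627" => 1, # ARABIC LETTER ALEF
  "\u0628" => 2, # ARABIC LETTER BEH
  "\u067E" => 3, # ARABIC LETTER PEH
  "\u062A" => 4, # ARABIC LETTER TEH
  "\u0679" => 5  # ARABIC LETTER TTEH
}.freeze

# Hypothetical helper: turn a string into an array of ranks, one per
# character. Characters not in the table get a rank above every known
# letter, so they sort last, in codepoint order among themselves.
def urdu_sort_key(string)
  string.each_char.map do |ch|
    URDU_ORDER.fetch(ch) { URDU_ORDER.size + 1 + ch.ord }
  end
end

# The array from the original report: ALEF, PEH, BEH, TEH, TTEH.
letters = ["\u0627", "\u067E", "\u0628", "\u062A", "\u0679"]

# Arrays of integers compare element by element, so multi-letter words
# are ordered by their first differing letter.
sorted = letters.sort_by { |s| urdu_sort_key(s) }
# sorted is now ALEF, BEH, PEH, TEH, TTEH (the order the reporter expected)
```

Because the sort key is an array of ranks, the same helper works for whole words as well as single letters, provided the table is extended to every letter that can occur.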