[ruby-core:116016] [Ruby master Bug#20150] Memory leak in grapheme clusters

Issue #20150 has been reported by peterzhu2118 (Peter Zhu).

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150

* Author: peterzhu2118 (Peter Zhu)
* Status: Open
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414

String#grapheme_clusters and String#each_grapheme_cluster leak memory because, if the string is not UTF-8, the created regex is never freed.

For example:

```ruby
# A non-UTF-8 string triggers the leaking code path.
str = "hello world".encode(Encoding::UTF_32LE)

10.times do
  1_000.times do
    str.grapheme_clusters
  end

  # Print this process's resident set size after each batch of calls.
  puts `ps -o rss= -p #{$$}`
end
```

Before:

```
26000
42256
59008
75792
92528
109232
125936
142672
159392
176160
```

After:

```
9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
```

--
https://bugs.ruby-lang.org/
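To put the figures in the report in perspective, the following sketch estimates the average memory growth per `grapheme_clusters` call from the two RSS series. The sample values are copied from the report; the helper name `growth_per_call` is made up for this illustration, and the unit is assumed to be whatever `ps -o rss=` prints (typically kilobytes on Linux and macOS).

```ruby
# RSS samples copied from the report, one per batch of 1_000 calls.
before = [26000, 42256, 59008, 75792, 92528, 109232, 125936, 142672, 159392, 176160]
after  = [9264, 9504, 9808, 10000, 10128, 10224, 10352, 10544, 10704, 10896]

# Number of grapheme_clusters calls between two consecutive samples.
CALLS_PER_SAMPLE = 1_000

# Hypothetical helper: average RSS growth per call across the whole series.
def growth_per_call(samples, calls_per_sample)
  total_growth = samples.last - samples.first
  intervals = samples.size - 1
  total_growth.fdiv(intervals * calls_per_sample)
end

before_rate = growth_per_call(before, CALLS_PER_SAMPLE)
after_rate  = growth_per_call(after, CALLS_PER_SAMPLE)

puts format("before: %.2f per call", before_rate)
puts format("after:  %.2f per call", after_rate)
```

With the reported numbers this works out to roughly 16.7 units per call before the fix and about 0.18 after it, so the remaining growth is small by comparison.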

Issue #20150 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Closed

Fixed by commit:b3d612804946e841e47d14e09b6839224a79c1a4

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150#change-106087

* Author: peterzhu2118 (Peter Zhu)
* Status: Closed
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414

Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED

ruby_3_2 b4f8623441a8be53b643fed826ba44e933cafd7e merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150#change-106310

* Author: peterzhu2118 (Peter Zhu)
* Status: Closed
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414

Hello everybody (but in particular Tomoyuki Chikanaga and Yui Naruse),

On 2024-01-18 12:21, nagachika (Tomoyuki Chikanaga) via ruby-core wrote:
Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).
Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED
I was under the impression that backports of bug fixes had to "trickle down", i.e. first be applied to the main branch, then to 3.3, then to 3.2, and so on (unless, of course, they were not needed for a specific branch). The above "3.2: DONE, 3.3: REQUIRED" shows that the backport occurred in 3.2 before 3.3.

Can somebody please confirm or restate the actual backport policy now in effect?

Thanks and regards, Martin.

Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).

Hello, Martin-sensei.

As I understand it, there is no explicit rule regarding the order of backporting to each stable branch. In this case, I backported the changeset to the 3.2 branch ahead of the 3.3 branch because I hoped to include some obvious bug fixes in ruby-3.2.3, released yesterday. I also think these fixes should be backported to the 3.3 branch before the release of ruby-3.3.1, but that is up to naruse-san, the current 3.3 branch maintainer.

Best Regards,

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150#change-106342

* Author: peterzhu2118 (Peter Zhu)
* Status: Closed
* Priority: Normal
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414

Hello Tomoyuki,

Many thanks for your careful explanation!

Regards, Martin.

On 2024-01-19 17:14, nagachika (Tomoyuki Chikanaga) via ruby-core wrote:
Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).
Hello, Martin-sensei.
As I understand it, there is no explicit rule regarding the order of backporting to each stable branch. In this case, I backported the changeset to the 3.2 branch ahead of the 3.3 branch because I hoped to include some obvious bug fixes in ruby-3.2.3, released yesterday. I also think these fixes should be backported to the 3.3 branch before the release of ruby-3.3.1, but that is up to naruse-san, the current 3.3 branch maintainer.
Best Regards,

Issue #20150 has been updated by naruse (Yui NARUSE).

Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: DONE

ruby_3_3 62de3eb5a2e5b1f0f1516dc99241c4c54a1bf691 merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.

----------------------------------------
Bug #20150: Memory leak in grapheme clusters
https://bugs.ruby-lang.org/issues/20150#change-107273

* Author: peterzhu2118 (Peter Zhu)
* Status: Closed
* Backport: 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: DONE
----------------------------------------
GitHub PR: https://github.com/ruby/ruby/pull/9414
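For interpreters that do not carry the fix, one possible way to sidestep the code path described in the report is to split the string as UTF-8 and convert each cluster back. This is only a sketch and was not proposed in the thread: the helper name `grapheme_clusters_via_utf8` is hypothetical, it relies on the report's statement that only non-UTF-8 strings are affected, and it assumes the string is in a Unicode encoding such as UTF-16 or UTF-32 that transcodes to UTF-8 without loss.

```ruby
# Hypothetical helper: split into grapheme clusters via a UTF-8 copy,
# then return the clusters in the string's original encoding.
def grapheme_clusters_via_utf8(str)
  original = str.encoding
  # UTF-8 strings are not affected according to the report, so use them directly.
  return str.grapheme_clusters if original == Encoding::UTF_8

  str.encode(Encoding::UTF_8)
     .grapheme_clusters
     .map { |cluster| cluster.encode(original) }
end

str = "hello world".encode(Encoding::UTF_32LE)
clusters = grapheme_clusters_via_utf8(str)
puts clusters.size
puts clusters.first.encoding
```

The extra transcoding has a cost of its own, so upgrading to a release that includes the fix remains the simpler option where that is possible.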
participants (5)
- jeremyevans0 (Jeremy Evans)
- Martin J. Dürst
- nagachika (Tomoyuki Chikanaga)
- naruse (Yui NARUSE)
- peterzhu2118 (Peter Zhu)