[ruby-core:125238] [Ruby Bug#21993] `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys
Issue #21993 has been reported by ivoanjo (Ivo Anjo). ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
Issue #21993 has been updated by luke-gru (Luke Gruber). Nice catch! I think this would only be a problem if the key was a number that was also a valid pointer into the Ruby heap and it happened to point to a T_MOVED object. But it should definitely be fixed. ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993#change-117043 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
Issue #21993 has been updated by luke-gru (Luke Gruber). Looking into this further, it looks like `gc_update_table_refs` is only used once internally, with the finalizer_table. It looks to me like it's used wrong there because the key (finalized object) is not pinned so can be moved, but the st_table is not rehashed afterwards. Moving keys of st_tables is tricky, although I could be misunderstanding something. ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993#change-117044 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
Issue #21993 has been updated by ivoanjo (Ivo Anjo). I guess objects that get a finalizer don't move -- e.g. `gc_is_moveable_obj` ==> `FALSE` so that's what made the finalizer table never be affected by this detail. ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993#change-117061 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
Issue #21993 has been updated by luke-gru (Luke Gruber). Oh yeah, thanks. There's even a helpful comment for it. I think changing `rb_gc_update_tbl_refs` to only move values probably makes sense. We could add something like `rb_mark_hash_nopin_values` where it marks values for moving but pins keys. They could be the mark/compact pair for VALUE=>VALUE st_tables. If the user wants to support moving of a st_table key, they would need to write it themselves. ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993#change-117062 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
Issue #21993 has been updated by ivoanjo (Ivo Anjo). Yeah I think that'd be fine! To be honest I never ever knew these APIs existed, I've been researching the GC and that's how I found it, so I suspect they might not be very common? ---------------------------------------- Bug #21993: `rb_gc_update_tbl_refs` is incorrectly documented as the dcompact pair for `rb_mark_tbl_no_pin`, and is unsafe for `st_table`s with non-VALUE keys https://bugs.ruby-lang.org/issues/21993#change-117063 * Author: ivoanjo (Ivo Anjo) * Status: Open * Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN ---------------------------------------- Hey! I work for Datadog on the [Ruby profiler](https://github.com/datadog/dd-trace-rb) and I've been exploring the TypedData GC API to understand the correct patterns for writing compaction-aware extensions. While reading through the API, I noticed a mismatch between the documented pairing of `rb_mark_tbl_no_pin` + `rb_gc_update_tbl_refs` and what the implementation actually does that seems like it can cause crashes and even corruption for extensions. ### The problem The public header documents `rb_gc_update_tbl_refs` as the `dcompact` counterpart for `rb_mark_tbl_no_pin`: ``` * Updates references inside of tables. After you marked values using * rb_mark_tbl_no_pin(), the objects inside of the table could of course be * moved. This function is to fixup those references. ``` However, `rb_mark_tbl_no_pin`'s implementation only marks **values**, not keys: ```c // gc.c / gc/gc.h static int gc_mark_tbl_no_pin_i(st_data_t key, st_data_t value, st_data_t data) { rb_gc_mark_movable((VALUE)value); // key is never touched return ST_CONTINUE; } ``` Thus, tables used with `rb_mark_tbl_no_pin` can have non-VALUE keys (e.g. `ID`s, raw integers, C pointers, etc). The problem is that `rb_gc_update_tbl_refs` (via `gc_update_table_refs`) calls `rb_gc_location` on **both** keys and values: ```c static int hash_foreach_replace(st_data_t key, st_data_t value, ...) { if (rb_gc_location((VALUE)key) != (VALUE)key) // <- called on key return ST_REPLACE; if (rb_gc_location((VALUE)value) != (VALUE)value) return ST_REPLACE; ... } ``` Calling `rb_gc_location` on something that's not a `VALUE` and even sometimes replacing it is probably going to cause crashes and heap corruption. ### Suggested fixes I can think of a few ways to address this: **a) Deprecate `rb_gc_update_tbl_refs`** Maybe just deprecate it? I'm not even sure if this gets used very often. **b) Introduce `rb_mark_hash_no_pin`** The `mark_keyvalue` iterator already exists internally and is used by `mark_hash()` for Ruby's own `Hash` objects. Exposing a public `rb_mark_hash_no_pin` would give `rb_gc_update_tbl_refs` a safe, correct `dmark` counterpart for tables where both keys and values are Ruby objects. **c) Fix the docs and document the precondition for `rb_gc_update_tbl_refs`** Update the docs to make explicit that `rb_gc_update_tbl_refs` assumes a `VALUE => VALUE` table, and that all keys **must have been marked** (e.g. via `rb_mark_set` or a manual iteration with `rb_gc_mark_movable`) in addition to calling `rb_mark_tbl_no_pin` for the values. Weird and ugly, but correct? --- Happy to help with a patch if useful. -- https://bugs.ruby-lang.org/
participants (2)
-
ivoanjo (Ivo Anjo) -
luke-gru (Luke Gruber)