Issue #21842 has been updated by byroot (Jean Boussier).

Hum, good find.

So the function was exposed as a result of [Feature #13381]; before that, the function was internal. In that ticket we didn't discuss the default encoding, but it is probably fair to assume it should have been BINARY (aka ASCII-8BIT), like `rb_str_new*`.

The function was later documented in https://github.com/ruby/ruby/commit/091faca99ca, and that documentation assumed it defaults to ASCII-8BIT.

At first glance I'd say it makes sense to treat this as a bug and change the default encoding. On the other hand, one could argue that interned binary strings don't make that much sense. I don't have a strong opinion either way.

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116158

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN
----------------------------------------
This is one of the C API functions for getting an fstring. The documentation in the source says the following:

```c
/**
 * Identical to rb_str_new(), except it returns an infamous "f"string. What is
 * a fstring? Well it is a special subkind of strings that is immutable,
 * deduped globally, and managed by our GC. It is much like a Symbol (in fact
 * Symbols are dynamic these days and are backended using fstrings). This
 * concept has been silently introduced at some point in 2.x era. Since then
 * it gained wider acceptance in the core. Starting from 3.x extension
 * libraries can also generate ones.
 *
 * @param[in]  ptr           A memory region of `len` bytes length.
 * @param[in]  len           Length of `ptr`, in bytes, not including the
 *                           terminating NUL character.
 * @exception  rb_eArgError  `len` is negative.
 * @return     A found or created instance of ::rb_cString, of `len` bytes
 *             length, of "binary" encoding, whose contents are identical to
 *             that of `ptr`.
 * @pre        At least `len` bytes of continuous memory region shall be
 *             accessible via `ptr`.
 */
VALUE rb_interned_str(const char *ptr, long len);
```

I tried to write some specs for it (https://github.com/ruby/spec/pull/1327), but instead of binary encoding, the string is actually encoded as US-ASCII. This may result in some weird behaviour if the input contains bytes that are not valid in US-ASCII (the following spec records the current behaviour rather than the documented one):

```ruby
it "supports binary strings that are invalid in ASCII encoding" do
  str = "foo\x81bar\x82baz".b
  result = @s.rb_interned_str(str, str.bytesize)
  result.encoding.should == Encoding::US_ASCII
  result.should == str.dup.force_encoding(Encoding::US_ASCII)
  result.should_not.valid_encoding?
end
```

So it seems to me that either the implementation or the documentation is incorrect. (`rb_interned_str_cstr` has the same behaviour; it's pretty much the same thing, except that it uses a NUL terminator instead of an explicit length argument.)
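To make the practical difference concrete, here is a minimal plain-Ruby sketch of why the BINARY vs. US-ASCII tag matters for the bytes used in the spec above. It does not call the C API: `us_ascii` simply re-tags the same bytes with `force_encoding`, which is assumed to mirror what `rb_interned_str` is observed to return, and the variable names are illustrative only. The last lines use `String#-@`, Ruby's own way of obtaining an interned (frozen, deduplicated) string, which keeps the receiver's encoding.

```ruby
# The bytes from the spec: 0x81 and 0x82 fall outside US-ASCII's 0x00..0x7F.
binary = "foo\x81bar\x82baz".b

# Assumption: this re-tagging mirrors the US-ASCII string the spec observes
# rb_interned_str returning for the same input.
us_ascii = binary.dup.force_encoding(Encoding::US_ASCII)

puts binary.bytes == us_ascii.bytes   # prints true: identical bytes
puts binary.valid_encoding?           # prints true: every byte is valid in BINARY
puts us_ascii.valid_encoding?         # prints false: 0x81/0x82 are not US-ASCII

# String#== returns false when the encodings differ and neither string is
# ASCII-only, so the two are unequal even though their bytes match.
puts binary == us_ascii               # prints false

# Ruby-level interning via String#-@ returns a frozen, deduplicated string
# that keeps the receiver's encoding.
interned = -binary
puts interned.frozen?                       # prints true
puts interned.encoding == Encoding::BINARY  # prints true
```

In other words, an extension that passes binary data to `rb_interned_str` today gets back a string that reports itself as invalid and does not compare equal to its own binary input.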