Issue #21842 has been updated by herwin (Herwin W).

I've made a short update of the documentation in https://github.com/ruby/ruby/pull/15897, mostly to explain what information is used to determine the encoding of the result.

I've tried to keep the line width usage similar to the original, which meant doubling some random spaces until it lined up. I would not mind dropping this dependency, since that would make updating these texts a whole lot easier.

----------------------------------------
Bug #21842: Encoding of rb_interned_str
https://bugs.ruby-lang.org/issues/21842#change-116169

* Author: herwin (Herwin W)
* Status: Open
* ruby -v: ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux], but seen on 3.0 - 4.1-dev
* Backport: 3.2: WONTFIX, 3.3: REQUIRED, 3.4: REQUIRED, 4.0: REQUIRED
----------------------------------------
This is one of the API methods to get an fstring. The documentation in the source says the following:

```c
/**
 * Identical to rb_str_new(), except it returns an infamous "f"string. What is
 * a fstring? Well it is a special subkind of strings that is immutable,
 * deduped globally, and managed by our GC. It is much like a Symbol (in fact
 * Symbols are dynamic these days and are backended using fstrings). This
 * concept has been silently introduced at some point in 2.x era. Since then
 * it gained wider acceptance in the core. Starting from 3.x extension
 * libraries can also generate ones.
 *
 * @param[in]  ptr          A memory region of `len` bytes length.
 * @param[in]  len          Length of `ptr`, in bytes, not including the
 *                          terminating NUL character.
 * @exception  rb_eArgError `len` is negative.
 * @return     A found or created instance of ::rb_cString, of `len` bytes
 *             length, of "binary" encoding, whose contents are identical to
 *             that of `ptr`.
 * @pre        At least `len` bytes of continuous memory region shall be
 *             accessible via `ptr`.
 */
VALUE rb_interned_str(const char *ptr, long len);
```

I tried to create some specs for them (https://github.com/ruby/spec/pull/1327), but instead of binary encoding, the string is actually encoded as US-ASCII. This may result in some weird behaviour if the input contains bytes that are not valid in US-ASCII (the following is more an observation of the current behaviour):

```ruby
it "support binary strings that are invalid in ASCII encoding" do
  str = "foo\x81bar\x82baz".b
  result = @s.rb_interned_str(str, str.bytesize)
  result.encoding.should == Encoding::US_ASCII
  result.should == str.dup.force_encoding(Encoding::US_ASCII)
  result.should_not.valid_encoding?
end
```

So it seems to me like either the implementation or the documentation is incorrect.

(`rb_interned_str_cstr` has the same behaviour; it's pretty much the same thing except that it uses a null terminator instead of an explicit length argument.)

--
https://bugs.ruby-lang.org/
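
For readers without a C extension at hand, the effect of the encoding label described in the report can be approximated in plain Ruby. This is only a sketch: it does not call `rb_interned_str` at all, and it assumes that relabelling a copy with `force_encoding` is a fair stand-in for the US-ASCII result observed in the spec. The variable names are made up for illustration.

```ruby
# Same bytes as in the spec; 0x81 and 0x82 fall outside the 7-bit ASCII range.
binary = "foo\x81bar\x82baz".b

# Relabel a copy as US-ASCII without touching the bytes. This only mimics the
# encoding that rb_interned_str was observed to return; it does not call the
# C API itself.
ascii = binary.dup.force_encoding(Encoding::US_ASCII)

same_bytes   = binary.bytes == ascii.bytes  # contents are byte-for-byte identical
binary_valid = binary.valid_encoding?       # any byte sequence is valid as binary
ascii_valid  = ascii.valid_encoding?        # high bytes are not valid US-ASCII

puts "same bytes:      #{same_bytes}"
puts "binary is valid: #{binary_valid}"
puts "ascii is valid:  #{ascii_valid}"
```

The bytes are the same in both strings; only the encoding label decides whether the string counts as valid, which is why the documented "binary" result and the observed US-ASCII result behave differently for such input.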