[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not

Vítek Novotný witiko at mail.muni.cz
Fri Jan 13 22:35:34 CET 2023


Dear LuaTeX developers,

assume the following plain TeX document `example.tex`:

    \newwrite\outfile
    \openout\outfile\jobname.out
    \write\outfile{^^^^fffd}
    \closeout\outfile
    \bye

Running `luatex example` correctly produces the file `example.out`, which
contains the UTF-8 encoding of U+FFFD (0xEF, 0xBF, 0xBD) followed by a newline:

    $ hexdump -C example.out
    00000000  ef bf bd 0a                                       |....|
    00000004
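
For reference, U+FFFD lies in the range U+0800 to U+FFFF, which UTF-8 encodes
in three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx. The Python sketch below
packs the bits by hand and checks that the result matches the bytes in the
hexdump above. Python is used here only as a neutral reference, and
`encode_utf8_3byte` is a hypothetical helper, not part of LuaTeX.

```python
def encode_utf8_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (excluding surrogates) as 3 UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} does not use the 3-byte UTF-8 form")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits of the code point
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


encoded = encode_utf8_3byte(0xFFFD)
print(encoded.hex(" "))  # → ef bf bd

# Cross-check against Python's built-in UTF-8 encoder.
assert encoded == "\ufffd".encode("utf-8")
```

For U+FFFD the three groups of bits are 1111, 111111, and 111101, which give
0xEF, 0xBF, and 0xBD respectively.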

Now, let's change `example.tex` as follows:

    \input\jobname.out
    \bye

Running `luatex example` produces the following error:

    ! String contains an invalid utf-8 sequence.

I would expect LuaTeX to treat ^^^^fffd and the byte sequence 0xEF, 0xBF,
0xBD the same way, since both represent U+FFFD. This issue was co-discovered
by @lostenderman at <https://github.com/lostenderman/markdown/issues/34>.

Best,
Vitek