[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not
Vítek Novotný
witiko at mail.muni.cz
Fri Jan 13 22:35:34 CET 2023
Dear LuaTeX developers,
assume the following plain TeX document `example.tex`:
\newwrite\outfile
\openout\outfile\jobname.out
\write\outfile{^^^^fffd}
\closeout\outfile
\bye
Running `luatex example` will correctly produce file `example.out` with the
UTF-8 encoding of U+FFFD: 0xEF, 0xBF, and 0xBD.
$ hexdump -C example.out
00000000 ef bf bd 0a |....|
00000004
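The byte values can be cross-checked independently of LuaTeX. The following is a minimal Python sketch, not part of the original report; the variable names are illustrative. It assumes only that Python's strict UTF-8 codec is an acceptable reference for what counts as a well-formed sequence.

```python
# U+FFFD REPLACEMENT CHARACTER, the character that ^^^^fffd denotes.
replacement = "\ufffd"

# Encode to UTF-8 and show the bytes in hexadecimal.
encoded = replacement.encode("utf-8")
print(encoded.hex(" "))

# Strict decoding raises UnicodeDecodeError on malformed input, so a
# successful round trip indicates that the sequence is well-formed UTF-8.
decoded = bytes([0xEF, 0xBF, 0xBD]).decode("utf-8", errors="strict")
print(decoded == replacement)
```

If the round trip succeeds, the three bytes in `example.out` are an ordinary, valid UTF-8 sequence rather than a malformed one.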
Now, let's change `example.tex` as follows:
\input\jobname.out
\bye
Running `luatex example` produces the following error:
! String contains an invalid utf-8 sequence.
I would expect LuaTeX to treat ^^^^fffd and the byte sequence 0xEF, 0xBF,
0xBD the same, since the latter is the valid UTF-8 encoding of U+FFFD.
This issue was co-discovered by @lostenderman at
<https://github.com/lostenderman/markdown/issues/34>.
Best,
Vitek