[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not

luigi scarso luigi.scarso at gmail.com
Sat Jan 14 10:27:39 CET 2023


On Fri, 13 Jan 2023 at 22:36, Vítek Novotný <witiko at mail.muni.cz> wrote:

> Dear LuaTeX developers,
>
> assume the following plain TeX document `example.tex`:
>
>     \newwrite\outfile
>     \openout\outfile\jobname.out
>     \write\outfile{^^^^fffd}
>     \closeout\outfile
>     \bye
>
> Running `luatex example` will correctly produce file `example.out` with the
> UTF-8 encoding of U+FFFD: 0xEF, 0xBF, and 0xBD.
>
>     $ hexdump -C example.out
>     00000000  ef bf bd 0a                                       |....|
>     00000004
>
> Now, let's change `example.tex` as follows:
>
>     \input\jobname.out
>     \bye
>
> Running `luatex example` produces the following error:
>
>     ! String contains an invalid utf-8 sequence.
>
> I would expect that LuaTeX would treat ^^^^fffd and the byte sequence 0xEF,
> 0xBF, and 0xBD the same. This issue was co-discovered by @lostenderman at
> <https://github.com/lostenderman/markdown/issues/34>.
>
>
hm, checking it now.
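
For reference, a minimal Python sketch (not part of the original report, and independent of LuaTeX) that checks the byte-level claim in the quoted message: U+FFFD encodes to 0xEF, 0xBF, 0xBD in UTF-8, and a strict decoder accepts those three bytes. The variable names are illustrative only, and this says nothing about why LuaTeX rejects the sequence on input.

```python
# U+FFFD REPLACEMENT CHARACTER, written ^^^^fffd in the TeX source.
replacement = "\ufffd"

# Encoding the code point as UTF-8 gives the three bytes from the hexdump.
encoded = replacement.encode("utf-8")

# Strict decoding of the same bytes raises UnicodeDecodeError on malformed
# input; here it succeeds and returns the original code point.
decoded = bytes([0xEF, 0xBF, 0xBD]).decode("utf-8", errors="strict")

print(encoded.hex(" "), decoded == replacement)
```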

--
luigi