[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not

Vítek Novotný witiko at mail.muni.cz
Sat Jan 14 12:52:59 CET 2023


On Sat, Jan 14, 2023 at 10:50:21AM +0100, luigi scarso wrote:
> On Sat, 14 Jan 2023 at 10:27, luigi scarso <luigi.scarso at gmail.com> wrote:
> 
> >
> >
> > On Fri, 13 Jan 2023 at 22:36, Vítek Novotný <witiko at mail.muni.cz> wrote:
> >
> >> Dear LuaTeX developers,
> >>
> >> assume the following plain TeX document `example.tex`:
> >>
> >>     \newwrite\outfile
> >>     \openout\outfile\jobname.out
> >>     \write\outfile{^^^^fffd}
> >>     \closeout\outfile
> >>     \bye
> >>
> >> Running `luatex example` will correctly produce file `example.out` with
> >> the UTF-8 encoding of U+FFFD: 0xEF, 0xBF, and 0xBD.
> >>
> >>     $ hexdump -C example.out
> >>     00000000  ef bf bd 0a                                       |....|
> >>     00000004
> >>
> >> Now, let's change `example.tex` as follows:
> >>
> >>     \input\jobname.out
> >>     \bye
> >>
> >> Running `luatex example` produces the following error:
> >>
> >>     ! String contains an invalid utf-8 sequence.
> >>
> >> I would expect that LuaTeX would treat ^^^^fffd and the byte sequence
> >> 0xEF, 0xBF, and 0xBD the same. This issue was co-discovered by
> >> @lostenderman at <https://github.com/lostenderman/markdown/issues/34>.
> >>
> >>
> > hm, checking it now.
> >
> >
> hm I am not able to reproduce the error... My log says,
> Missing character: There is no � (U+FFFD) in font cmr10!
> but luatex  exits fine
> 
>  $ luatex --credits
> This is LuaTeX, Version 1.15.1 (TeX Live 2023/dev)
>
> The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.
> 
> LuaTeX merges and builds upon (parts of) the code from these projects:
> 
> tex       : Donald Knuth
> etex      : Peter Breitenlohner, Phil Taylor and friends
> omega     : John Plaice and Yannis Haralambous
> aleph     : Giuseppe Bilotta
> pdftex    : Han The Thanh and friends
> kpathsea  : Karl Berry, Olaf Weber and others
> lua       : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
> metapost  : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
> pplib     : Paweł Jackowski
> fontforge : George Williams (partial)
> luajit    : Mike Pall (used in LuajitTeX)
> 
> Compiled with libpng 1.6.39; using 1.6.39
> Compiled with lua version 5.3.6
> Compiled with mplib version 2.02
> Compiled with zlib 1.2.13; using 1.2.13
> 
> Development id: 7554

Here is mine:

    # luatex --credits
    This is LuaTeX, Version 1.15.0 (TeX Live 2022)

    The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.

    LuaTeX merges and builds upon (parts of) the code from these projects:

    tex       : Donald Knuth
    etex      : Peter Breitenlohner, Phil Taylor and friends
    omega     : John Plaice and Yannis Haralambous
    aleph     : Giuseppe Bilotta
    pdftex    : Han The Thanh and friends
    kpathsea  : Karl Berry, Olaf Weber and others
    lua       : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
    metapost  : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
    pplib     : Paweł Jackowski
    fontforge : George Williams (partial)
    luajit    : Mike Pall (used in LuajitTeX)

    Compiled with libpng 1.6.37; using 1.6.37
    Compiled with lua version 5.3.6
    Compiled with mplib version 2.02
    Compiled with zlib 1.2.11; using 1.2.11

    Development id: 7509

Here is the full sequence of commands to reproduce the issue:

    $ docker run --rm -it texlive/texlive

    # cat > example.tex << EOF
    > \newwrite\outfile
    > \openout\outfile\jobname.out
    > \write\outfile{^^^^fffd}
    > \closeout\outfile
    > \bye
    > EOF

    # luatex example
    This is LuaTeX, Version 1.15.0 (TeX Live 2022)
     restricted system commands enabled.
    (./example.tex [1{/usr/local/texlive/2022/texmf-var/fonts/map/pdftex/updmap/pdf
    tex.map}])</usr/local/texlive/2022/texmf-dist/fonts/type1/public/amsfonts/cm/cm
    r10.pfb>
    Output written on example.pdf (1 page, 8143 bytes).
    Transcript written on example.log.

    # cat > example.tex << EOF
    > \input\jobname.out
    > \bye
    > EOF

    # luatex example
    This is LuaTeX, Version 1.15.0 (TeX Live 2022)
     restricted system commands enabled.
    (./example.tex (./example.out
    ! String contains an invalid utf-8 sequence.
    l.1
      ?
    ? q
    OK, entering \batchmode

I don't have the opportunity to test the latest LuaTeX, but if I
understand you correctly, it seems that this issue was fixed
sometime between Development id 7509 and 7554 and will no longer be
present in TeX Live 2023.
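
As a sanity check outside TeX, here is a minimal Python sketch (not part
of the original report) confirming that the bytes 0xEF, 0xBF, and 0xBD
are a well-formed UTF-8 encoding of U+FFFD. It assumes Python 3.8 or
later; the variable names are illustrative only.

```python
# The three bytes that LuaTeX wrote to example.out for ^^^^fffd
# (without the trailing newline shown in the hexdump).
data = bytes([0xEF, 0xBF, 0xBD])

# Strict decoding raises UnicodeDecodeError on malformed input,
# so success here means the sequence is well-formed UTF-8.
text = data.decode("utf-8", errors="strict")

print(f"U+{ord(text):04X}")           # code point of the decoded character
print(text.encode("utf-8").hex(" "))  # round-trip back to the same bytes
```

This only shows that the file content itself is valid, so the error
would have to come from how LuaTeX 1.15.0 reads the file rather than
from what it wrote.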

Best,
Vit