[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not

luigi scarso luigi.scarso at gmail.com
Sat Jan 14 13:11:12 CET 2023


On Sat, 14 Jan 2023 at 12:53, Vítek Novotný <witiko at mail.muni.cz> wrote:

> On Sat, Jan 14, 2023 at 10:50:21AM +0100, luigi scarso wrote:
> > On Sat, 14 Jan 2023 at 10:27, luigi scarso <luigi.scarso at gmail.com> wrote:
> >
> > >
> > >
> > > On Fri, 13 Jan 2023 at 22:36, Vítek Novotný <witiko at mail.muni.cz> wrote:
> > >
> > >> Dear LuaTeX developers,
> > >>
> > >> assume the following plain TeX document `example.tex`:
> > >>
> > >>     \newwrite\outfile
> > >>     \openout\outfile\jobname.out
> > >>     \write\outfile{^^^^fffd}
> > >>     \closeout\outfile
> > >>     \bye
> > >>
> > >> Running `luatex example` will correctly produce file `example.out`
> > >> with the UTF-8 encoding of U+FFFD: 0xEF, 0xBF, and 0xBD.
> > >>
> > >>     $ hexdump -C example.out
> > >>     00000000  ef bf bd 0a                                       |....|
> > >>     00000004
> > >>
> > >> Now, let's change `example.tex` as follows:
> > >>
> > >>     \input\jobname.out
> > >>     \bye
> > >>
> > >> Running `luatex example` produces the following error:
> > >>
> > >>     ! String contains an invalid utf-8 sequence.
> > >>
> > >> I would expect that LuaTeX would treat ^^^^fffd and the byte sequence
> > >> 0xEF, 0xBF, and 0xBD the same. This issue was co-discovered by
> > >> @lostenderman at
> > >> <https://github.com/lostenderman/markdown/issues/34>.
> > >>
> > >>
> > > hm, checking it now.
> > >
> > >
> > hm I am not able to reproduce the error... My log says,
> > Missing character: There is no � (U+FFFD) in font cmr10!
> > but luatex  exits fine
> >
> >  $ luatex --credits
> > This is LuaTeX, Version 1.15.1 (TeX Live 2023/dev)
> >
> > The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.
> >
> > LuaTeX merges and builds upon (parts of) the code from these projects:
> >
> > tex       : Donald Knuth
> > etex      : Peter Breitenlohner, Phil Taylor and friends
> > omega     : John Plaice and Yannis Haralambous
> > aleph     : Giuseppe Bilotta
> > pdftex    : Han The Thanh and friends
> > kpathsea  : Karl Berry, Olaf Weber and others
> > lua       : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
> > metapost  : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
> > pplib     : Paweł Jackowski
> > fontforge : George Williams (partial)
> > luajit    : Mike Pall (used in LuajitTeX)
> >
> > Compiled with libpng 1.6.39; using 1.6.39
> > Compiled with lua version 5.3.6
> > Compiled with mplib version 2.02
> > Compiled with zlib 1.2.13; using 1.2.13
> >
> > Development id: 7554
>
> Here is mine:
>
>                 # luatex --credits
>                 This is LuaTeX, Version 1.15.0 (TeX Live 2022)
>
>                 The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.
>
>                 LuaTeX merges and builds upon (parts of) the code from these projects:
>
>                 tex       : Donald Knuth
>                 etex      : Peter Breitenlohner, Phil Taylor and friends
>                 omega     : John Plaice and Yannis Haralambous
>                 aleph     : Giuseppe Bilotta
>                 pdftex    : Han The Thanh and friends
>                 kpathsea  : Karl Berry, Olaf Weber and others
>                 lua       : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
>                 metapost  : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
>                 pplib     : Paweł Jackowski
>                 fontforge : George Williams (partial)
>                 luajit    : Mike Pall (used in LuajitTeX)
>
>                 Compiled with libpng 1.6.37; using 1.6.37
>                 Compiled with lua version 5.3.6
>                 Compiled with mplib version 2.02
>                 Compiled with zlib 1.2.11; using 1.2.11
>
>                 Development id: 7509
>
> Here is the full sequence of commands to reproduce the issue:
>
>     $ docker run --rm -it texlive/texlive
>
>     # cat > example.tex << EOF
>     > \newwrite\outfile
>     > \openout\outfile\jobname.out
>     > \write\outfile{^^^^fffd}
>     > \closeout\outfile
>     > \bye
>     > EOF
>
>     # luatex example
>     This is LuaTeX, Version 1.15.0 (TeX Live 2022)
>      restricted system commands enabled.
>     (./example.tex [1{/usr/local/texlive/2022/texmf-var/fonts/map/pdftex/updmap/pdf
>     tex.map}])</usr/local/texlive/2022/texmf-dist/fonts/type1/public/amsfonts/cm/cm
>     r10.pfb>
>     Output written on example.pdf (1 page, 8143 bytes).
>     Transcript written on example.log.
>
>     # cat > example.tex << EOF
>     > \input\jobname.out
>     > \bye
>     > EOF
>
>     # luatex example
>     This is LuaTeX, Version 1.15.0 (TeX Live 2022)
>      restricted system commands enabled.
>     (./example.tex (./example.out
>     ! String contains an invalid utf-8 sequence.
>     l.1
>       ?
>     ? q
>     OK, entering \batchmode
>
> I don't have the opportunity to test the latest LuaTeX but if I
> understand you correctly, it seems that this issue has been fixed
> sometime between Development id 7509 and 7554 and will no longer be
> present in TeX Live 2023.
>
> Best,
> Vit
>

Yes, this patch of ours may be what fixed the issue:
2022-08-16  Luigi Scarso <luigi.scarso at gmail.com>
     * Accept 0xFFFD but still error on invalid utf (compatible) (H.Hagen)
     * omitinfodict added: \pdfvariable omitinfodict 1 omit Info dictionary (H.Hagen)
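
For anyone who wants to confirm outside TeX that the three bytes in
`example.out` are well-formed UTF-8, here is a minimal Python sketch. It is
not part of LuaTeX and says nothing about how LuaTeX validates its input; the
helper name `is_valid_utf8` is made up for illustration. It only shows that a
strict UTF-8 decoder accepts 0xEF 0xBF 0xBD as U+FFFD and rejects a truncated
sequence.

```python
# The bytes that the first run wrote to example.out, without the newline.
raw = bytes([0xEF, 0xBF, 0xBD])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Strict decoding succeeds and yields the single code point U+FFFD.
decoded = raw.decode("utf-8", errors="strict")
print(f"U+{ord(decoded):04X}")

# The full sequence is valid; dropping the last byte makes it invalid.
print(is_valid_utf8(raw))
print(is_valid_utf8(raw[:2]))
```

So the byte sequence itself is valid, which is consistent with the error
being a problem on the reading side in the older binary rather than with the
contents of the file.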

--
luigi