[luatex] UTF-8 byte sequence 0xEF, 0xBF, 0xBD causes invalid sequence error whereas ^^^^fffd does not
luigi scarso
luigi.scarso at gmail.com
Sat Jan 14 13:11:12 CET 2023
On Sat, 14 Jan 2023 at 12:53, Vítek Novotný <witiko at mail.muni.cz> wrote:
> On Sat, Jan 14, 2023 at 10:50:21AM +0100, luigi scarso wrote:
> > On Sat, 14 Jan 2023 at 10:27, luigi scarso <luigi.scarso at gmail.com> wrote:
> >
> > >
> > >
> > > On Fri, 13 Jan 2023 at 22:36, Vítek Novotný <witiko at mail.muni.cz> wrote:
> > >
> > >> Dear LuaTeX developers,
> > >>
> > >> assume the following plain TeX document `example.tex`:
> > >>
> > >> \newwrite\outfile
> > >> \openout\outfile\jobname.out
> > >> \write\outfile{^^^^fffd}
> > >> \closeout\outfile
> > >> \bye
> > >>
> > >> Running `luatex example` will correctly produce file `example.out` with
> > >> the UTF-8 encoding of U+FFFD: 0xEF, 0xBF, and 0xBD.
> > >>
> > >> $ hexdump -C example.out
> > >> 00000000 ef bf bd 0a |....|
> > >> 00000004
> > >>
> > >> Now, let's change `example.tex` as follows:
> > >>
> > >> \input\jobname.out
> > >> \bye
> > >>
> > >> Running `luatex example` produces the following error:
> > >>
> > >> ! String contains an invalid utf-8 sequence.
> > >>
> > >> I would expect that LuaTeX would treat ^^^^fffd and the byte sequence
> > >> 0xEF, 0xBF, and 0xBD the same. This issue was co-discovered by
> > >> @lostenderman at <https://github.com/lostenderman/markdown/issues/34>.
> > >>
> > >>
> > > hm, checking it now.
> > >
> > >
> > hm I am not able to reproduce the error... My log says,
> > Missing character: There is no � (U+FFFD) in font cmr10!
> > but luatex exits fine
> >
> > $ luatex --credits
> > This is LuaTeX, Version 1.15.1 (TeX Live 2023/dev)
> >
> > The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.
> >
> > LuaTeX merges and builds upon (parts of) the code from these projects:
> >
> > tex : Donald Knuth
> > etex : Peter Breitenlohner, Phil Taylor and friends
> > omega : John Plaice and Yannis Haralambous
> > aleph : Giuseppe Bilotta
> > pdftex : Han The Thanh and friends
> > kpathsea : Karl Berry, Olaf Weber and others
> > lua : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
> > metapost : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
> > pplib : Paweł Jackowski
> > fontforge : George Williams (partial)
> > luajit : Mike Pall (used in LuajitTeX)
> >
> > Compiled with libpng 1.6.39; using 1.6.39
> > Compiled with lua version 5.3.6
> > Compiled with mplib version 2.02
> > Compiled with zlib 1.2.13; using 1.2.13
> >
> > Development id: 7554
>
> Here is mine:
>
> # luatex --credits
> This is LuaTeX, Version 1.15.0 (TeX Live 2022)
>
> The LuaTeX team is Hans Hagen, Hartmut Henkel, Taco Hoekwater, Luigi Scarso.
>
> LuaTeX merges and builds upon (parts of) the code from these projects:
>
> tex : Donald Knuth
> etex : Peter Breitenlohner, Phil Taylor and friends
> omega : John Plaice and Yannis Haralambous
> aleph : Giuseppe Bilotta
> pdftex : Han The Thanh and friends
> kpathsea : Karl Berry, Olaf Weber and others
> lua : Roberto Ierusalimschy, Waldemar Celes and Luiz Henrique de Figueiredo
> metapost : John Hobby, Taco Hoekwater, Luigi Scarso, Hans Hagen and friends
> pplib : Paweł Jackowski
> fontforge : George Williams (partial)
> luajit : Mike Pall (used in LuajitTeX)
>
> Compiled with libpng 1.6.37; using 1.6.37
> Compiled with lua version 5.3.6
> Compiled with mplib version 2.02
> Compiled with zlib 1.2.11; using 1.2.11
>
> Development id: 7509
>
> Here is the full sequence of commands to reproduce the issue:
>
> $ docker run --rm -it texlive/texlive
>
> # cat > example.tex << EOF
> > \newwrite\outfile
> > \openout\outfile\jobname.out
> > \write\outfile{^^^^fffd}
> > \closeout\outfile
> > \bye
> > EOF
>
> # luatex example
> This is LuaTeX, Version 1.15.0 (TeX Live 2022)
> restricted system commands enabled.
> (./example.tex
> [1{/usr/local/texlive/2022/texmf-var/fonts/map/pdftex/updmap/pdf
>
> tex.map}])</usr/local/texlive/2022/texmf-dist/fonts/type1/public/amsfonts/cm/cm
> r10.pfb>
> Output written on example.pdf (1 page, 8143 bytes).
> Transcript written on example.log.
>
> # cat > example.tex << EOF
> > \input\jobname.out
> > \bye
> > EOF
>
> # luatex example
> This is LuaTeX, Version 1.15.0 (TeX Live 2022)
> restricted system commands enabled.
> (./example.tex (./example.out
> ! String contains an invalid utf-8 sequence.
> l.1
> ?
> ? q
> OK, entering \batchmode
>
> I don't have the opportunity to test the latest LuaTeX, but if I
> understand you correctly, it seems that this issue was fixed
> sometime between Development id 7509 and 7554 and will no longer be
> present in TeX Live 2023.
>
> Best,
> Vit
>
yes, maybe this patch we have done fixes the issue:
2022-08-16 Luigi Scarso <luigi.scarso at gmail.com>
* Accept 0xFFFD but still error on invalid utf (compatible) (H.Hagen)
* omitinfodict added: \pdfvariable omitinfodict 1 omit Info
dictionary (H.Hagen)
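
For reference, the byte-level claim in this thread can be checked outside
of TeX. The following is only a minimal Python sketch, not LuaTeX code; the
helper names `utf8_bytes` and `is_valid_utf8` are made up for illustration.
It shows that U+FFFD encodes to 0xEF, 0xBF, 0xBD and that those three bytes
are a well-formed UTF-8 sequence, so a strict reader has no reason to
reject the contents of `example.out`:

```python
# Illustrative check, independent of LuaTeX: U+FFFD and the bytes
# EF BF BD are the same character in UTF-8.
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a string (hypothetical helper)."""
    return ch.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


encoded = utf8_bytes(REPLACEMENT)
print(encoded.hex(" "))  # prints: ef bf bd

# The file written by \write\outfile{^^^^fffd} holds these bytes plus a newline.
file_contents = bytes([0xEF, 0xBF, 0xBD, 0x0A])
print(is_valid_utf8(file_contents))  # prints: True
print(file_contents.decode("utf-8").rstrip("\n") == REPLACEMENT)  # prints: True

# A truncated sequence, by contrast, really is invalid UTF-8.
print(is_valid_utf8(bytes([0xEF, 0xBF])))  # prints: False
```

The last line is there only to show the difference between a genuinely
malformed sequence and the valid encoding of U+FFFD itself.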
--
luigi