[XeTeX] xetex doesn't recognize/replace all invalid utf8 bytes

Peter Ragosch peter.ragosch at kabelmail.de
Tue Dec 29 17:35:42 CET 2009

On Tue, 29 Dec 2009 15:15:14 +0100
Ulrike Fischer <news3 at nililand.de> wrote:

> Hello,
> when I compile the following example, encoded as ansinew (cp1252),
> XeTeX complains about the non-ASCII character and replaces it in the
> second and third cases, but not in the first. As far as I can see, all
> "lonely continuation bytes" are simply passed through without
> complaint (and sometimes even give the correct character).
> \documentclass{article}
> \usepackage{fontspec}
> \begin{document}
> %invalid (lonely continuation byte)
> x§x %Hex A7 (bin 10100111)
> %invalid (continuation byte missing)
> xéx %Hex E9 (bin 11101001) 
> %invalid (second continuation byte missing)
> xö§x %Hex F6 (bin 11110110) + A7 (bin 10100111)
> \end{document} 
> gives two replacement messages:
> Invalid UTF-8 byte or sequence at line 8 replaced by U+FFFD.
> Missing character: There is no � in font
> [lmroman10-regular]:mapping=tex-text
> !
> Invalid UTF-8 byte or sequence at line 11 replaced by U+FFFD.
> Missing character: There is no � in font
> [lmroman10-regular]:mapping=tex-text
> !
> Is there a reason behind this behaviour? Or is it a bug?
> (Tested with XeTeX, Version 3.1415926-2.2-0.999.7 (MiKTeX 2.7))
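
For reference, the three byte patterns in the example can be checked outside of TeX. The following is only a minimal Python sketch, not a description of what XeTeX does internally: the names classify and is_valid_utf8 are made up for illustration, and the strictness is that of Python's built-in UTF-8 decoder, which may differ from XeTeX's input handling.

```python
def classify(byte: int) -> str:
    """Classify one byte by its UTF-8 bit pattern (hypothetical helper)."""
    if byte < 0x80:
        return "ascii"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"   # 10xxxxxx
    if byte < 0xE0:
        return "lead-2"         # 110xxxxx, expects 1 continuation byte
    if byte < 0xF0:
        return "lead-3"         # 1110xxxx, expects 2 continuation bytes
    if byte < 0xF8:
        return "lead-4"         # 11110xxx, expects 3 continuation bytes
    return "invalid"            # 11111xxx is not used by UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Strict validity check using Python's own UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The three test strings from the example, encoded as cp1252.
# \u00a7 is the section sign, \u00e9 is e-acute, \u00f6 is o-umlaut.
cases = {
    "lonely continuation byte": "x\u00a7x".encode("cp1252"),
    "continuation byte missing": "x\u00e9x".encode("cp1252"),
    "second continuation byte missing": "x\u00f6\u00a7x".encode("cp1252"),
}

for name, data in cases.items():
    kinds = [classify(b) for b in data]
    print(f"{name}: {data.hex(' ')} -> {kinds}, valid={is_valid_utf8(data)}")
```

Under this strict reading all three byte sequences are invalid UTF-8, including the first one with the lone A7.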

Hi Ulrike, I don't get any problem here:

peter at raven:~/work/Tex/xetex/Characters> xelatex characters.tex
This is XeTeX, Version 3.1415926-2.2-0.9995.2 (TeX Live 2009)
 \write18 enabled.
entering extended mode
LaTeX2e <2009/09/24>
Babel <v3.8l> and hyphenation patterns for english, usenglishmax,
dumylang, nohyphenation, german-x-2009-06-19, ngerman-x-2009-06-19,
ancientgreek, ibycus, arabic, basque, bulgarian, catalan, pinyin,
coptic, croatian, czech, danish, dutch, esperanto, estonian, farsi,
finnish, french, galician, german, ngerman, monogreek, greek,
hungarian, icelandic, indonesian, interlingua, irish, italian,
kurmanji, latin, latvian, lithuanian, mongolian, mongolian2a, bokmal,
nynorsk, polish, portuguese, romanian, russian, sanskrit, serbian,
slovak, slovenian, spanish, swedish, turkish, ukenglish, ukrainian,
uppersorbian, welsh, loaded.
(/usr/local/texlive/2009/texmf-dist/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
fontspec.cfg loaded.
(/home/peter/work/Tex/xetex/Characters/characters.aux) [1]
(/home/peter/work/Tex/xetex/Characters/characters.aux) )
(see the transcript file for additional information)
Output written on characters.pdf (1 page).
Transcript written on characters.log.
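
When two runs of the same example disagree like this, it may be worth confirming which bytes are actually in the file on disk, since an editor or mail client can re-encode pasted text. This is only a guess at one possible cause, not something established in this thread. A minimal Python sketch, with hypothetical helper names first_invalid_utf8 and report:

```python
import sys
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first invalid UTF-8 position, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the decoder rejected.
        return err.start, data[err.start]
    return None


def report(path: str) -> str:
    """Describe whether the file at `path` is valid UTF-8."""
    bad = first_invalid_utf8(Path(path).read_bytes())
    if bad is None:
        return f"{path}: valid UTF-8"
    offset, byte = bad
    return f"{path}: invalid UTF-8 at offset {offset} (byte 0x{byte:02X})"


if __name__ == "__main__":
    # Check every file named on the command line, if any.
    for name in sys.argv[1:]:
        print(report(name))
```

Saved under some name of your choice and run with characters.tex as its argument, it reports either "valid UTF-8" or the offset of the first offending byte. A file saved as cp1252 with the characters from the example should report an invalid byte; a file saved as UTF-8 should not.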

