[XeTeX] xetex doesn't recognize/replace all invalid utf8 bytes

Ulrike Fischer news3 at nililand.de
Tue Dec 29 15:15:14 CET 2009


Hello,

when I compile the following example, encoded as ansinew (cp1252),
xetex complains about the non-ASCII char and replaces it in the
second and third cases, but not in the first. As far as I can see,
all "lonely continuation bytes" are simply passed through without
complaint (and sometimes even give the correct char).

\documentclass{article}
\usepackage{fontspec}
\begin{document}
%invalid (lonely continuation byte)
x§x %Hex A7 (bin 10100111)

%invalid (continuation byte missing)
xéx %Hex E9 (bin 11101001) 

%invalid (second continuation byte missing)
xö§x %Hex F6 (bin 11110110) + A7 (bin 10100111)

\end{document} 
 
gives two replacement messages:

Invalid UTF-8 byte or sequence at line 8 replaced by U+FFFD.
Missing character: There is no � in font
[lmroman10-regular]:mapping=tex-text
!
Invalid UTF-8 byte or sequence at line 11 replaced by U+FFFD.
Missing character: There is no � in font
[lmroman10-regular]:mapping=tex-text
!
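
To make the three cases concrete, here is a minimal Python sketch
(an illustration only, not XeTeX's actual decoder; the helper names
classify and is_valid_utf8 are made up for this example). It
classifies each byte by its UTF-8 bit pattern and then checks the
three byte sequences with Python's strict decoder. By bit pattern,
A7 is a continuation byte, E9 starts a three-byte sequence and
F6 (bin 11110110) starts a four-byte sequence, so a strict decoder
would be expected to reject all three inputs:

```python
def classify(byte):
    """Classify one byte by its UTF-8 bit pattern only.

    This ignores the extra range restrictions of RFC 3629
    (e.g. C0, C1 and F5..F7 are never valid lead bytes).
    """
    if byte < 0x80:
        return "ascii"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"   # 10xxxxxx
    if byte < 0xE0:
        return "lead-2"         # 110xxxxx
    if byte < 0xF0:
        return "lead-3"         # 1110xxxx
    if byte < 0xF8:
        return "lead-4"         # 11110xxx
    return "invalid"            # 11111xxx


def is_valid_utf8(data):
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The three test lines from the example, as cp1252 bytes.
samples = {
    "lonely continuation byte": b"x\xa7x",
    "continuation byte missing": b"x\xe9x",
    "lead-4 plus one continuation": b"x\xf6\xa7x",
}

if __name__ == "__main__":
    for label, data in samples.items():
        roles = [classify(b) for b in data]
        print(label, roles, "valid:", is_valid_utf8(data))
```

Python's decoder reports all three sequences as invalid, including
the lonely continuation byte of the first case.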


Is there a reason behind this behaviour? Or is it a bug?

(Tested with XeTeX, Version 3.1415926-2.2-0.999.7 (MiKTeX 2.7))


-- 
Ulrike Fischer 


