[XeTeX] xetex doesn't recognize/replace all invalid utf8 bytes

George N. White III gnwiii at gmail.com
Tue Dec 29 18:36:10 CET 2009


On Tue, Dec 29, 2009 at 12:35 PM, Peter Ragosch
<peter.ragosch at kabelmail.de> wrote:
> On Tue, 29 Dec 2009 15:15:14 +0100
> Ulrike Fischer <news3 at nililand.de> wrote:
>
>> Hello,
>>
>> when I compile the following example encoded as ansinew (cp1252)
>> xetex complains and replaces in the second and the third case the
>> non-ASCII char but not in the first case. As far as I can see all
>> "lonely continuation bytes" are simply passed through without
>> complaint (and sometimes even give the correct char).
>>
>> \documentclass{article}
>> \usepackage{fontspec}
>> \begin{document}
>> %invalid (lonely continuation byte)
>> x§x %Hex A7 (bin 10100111)
>>
>> %invalid (continuation byte missing)
>> xéx %Hex E9 (bin 11101001)
>>
>> %invalid (second continuation byte missing)
>> xö§x %Hex F6 (bin 11110110) + A7 (bin 10100111)
>>
>> \end{document}
>>
>> gives two replacement messages:
>>
>> Invalid UTF-8 byte or sequence at line 8 replaced by U+FFFD.
>> Missing character: There is no � in font
>> [lmroman10-regular]:mapping=tex-text
>> !
>> Invalid UTF-8 byte or sequence at line 11 replaced by U+FFFD.
>> Missing character: There is no � in font
>> [lmroman10-regular]:mapping=tex-text
>> !
>>
>>
>> Is there a reason behind this behaviour? Or is it a bug?
>>
>> (Tested with XeTeX, Version 3.1415926-2.2-0.999.7 (MiKTeX 2.7))
>>
>>
>
> Hi Ulrike - I got no problem:
>
> peter at raven:~/work/Tex/xetex/Characters> xelatex characters.tex
> This is XeTeX, Version 3.1415926-2.2-0.9995.2 (TeX Live 2009)
>  \write18 enabled.
>[...]
> transcript file for additional information) Output written on
> characters.pdf (1 page). Transcript written on characters.log.
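To make the byte patterns in the test file concrete, here is a minimal Python sketch that classifies each non-ASCII byte of the three test lines by its leading bits, then counts how many U+FFFD a strict decoder substitutes. It assumes Python's built-in UTF-8 codec as the reference strict decoder. The function names `classify` and `replacement_count` are hypothetical.

```python
# Minimal sketch: classify the non-ASCII bytes of the three test lines by
# their UTF-8 bit pattern, then count how many U+FFFD a strict decoder
# (assumption: Python's built-in UTF-8 codec) substitutes for each line.


def classify(byte: int) -> str:
    """Role of a single byte judged only by its leading bits."""
    if byte < 0x80:
        return "ASCII"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"   # 10xxxxxx, never valid on its own
    if byte < 0xE0:
        return "lead of 2"      # 110xxxxx (C0 and C1 are never valid)
    if byte < 0xF0:
        return "lead of 3"      # 1110xxxx
    if byte < 0xF8:
        return "lead of 4"      # 11110xxx (RFC 3629 still rejects F5-F7)
    return "invalid"            # 11111xxx


def replacement_count(text: str) -> int:
    """Encode as cp1252 (ansinew), decode as UTF-8, count U+FFFD."""
    raw = text.encode("cp1252")
    return raw.decode("utf-8", errors="replace").count("\ufffd")


if __name__ == "__main__":
    for text in ("x§x", "xéx", "xö§x"):
        raw = text.encode("cp1252")
        roles = [f"{b:02X} {classify(b)}" for b in raw if b >= 0x80]
        # bytes are printed via repr, so the output stays ASCII-only
        print(raw, roles, replacement_count(text))
```

A strict decoder replaces something in all three lines, including the lone A7 in the first one. In the third line F6 and A7 each get their own U+FFFD, because a decoder following RFC 3629 never accepts F6 as the start of a sequence.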

Did you actually look at characters.log? I get the same result
as Ulrike and Akira after editing the document so that the bytes
are as indicated (the mail system had converted the example to
Unicode, and the glyphs were not the same as with cp1252), except
that Apple's Terminal shows the replacement character as a ? in
a black diamond in the message text.
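Since the log only names the lines an engine chooses to flag, a separate check can list every invalid byte. The following is a hypothetical pre-flight script, not part of XeTeX or any TeX distribution. It is sketched in Python under the assumption that line numbers are counted from 1 by splitting on LF. The name `invalid_utf8_lines` is illustrative only.

```python
# Hypothetical pre-flight check (not part of XeTeX): list every invalid
# UTF-8 byte in a source file with its line number, so that cases a given
# engine version silently passes through still get reported.
import sys
from pathlib import Path


def invalid_utf8_lines(data: bytes) -> list[tuple[int, int, str]]:
    """Return (line number, byte offset within line, offending byte in hex)."""
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of this line decodes cleanly
            except UnicodeDecodeError as err:
                bad = pos + err.start  # err offsets are relative to the slice
                problems.append((lineno, bad, f"{line[bad]:02X}"))
                pos = pos + err.end    # resume after the rejected bytes
    return problems


if __name__ == "__main__":
    for name in sys.argv[1:]:
        for lineno, offset, hexbyte in invalid_utf8_lines(Path(name).read_bytes()):
            print(f"{name}:{lineno}: invalid UTF-8 byte {hexbyte} at offset {offset}")
```

Run on Ulrike's test file saved as cp1252, it reports line 5 (the lone A7) as well as lines 8 and 11, the two lines named in the log messages quoted above. Line 11 appears twice, once for F6 and once for A7.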

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia

