[XeTeX] handling malformed UTF-8 input
Bernhard Barkow
bb at creativeeyes.at
Wed Feb 20 12:32:51 CET 2008
Hi,
On 2008-02-20, at 10:23, Ulrike Fischer wrote:
>> Surely it is most likely that someone has Copy/Pasted
>> something with the wrong encoding into a portion of
>> an otherwise valid file.
>
> I can't see how this can happen unless you are using some hex-editor and
> copy the bytes directly. In normal editors, if you copy from a file to
> another and then _save the file_ the editor will save the complete file
> in one encoding not in two. The meaning of the copied chars will perhaps
> be wrong but not their encoding. So if xetex encounters one non-utf8
> continuation bit it is very probable that the whole file is non-utf8.
This happens to me all the time when copying text from a PDF into
Dreamweaver (I'm working in a UTF-8 document there, and I would have
assumed that Preview.app puts UTF-8-encoded text on the clipboard).
At first the text is displayed correctly, but something still gets
messed up: after saving and re-opening, parts of the text have turned
into meaningless byte sequences. (Just as an example; I have to admit I
have not investigated the issue very closely yet.)
Bernhard
____________________________________________________
_________________________________ Bernhard Barkow __
___________________ gpg key ID _ A89F09C45921020D __
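
Ulrike's heuristic (one malformed byte suggests the whole file is not UTF-8) and Bernhard's counter-example (a pasted fragment inside an otherwise valid UTF-8 file) can be told apart to some degree by looking at where the malformed bytes sit and whether valid multi-byte UTF-8 also occurs in the same file. The following is a minimal Python sketch of that check, not anything XeTeX itself does; the function names `find_invalid_utf8`, `count_valid_non_ascii` and `looks_mixed` are hypothetical, and the sketch relies only on Python's built-in UTF-8 decoder and the `start`/`end` offsets it reports in `UnicodeDecodeError`.

```python
from __future__ import annotations


def find_invalid_utf8(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of every malformed UTF-8 run in data."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the input is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we just decoded
            start, end = pos + err.start, pos + err.end
            spans.append((start, end))
            pos = end  # resume right after the offending bytes
    return spans


def count_valid_non_ascii(data: bytes) -> int:
    """Count non-ASCII characters that decode cleanly (invalid bytes are dropped)."""
    text = data.decode("utf-8", errors="ignore")
    return sum(1 for ch in text if ord(ch) > 0x7F)


def looks_mixed(data: bytes) -> bool:
    """Heuristic only: malformed bytes AND valid multi-byte UTF-8 in one file."""
    return bool(find_invalid_utf8(data)) and count_valid_non_ascii(data) > 0


if __name__ == "__main__":
    # Hypothetical sample: a UTF-8 file with a Latin-1 fragment pasted in.
    sample = "naïve ".encode("utf-8") + "café".encode("latin-1") + " ok".encode("utf-8")
    print(find_invalid_utf8(sample))              # [(10, 11)]
    print(looks_mixed(sample))                    # True
    print(looks_mixed("café".encode("latin-1")))  # False
```

The check is deliberately loose: a mostly-ASCII Latin-1 file also produces only a few scattered malformed spans, and some Latin-1 byte pairs (for example the bytes behind "Ã©") happen to be valid UTF-8, so `looks_mixed` is a hint about which case applies rather than proof.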