[XeTeX] handling malformed UTF-8 input

Wed Feb 20 12:32:51 CET 2008

Hi,

On 2008-02-20, at 10:23, Ulrike Fischer wrote:
>> Surely it is most likely that someone has Copy/Pasted
>> something with the wrong encoding into a portion of
>> an otherwise valid file.
>
> I can't see how this can happen unless you are using some hex-editor
> and copy the bytes directly. In normal editors, if you copy from a
> file to another and then _save the file_ the editor will save the
> complete file in one encoding not in two. The meaning of the copied
> chars will perhaps be wrong but not their encoding. So if xetex
> encounters one non-utf8 continuation bit it is very probable that the
> whole file is non-utf8.

This happens to me all the time when copying text from a PDF into
Dreamweaver (working in a UTF-8 document there; I would have assumed
that Preview.app puts UTF-8-encoded text on the clipboard). The text is
displayed correctly at first, but something still gets messed up: after
saving and re-opening, parts of the text are useless byte sequences.
(This is just an example; I have to admit I have not yet investigated
the issue very closely.)
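
To tell the two cases apart (a few stray sequences in an otherwise valid
file versus a file that is not UTF-8 at all), one could list the
malformed byte sequences and their offsets. What follows is only a
minimal sketch in Python 3, not related to how XeTeX or any of the
programs mentioned above work; the function name find_invalid_utf8 is
made up for illustration.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, offending_bytes) for each malformed
    UTF-8 sequence found in data. Hypothetical helper for illustration."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first malformed sequence.
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            bad.append((start, data[start:end]))
            pos = end  # resume right after the offending bytes
    return bad


if __name__ == "__main__":
    # "Gruesse" with umlaut and sharp s: first as UTF-8, then as Latin-1.
    sample = b"Gr\xc3\xbc\xc3\x9fe " + b"Gr\xfc\xdfe"
    for offset, raw in find_invalid_utf8(sample):
        print(offset, raw)
```

For the sample above, the UTF-8 half passes and the Latin-1 half is
reported as two single-byte errors at offsets 10 and 11. A file with
only a handful of such hits is probably a mostly valid file with a
pasted-in fragment; a file with hits throughout is more likely in a
different encoding altogether.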

Bernhard

____________________________________________________
_________________________________ Bernhard Barkow __
___________________ gpg key ID _ A89F09C45921020D __