[XeTeX] handling malformed UTF-8 input

Peter Dyballa Peter_Dyballa at Web.DE
Thu Feb 21 12:32:13 CET 2008


On 21.02.2008 at 11:12, Jonathan Kew wrote:

> What do others think about this -- should "invalid UTF-8 byte
> sequence" be an error rather than a warning and fallback?

I would like to say: make it an error starting with TeX Live 2010!
Until then XeTeX should run in a more compatible mode and emit only
warnings.

But in the end one process or another will fail, as already reported,
so there is no real compatibility mode in which XeTeX can work. And
since it might produce something that appears to work but is faulty
(10 % of the code treated as meaningless bytes?), reporting an error
and stopping is the more sensible choice.
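
To illustrate the two behaviours under discussion, here is a minimal
sketch in Python. It is not XeTeX's code; the function names are made
up for this example, and it only assumes that Python's built-in UTF-8
decoder is a fair stand-in for "strict" versus "warn and fall back".

```python
from typing import Optional, Tuple


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Strict mode: return the byte offset of the first invalid
    UTF-8 sequence, or None if the input is well-formed."""
    try:
        data.decode("utf-8")  # the default error handler is "strict"
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_with_fallback(data: bytes) -> Tuple[str, int]:
    """Lenient mode: substitute U+FFFD for invalid bytes and report
    how many replacement characters the result contains.
    Caveat: a genuine U+FFFD in the input is counted as well."""
    text = data.decode("utf-8", errors="replace")
    return text, text.count("\ufffd")


# "café ok" encoded as Latin-1 instead of UTF-8 (hypothetical input)
sample = b"caf\xe9 ok"
offset = first_invalid_offset(sample)
text, replaced = decode_with_fallback(sample)
```

The lenient variant always returns something, which is exactly the
risk: the caller gets text that looks usable but has silently lost
characters.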


IMO it is not so bad for the support file of a non-English language
to contain comments in that language, outside 7-bit US-ASCII. Those
who use the supported language will be able to read and understand
the comments. The trouble comes with the TeX Live setup, which loads
a dozen or more languages by default *and* allows the use of
problematic characters. From XeTeX's point of view, both of these
need to change. It would be better if XeTeX stripped the irritating
comment lines from known problematic files before building a FMT
file or the like. The TeX code inside cannot be faulty ...

This could be a kind of compatibility mode until TeX Live 2010 is  
released.
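
A rough sketch of such a cleaning step, again in Python and purely
hypothetical (no such tool exists in TeX Live as far as I know). It
assumes that `%` still has its usual comment meaning in the file and
that only whole-line comments are touched; trailing comments, `\%`
and files that change the catcode of `%` are deliberately ignored.

```python
def scrub_comment_lines(data: bytes) -> bytes:
    """Replace whole-line TeX comments that contain non-ASCII bytes
    with a bare '%', keeping line count and line endings intact."""
    cleaned = []
    for line in data.splitlines(keepends=True):
        body = line.lstrip(b" \t")
        has_high_byte = any(byte > 0x7F for byte in body)
        if body.startswith(b"%") and has_high_byte:
            # Preserve the original line terminator (\n, \r\n or \r).
            ending = line[len(line.rstrip(b"\r\n")):]
            cleaned.append(b"%" + ending)
        else:
            cleaned.append(line)
    return b"".join(cleaned)


# Hypothetical pattern file with a Latin-1 comment ("für")
source = b"% Kommentar f\xfcr Deutsch\n\\patterns{ab1c}\n% plain ascii\n"
result = scrub_comment_lines(source)
```

Working on bytes rather than decoded text is the point here: the
file never has to be interpreted as UTF-8 before the offending
comments are gone.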

--
Greetings

   Pete

A lot of us are working harder than we want, at things we don't like  
to do. Why? ...In order to afford the sort of existence we don't care  
to live.
				– Bradford Angier




