[tlbuild] 2010 builds

Thu Jun 10 16:09:39 CEST 2010

On 10 Jun 2010, at 07:29, Peter Breitenlohner wrote:

> On Wed, 9 Jun 2010, Jonathan Kew wrote:
> 
>>> If so, I'd still not consider this as a good solution, because chktex should
>>> certainly not depend on the current locale when deciding if a character is a
>>> space character.  Or should it????
>> 
>> Does it have any other way of inferring the encoding of the file?
>> 
>> If not, it seems to me the results are little better than a lottery.
> 
> Hi Jonathan,
> 
> you are expecting way too much.  Chktex is a simple minded program,
> comparable to lacheck, that assumes 7 or 8-Bit ASCII input files.

No, ASCII is a 7-bit encoding with code values 0-127. It's sufficiently prevalent that it'd be reasonable to assume this is what is meant by the byte values up to 127 in a text file. However, "8-bit ASCII" is not a useful term here. There are numerous *incompatible* "extensions of ASCII" to 8 bits, including the various ISO-8859-n encodings, Windows codepages, Mac codepages, and so on. So it's essentially meaningless for Chktex to assume ANYTHING about the codes 128-255 in the absence of some kind of documentation or specification of what encoding is being used.

If the Test.tex file was written by an English speaker on Windows, the "extended" codes in it are most likely Windows-1252; but if it was written on a Mac, they'd be more likely to be MacRoman. And these are just the likelier guesses, assuming a Western/English-speaking author. If it's written by an Asian Unix user, its encoding might be something quite different....
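
To make the ambiguity concrete, here is a minimal C sketch of how the same byte reads under Windows-1252 and under MacRoman. The names legacy_encoding and decode_byte are made up for this illustration and are not part of chktex; the function covers only ASCII plus two sample bytes and returns U+FFFD for everything else, so it is not a real decoder.

```c
/* Illustration only: hypothetical names, not chktex code. */
enum legacy_encoding { ENC_WINDOWS_1252, ENC_MAC_ROMAN };

/* Return the Unicode code point a byte stands for under the given
 * encoding. Only ASCII and the two sample bytes 0xA0 and 0xCA are
 * handled; any other byte yields U+FFFD as a "not covered" marker. */
static unsigned decode_byte(unsigned char b, enum legacy_encoding enc)
{
    if (b < 0x80)
        return b;  /* both encodings agree with ASCII below 128 */

    switch (b) {
    case 0xA0:
        /* Windows-1252: NO-BREAK SPACE; MacRoman: DAGGER */
        return enc == ENC_WINDOWS_1252 ? 0x00A0u : 0x2020u;
    case 0xCA:
        /* Windows-1252: E WITH CIRCUMFLEX; MacRoman: NO-BREAK SPACE */
        return enc == ENC_WINDOWS_1252 ? 0x00CAu : 0x00A0u;
    default:
        return 0xFFFDu;  /* outside the scope of this sketch */
    }
}
```

So byte 0xA0 is a kind of space in one encoding and a dagger in the other, while the no-break space of MacRoman sits at 0xCA, which Windows-1252 reads as a letter. Without knowing the encoding, a tool cannot say which bytes above 127 are spaces.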

(Though as Jukka notes in the following messages, the fundamental problem in this case is that the code passes negative integers to isspace, so the resulting behavior is undefined regardless of the intended encoding of the input or the current locale.)
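
For reference, the usual way to avoid that undefined behavior is to convert the byte to unsigned char before calling isspace. The following is only a sketch of that idiom; is_space_byte is a hypothetical helper name, not the actual chktex code or the fix that was applied to it.

```c
#include <ctype.h>

/* Hypothetical helper (not from chktex): classify one byte taken from a
 * char buffer. isspace() is only defined for arguments representable as
 * unsigned char, or equal to EOF. Where plain char is signed, bytes
 * >= 0x80 would otherwise be passed as negative ints, so the cast to
 * unsigned char is what keeps the call well defined. */
static int is_space_byte(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

With the cast in place the call is well defined for every byte value, though what it returns for bytes above 127 still depends on the current locale, and so on a guess about the encoding.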

JK