[XeTeX] latin-1 encoded characters in commented out parts trigger log warnings

Jonathan Kew jfkthame at gmail.com
Mon Feb 22 00:39:12 CET 2021


On 21/02/2021 22:55, Ross Moore wrote:
>> The file reading has failed before any TeX-accessible processing has
>> happened (see the EBCDIC example in the TeXbook)
> 
> OK.
> But that’s changing the meaning of bit-order, yes?
> Surely we can be past that.

No, it's not about bit-order; it's about changing the mapping of code 
units in the external file to character codes in TeX's internal 
(ASCII-based) code.
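
As a rough illustration of that mapping (a Python sketch, not TeX's actual
code): Python's cp037 codec is assumed here as a stand-in for "some EBCDIC
scheme". The byte stored in the external file differs from the internal
code, but the letter b still ends up as 98 once the file is converted.

```python
# Sketch: an external code unit is mapped to an internal, ASCII-based code.
# cp037 (one EBCDIC variant) is an assumed stand-in for the external scheme.
external = "b".encode("cp037")            # the byte as stored in an EBCDIC file
internal = ord(external.decode("cp037"))  # the code seen after conversion

print(hex(external[0]), internal)
```

The conversion happens per code unit as the file is read; nothing about
bit order is involved.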

> 
> 
>>
>> \danger \TeX\ always uses the internal character code of Appendix~C
>> for the standard ASCII characters,
>> regardless of what external coding scheme actually appears in the files
>> being read.  Thus, |b| is 98 inside of \TeX\ even when your computer
>> normally deals with ^{EBCDIC} or some other non-ASCII scheme; the \TeX\
>> software has been set up to convert text files to internal code, and to
>> convert back to the external code when writing text files.
>>
>>
>> the file encoding is failing at the  "convert text files to internal 
>> code" stage which is before the line buffer of characters is consulted 
>> to produce the stream of tokens based on catcodes.
> 
> Yes, OK; so my model isn’t up to it, as Bruno said.
>   … And Jonathan has commented.
> 
> Also pdfTeX has no trouble with an xstring example.
> It just seems pretty crazy that the comments need to be altered
> for that package to be used with XeTeX.
> 

Well... as long as the Latin-1 accented characters are only in 
comments, it arguably doesn't "really" matter; XeTeX logs a warning that 
it can't interpret them, but if you know that part of the line is going 
to be ignored anyway, you can ignore the warning.
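
A small Python sketch of why the warning appears even for commented-out
text: the whole line is decoded before anything decides what is a comment.
The function name, the sample line, and the use of U+FFFD via Python's
errors="replace" are assumptions of this sketch, not a description of
XeTeX's internals.

```python
# Sketch: a line whose comment contains a Latin-1 byte, read as UTF-8.
raw_line = "\\relax % caf\u00e9".encode("latin-1")  # 0xE9 is not valid UTF-8 here

def read_line_as_utf8(raw):
    """Decode a line; also report whether any byte sequence was invalid UTF-8."""
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Substitute U+FFFD for the undecodable bytes and flag a warning.
        return raw.decode("utf-8", errors="replace"), True

text, warned = read_line_as_utf8(raw_line)

# Only now is the comment discarded (a simplification: this ignores \%
# and catcode changes).
visible = text.split("%", 1)[0]

print(warned, repr(visible))
```

The warning flag is set during decoding, yet the part of the line that
survives comment removal is unaffected.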

(pdfTeX doesn't care because it simply reads the bytes from the file; 
any interpretation of bytes as one encoding or another is handled at the 
TeX macro level.)
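
For contrast, a sketch of the byte-oriented model in Python (illustrative
only; the variable names are made up for this example): reading yields one
character code per byte with no validity check, and choosing an encoding
is a separate, later step.

```python
# Sketch: byte-oriented reading never fails; each byte is a code in 0..255.
raw = "caf\u00e9".encode("latin-1")
codes = list(raw)  # one code per byte, whatever the file's encoding was

# Interpretation is a separate step, analogous to what macros do later.
as_latin1 = raw.decode("latin-1")

print(codes, as_latin1)
```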

JK

