[luatex] BOM

Javier Bezos listas at tex-tipografia.com
Fri May 15 10:12:30 CEST 2009


>>>   \catcode "FEFF = 9
>>>
>>> as part of the initex initialization code. That would do the trick, yes?
>> Yes, provided the source is in utf-8.
> 
> That is irrelevant. \catcodes in luatex are applied after file reading,
> and they apply to Unicode characters. The byte representation of BOM in
> various encodings may differ, but they certainly all map to U+FEFF.

You are presuming the file has been preprocessed somehow, but
please read the rest of my post:

   ... except if we first set how the file should be read -- since
   the BOM must be the very first thing in the file, this means we
   need to do some kind of preprocessing, which is not always
   desirable or convenient.

Indeed, after the preprocessing the encoding is irrelevant, but
then the catcode is irrelevant, too, because the BOM byte
sequence may already be discarded at that stage. Can this
preprocessing be done from inside luatex? Very likely, with a
pseudo-utf8 like that in luainputenc, but I'm not sure whether
undesirable side effects can appear in this particular case.
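
To illustrate the point about byte representations, here is a
minimal sketch in Python (chosen only because it is easy to run;
it is not luatex code, and the names BOMS, decode_bom and
misread_as_utf8 are made up for this example). It suggests that
the BOM bytes only become U+FEFF once the reader already knows
the encoding, so a \catcode on "FEFF can only act after that step.

```python
import codecs

# Byte sequences that encode U+FEFF in three common encodings.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def decode_bom(encoding):
    """Decode the BOM bytes with the encoding that produced them."""
    return BOMS[encoding].decode(encoding)


def misread_as_utf8(encoding):
    """Decode the BOM bytes as UTF-8 whatever the real encoding is.

    Bytes that are not valid UTF-8 are replaced by U+FFFD.
    """
    return BOMS[encoding].decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name in BOMS:
        correct = [hex(ord(c)) for c in decode_bom(name)]
        as_utf8 = [hex(ord(c)) for c in misread_as_utf8(name)]
        print(name, BOMS[name].hex(), correct, as_utf8)
```

Read with the right encoding, all three byte strings give the
single character U+FEFF. Read as utf-8, only the utf-8 BOM does;
the utf-16 ones do not, which is the case I was referring to.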

Javier

