[luatex] BOM

Taco Hoekwater taco at elvenkind.com
Thu May 14 19:22:07 CEST 2009


Yannis Haralambous wrote:
> This has probably already been brought up, but please take care of the 
> BOM character: it must
> be ignored by the LuaTeX engine.
> 
> Here is why: BOM is useful when writing in UCS-2 (or UTF-16) to find 
> out whether the file is written in
> big-endian or little-endian byte order. In UTF-8 it makes no sense because UTF-8 
> is written byte-wise, in logical order.
> 
> Nevertheless, software like M$ Notepad (under Vista) will systematically 
> insert a BOM at the beginning of the file (and I haven't found any way to prevent it).
> 
> Other text editors, such as Ultra-Edit (Win) or BBEdit (Mac) will let 
> the user choose, but by default they will still insert a BOM.
> 
> LaTeX then sees a character at the beginning of the file which is not a backslash or a 
> comment, and stops because there should
> be no text character before \begin{document}.
> 
> If one could decide, once and for all, to ignore that character, that 
> would be best. Using lua code for that would be a waste of time and 
> energy.

I could set

   \catcode "FEFF = 9

as part of the initex initialization code. That would do the trick, yes?
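
Category code 9 means "ignored character", so the input processor would
simply drop U+FEFF wherever it appears. A minimal sketch of how the effect
could be checked in plain LuaTeX, assuming `^` still has its usual category
code 7 so that the ^^^^feff notation for U+FEFF is available (the macro name
\bomtest and the file name are made up for this illustration):

   % Hypothetical test file, run with: luatex bomtest.tex
   \catcode"FEFF=9            % category 9: ignored character
   \def\bomtest{^^^^feff}     % ^^^^feff is LuaTeX's notation for U+FEFF
   \ifx\bomtest\empty         % an ignored character leaves an empty body
     \message{U+FEFF was ignored}
   \else
     \message{U+FEFF was kept}
   \fi
   \bye

With the \catcode line commented out, the same file should report that the
character was kept. Note that a \catcode assignment inside the document
itself would come too late for a BOM sitting at the very start of that file,
which is why the setting belongs in the initialization code.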

Best wishes,
Taco


