[OS X TeX] Invisible character

Jonathan Kew jonathan_kew at sil.org
Mon Jun 26 09:39:13 CEST 2006


On 26 Jun 2006, at 1:18 am, Ross Moore wrote:

> Does TeX need to have the \catcode idea extended
> to have flexibility with more characters?
>
> With 32-bit and 64-bit machines now quite common (indeed standard),
> it shouldn't be too hard to implement this.
> Certainly it would need a new primitive, \UTFcatcode say,
> that would consider multiple bytes on input, and either set flags
> within the extra (currently unused) bytes, or adjust the
> normal \catcode of each byte in some appropriate way.
> Interesting concept.

Forget the bytes; think in terms of Unicode characters. And then set
the \catcode for a *character*, whether that character was
represented in the input as a single (ASCII) byte or a multi-byte
UTF-8 sequence (or as one or two UTF-16 code units, for that matter).

So you can say \catcode`\क = 11 or \catcode`\你 = 12 or whatever,  
and it works.

Which happens to be how xetex does it. :)
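
To make the "characters, not bytes" point concrete, here is a minimal Python sketch of a catcode table keyed by Unicode code point. It is a toy model only, not xetex's actual implementation; the names `set_catcode` and `get_catcode` and the default of 12 for unlisted characters are assumptions made for illustration.

```python
# Toy model of a catcode table keyed by Unicode code point.
# This is an illustration only, not how xetex is actually written.
LETTER, OTHER = 11, 12

catcodes = {}  # code point -> category code


def set_catcode(char, code):
    """Record a category code for one character (hypothetical helper)."""
    catcodes[ord(char)] = code


def get_catcode(char):
    """Look up a character; unlisted ones default to OTHER (an assumption)."""
    return catcodes.get(ord(char), OTHER)


set_catcode("क", LETTER)  # analogous to \catcode`\क = 11
set_catcode("你", OTHER)   # analogous to \catcode`\你 = 12

# The same character can arrive as different byte sequences...
utf8_bytes = "क".encode("utf-8")       # three bytes
utf16_bytes = "क".encode("utf-16-be")  # two bytes (one 16-bit code unit)

# ...but both decode to the same code point, so one table entry covers both.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16-be")

print(get_catcode(from_utf8), get_catcode(from_utf16))
```

A per-byte table would need some convention covering all three UTF-8 bytes of क; keying on the decoded character avoids that question entirely.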

>
> One day we'll want to move to UTF16 input as well.
> Thus TeX's method of tokenisation really will need
> to be changed to accommodate this.

xetex reads UTF-16 as well as UTF-8, and it makes no difference at  
all to macro processing, catcodes, etc., as everything works in terms  
of the Unicode characters.
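
As a rough illustration of why the input encoding need not matter to macro processing, the Python sketch below decodes the same source text from UTF-8 and from UTF-16 and runs both through a deliberately simplified tokenizer. The catcode assignments and the `tokenize` function are hypothetical, and real TeX tokenisation does more (for example, skipping spaces after a control word); this only shows that the token list depends on the decoded characters.

```python
ESCAPE, LETTER, OTHER = 0, 11, 12

# Hypothetical catcode assignments for this sketch only.
CATCODES = {ord("\\"): ESCAPE}
for ch in "abcdefghijklmnopqrstuvwxyzनमक":
    CATCODES[ord(ch)] = LETTER


def catcode(ch):
    """Category code of a decoded character (unlisted ones count as OTHER)."""
    return CATCODES.get(ord(ch), OTHER)


def tokenize(text):
    """Split decoded text into control-sequence names and single characters."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if catcode(ch) == ESCAPE:
            j = i + 1
            # A control word is the escape character plus a run of letters.
            while j < len(text) and catcode(text[j]) == LETTER:
                j += 1
            # A control symbol is the escape character plus one non-letter.
            if j == i + 1 and j < len(text):
                j += 1
            tokens.append(("cs", text[i + 1:j]))
            i = j
        else:
            tokens.append(("char", ch))
            i += 1
    return tokens


source = "\\नमक{x}"

# Same text, two encodings; decoding happens before tokenisation.
from_utf8 = tokenize(source.encode("utf-8").decode("utf-8"))
from_utf16 = tokenize(source.encode("utf-16").decode("utf-16"))

print(from_utf8 == from_utf16)
```

The tokenizer never sees bytes, so nothing in it needs to change when a new input encoding is supported; only the decoding step does.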

JK

