[tex-hyphen] How to output a character whose catcode is active?

Arthur Reutenauer arthur.reutenauer at normalesup.org
Fri Mar 1 12:36:44 CET 2013


  I don't think you can; that's the whole point of being active, after
all.

  That's something I commented on during my talk about hyph-utf8 at
BachoTeX in 2009: it's a small miracle that the approach we used worked
at all, as we relied on a coincidental property of two completely
distinct and unrelated sets of byte values:

  1. The initial bytes of UTF-8 byte sequences
  2. The byte values used for accented and related letters in TeX's font encodings

  Set 1 is (more or less) the range [0xC0, 0xDF] for Unicode characters
whose UTF-8 representation uses two bytes, while set 2 is (again, more
or less) the range [0xE0, 0xFF] for most of the font encodings used in
the TeX world, which roughly matches the byte positions of these
characters in ISO 8859 character sets.

  As can be seen, these two sets have the remarkable property that
they're disjoint.  That is extremely useful since we can make the bytes
(i.e., characters in 8-bit TeX variants) in set 1 active, while the
ones in set 2 are used as printable letters.  That property is, however,
a pure coincidence.
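
  The disjointness is easy to check outside TeX.  What follows is only
a minimal sketch in Python, not part of hyph-utf8: it collects the first
byte of every two-byte UTF-8 sequence and compares it with set 2, which
is taken here to be exactly [0xE0, 0xFF] (an assumption, since the real
set varies a little from one font encoding to the next).  The names
two_byte_leads and font_letter_bytes are made up for the illustration.

```python
# Lead bytes of every two-byte UTF-8 sequence (code points U+0080..U+07FF).
two_byte_leads = {chr(cp).encode("utf-8")[0] for cp in range(0x80, 0x800)}

# Set 2, approximated as in the text: byte positions 0xE0..0xFF.
# Assumption: real font encodings deviate slightly from this range.
font_letter_bytes = set(range(0xE0, 0x100))

# 0xC0 and 0xC1 never start a valid sequence, so the minimum is 0xC2.
print(hex(min(two_byte_leads)), hex(max(two_byte_leads)))  # 0xc2 0xdf

# No byte is both a two-byte lead byte and a letter position in set 2.
print(two_byte_leads.isdisjoint(font_letter_bytes))  # True
```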

  For Georgian, it so happens that UTF-8 encodes its characters in 3
bytes (the Unicode characters that are encoded in 2 bytes in UTF-8 are
in the range U+0080 - U+07FF; the Georgian block is beyond that, at
U+10A0 - U+10FF).  And the initial bytes of 3-byte UTF-8 sequences are
in [0xE0, 0xEF], which lies right inside set 2.  Clash.
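
  To make the clash concrete, here is a small Python sketch (again only
an illustration; the variable names are made up, and set 2 is assumed
to be exactly [0xE0, 0xFF]).  It encodes every code point of the
Georgian block and looks at the length and first byte of each sequence.

```python
# Every code point in the Georgian block, U+10A0..U+10FF, encoded as UTF-8.
# Unassigned code points in the block encode the same way, so they are kept.
encoded = [chr(cp).encode("utf-8") for cp in range(0x10A0, 0x1100)]

lengths = {len(seq) for seq in encoded}  # distinct sequence lengths
leads = {seq[0] for seq in encoded}      # distinct first bytes

print(lengths)                     # {3}
print([hex(b) for b in leads])     # ['0xe1']

# Set 2 approximated as byte positions 0xE0..0xFF (assumption, see above).
font_letter_bytes = set(range(0xE0, 0x100))

# The single lead byte is also a letter position, so it cannot be made
# active without giving up its use as a printable letter.
print(leads <= font_letter_bytes)  # True
```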

  There might be a dirty trick to make that work, and I would be really
interested to find out, but unless you want to spend many sleepless
nights I suggest you use the pTeX approach ;-)

	Arthur

