[tex-hyphen] Unicode code points or UTF-8 codes?

Mojca Miklavec mojca.miklavec.lists at gmail.com
Wed Apr 13 01:01:31 CEST 2016


On 13 April 2016 at 00:41, Claudio Beccari wrote:
> How must the address of a glyph that has a pretty high code point be
> coded in a *.pat.txt file?
> For example, suppose you want to put the combining acute accent into a
> pat.txt file: it has code point U+0301; speaking TeX or LuaTeX, must that
> glyph be inserted as ^^^^0301 or the UTF-8 way, ^^^^cc81?

Neither. Do not use the caret notation at all (there should be no
literal "^" characters). Just put the combining character itself in
the patterns.

Well, you'll probably end up with one weird-looking pattern "8́"
(looking like "eight with acute" and in fact saying "do not hyphenate
before the combining acute accent"), but such is life ...
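
To make the difference between the code point and its UTF-8 bytes
concrete, here is a small Python sketch. It is only an illustration and
not part of any pattern tooling; the names COMBINING_ACUTE, pattern and
describe are made up for this example.

```python
# Illustrative only: names here are hypothetical, not from hyph-utf8.
COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT

# The pattern discussed above: digit 8 followed by the raw combining
# character, with no caret notation anywhere.
pattern = "8" + COMBINING_ACUTE


def describe(text):
    """Return the code points of text as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


code_points = describe(pattern)

# UTF-8 bytes of the accent alone, and of the whole pattern as it
# would be stored in a UTF-8 encoded file.
utf8_hex = COMBINING_ACUTE.encode("utf-8").hex()
pattern_bytes = pattern.encode("utf-8")

print(code_points)    # code points of the two characters
print(utf8_hex)       # the two UTF-8 bytes of U+0301, in hex
print(pattern_bytes)  # the bytes that end up in the file
```

The point is that U+0301 is one character whose UTF-8 form happens to
be the two bytes cc 81; the file stores those bytes directly rather
than any "^^^^" escape.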

We should probably nevertheless do some double-checking to make sure
that XeTeX doesn't do any strange normalization at this point.
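
As a rough idea of what such a normalization could do to the input,
the following Python sketch uses the standard unicodedata module. It
says nothing about what XeTeX actually does; it only shows how NFC
composition behaves on a base letter plus U+0301 versus on the pattern
"8" plus U+0301.

```python
import unicodedata

# A base letter followed by the combining acute accent.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# The pattern from above: there is no precomposed "8 with acute",
# so NFC leaves the two characters as they are.
pattern = "8\u0301"
nfc_pattern = unicodedata.normalize("NFC", pattern)

# NFD goes the other way and splits U+00E9 back into two characters.
redecomposed = unicodedata.normalize("NFD", composed)

print(len(decomposed), len(composed))  # lengths before and after NFC
print(nfc_pattern == pattern)          # whether the pattern survived NFC
print(redecomposed == decomposed)      # whether NFD restores the input
```

So if any normalization to composed form happened while reading text,
a combining accent after a letter could disappear into a precomposed
character and a pattern that mentions U+0301 would never match it.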

> Where and how do you assign it a non zero lccode?

The codes are determined by the Unicode standard and you can find the
definitions in unicode-letters.tex (luatex-unicode-letters.tex). We
would put those definitions into loadhyph-<languagename>.tex. I'm not
yet sure how to handle this properly for LuaTeX, but I believe that
*we* (me and Arthur) have some homework to do.

Mojca


