[XeTeX] Lowercase Unicode code points in hyphenation patterns

Doug McKenna doug at mathemaesthetics.com
Sun Nov 24 01:06:47 CET 2019


When the LaTeX format is built, there are tests for whether or not a Unicode-aware TeX engine is doing the work.  I presume that XeTeX is such an engine, though I'm not sure exactly how "Unicode-aware TeX engine" is defined (a separate issue).

During the input of various hyphenation pattern files (a group for each language code), the first such file that uses non-ASCII Unicode code points is for Ancient Greek, in the file

/usr/local/texlive/2017/texmf-dist/tex/generic/hyph-utf8/patterns/texhyph-grc.tex

at line 61, which starts out

α1 ε1 η1 ι1 ο1 υ1 ω1 ϊ1 ...

TeX's code and specification say that only lowercase letters can appear in pattern words, and TeX's source code defines a lowercase letter as any character whose \lccode table entry is that same character.
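The rule just described can be modeled in a few lines. What follows is a minimal sketch, not TeX's or XeTeX's actual code. It treats the \lccode table as a dictionary from code point to code point, with a missing entry meaning 0. It applies the definition stated above, under which a character is a letter when its entry equals itself. All function names are hypothetical. The starting table copies only the classic IniTeX ASCII defaults: a-z map to themselves and A-Z map to a-z. Under that table, every Greek letter from line 61 would be rejected until something installs \lccode entries for those letters.

```python
from __future__ import annotations

# Hypothetical toy model of the \lccode table: code point -> code point,
# where an absent key stands for an entry of 0.


def initex_style_lccode() -> dict[int, int]:
    """ASCII-only defaults in the style of classic IniTeX:
    a-z map to themselves, A-Z map to a-z, everything else is 0."""
    table: dict[int, int] = {}
    for c in range(ord("a"), ord("z") + 1):
        table[c] = c
        table[c - ord("a") + ord("A")] = c
    return table


def is_pattern_letter(lccode: dict[int, int], ch: str) -> bool:
    """True if ch is a 'lowercase letter' under the post's definition:
    its \\lccode entry is nonzero and equal to ch itself."""
    cp = ord(ch)
    return cp != 0 and lccode.get(cp, 0) == cp


def pattern_letters(word: str) -> str:
    """Drop the inter-letter digits from a pattern word such as 'α1'."""
    return "".join(c for c in word if not c.isdigit())


def check_patterns(lccode: dict[int, int], words: list[str]) -> list[str]:
    """Return, in order, every character that would not count as a letter.
    '.' marks a word boundary in patterns, so it is skipped."""
    bad: list[str] = []
    for w in words:
        for ch in pattern_letters(w):
            if ch != "." and not is_pattern_letter(lccode, ch):
                bad.append(ch)
    return bad


if __name__ == "__main__":
    greek = "α1 ε1 η1 ι1 ο1 υ1 ω1 ϊ1".split()
    table = initex_style_lccode()
    print(check_patterns(table, greek))
    # -> ['α', 'ε', 'η', 'ι', 'ο', 'υ', 'ω', 'ϊ']

    # Something (engine or macro code) must install self-mapping entries:
    for ch in "αεηιουωϊ":
        table[ord(ch)] = ord(ch)
    print(check_patterns(table, greek))  # -> []
```

The second call shows the question in concrete form: the Greek patterns pass only after self-mapping entries exist. The model deliberately does not guess which component creates those entries.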

But as near as I can tell, during the building of the LaTeX format (i.e., running "latex.ltx") there is no TeX source code that installs any of these Greek letters into the \lccode table.  I therefore conclude that the XeTeX engine does this itself when it initializes, rather than waiting for TeX source code to do it.

But there are a whole lot of lowercase letters in Unicode, so how does XeTeX determine which characters are legal lowercase letters for the initial pattern files?

I've tried looking at some version of the xetex.web code, but without illumination, I'm afraid.

TIA,

Doug McKenna