[tex-hyphen] Unicode code points or UTF-8 codes?

Arthur Reutenauer arthur.reutenauer at normalesup.org
Wed Apr 13 01:28:19 CEST 2016


> Well, you'll probably end up with one weird-looking pattern "8́"
> (looking like "eight with acute" and in fact saying "do not hyphenate
> before the combining acute accent"), but such is life ...

  Exactly.  For Latin-script languages there shouldn’t be too many of
those anyway.  In fact, I fail to see why it should be necessary at all
to have a digit in the middle of a combining sequence.

> We should probably nevertheless do some double-checking to make sure
> that XeTeX doesn't do any strange normalization at this point.

  Maybe.  It’s hard to say without a concrete use case.  But the
sequence <00E6 LATIN SMALL LETTER AE, 0301 COMBINING ACUTE ACCENT> (ǽ)
we’ve been discussing recently won’t cause any problem.
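
  As an aside, here is a minimal Python sketch of the two views of that
sequence, code points versus UTF-8 bytes, and of what NFC normalization
would do to it.  The helper name describe is made up for illustration,
and nothing here says anything about what XeTeX does internally; that
is the part that would still need checking.

```python
import unicodedata

def describe(text):
    """Return (code points, UTF-8 bytes) of a string as hex strings."""
    code_points = " ".join(f"{ord(ch):04X}" for ch in text)
    utf8_bytes = " ".join(f"{b:02X}" for b in text.encode("utf-8"))
    return code_points, utf8_bytes

# <00E6 LATIN SMALL LETTER AE, 0301 COMBINING ACUTE ACCENT>
decomposed = "\u00e6\u0301"
# NFC composes the pair into the single code point U+01FD.
composed = unicodedata.normalize("NFC", decomposed)

# The "eight with acute" pattern: a digit followed by U+0301.
# No precomposed character exists for it, so NFC leaves it alone.
digit_pattern = "8\u0301"

for label, s in [("decomposed", decomposed),
                 ("composed", composed),
                 ("digit pattern", digit_pattern)]:
    cps, octets = describe(s)
    print(f"{label}: code points {cps} / UTF-8 {octets}")
```

  If an engine normalized its input to NFC before matching, a pattern
written against the decomposed pair would have to be written against
U+01FD instead, whereas the digit pattern would be unaffected.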

> The codes are determined by the Unicode standard and you can find the
> definitions in unicode-letters.tex (luatex-unicode-letters.tex). We
> would put those definitions into loadhyph-<languagename>.tex. I’m not
> yet sure how to handle this properly for LuaTeX, but I believe that
> *we* (me and Arthur) have some homework to do.

  This should be taken care of by the files produced by the LaTeX team,
which are also used by plain TeX.

	Best,

		Arthur

