[XeTeX] Hyphenation in Transliterated Sanskrit

Mon Sep 12 09:59:39 CEST 2011

On Mon, Sep 12, 2011 at 09:36, Yves Codet wrote:
> Hello.
>
> A question to specialists, Arthur and Mojca maybe :) Is it necessary to have two sets of hyphenation rules, one in NFC and one in NFD? Or, if hyphenation patterns are written in NFC, for instance, will they be applied correctly to a document written in NFD?

That depends on the engine.

From what I understand, XeTeX does normalize the input, so NFD should
work fine. But I'm only speaking from memory, based on Jonathan's talk
at BachoTeX, so I might be wrong. I'm not sure what LuaTeX does. If
nobody writes the code for it, it may be that no normalization ever
takes place.
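
To illustrate why normalization matters here, a minimal Python sketch
(not what XeTeX or LuaTeX actually do internally; the word and the
pattern fragment are made up for the example). A fragment stored in NFC
is not found in NFD text until the text is normalized:

```python
import unicodedata

# Hypothetical pattern fragment in NFC: precomposed s-dot-below, n-dot-below.
pattern_nfc = "\u1e63\u1e47"

# The transliterated word "krsna" with dots below, converted to NFD, so
# every dotted letter becomes base letter + U+0323 COMBINING DOT BELOW.
word_nfd = unicodedata.normalize("NFD", "k\u1e5b\u1e63\u1e47a")

# Plain substring matching fails on decomposed input...
found_raw = pattern_nfc in word_nfd

# ...and succeeds once the input is normalized to the patterns' form.
found_normalized = pattern_nfc in unicodedata.normalize("NFC", word_nfd)

print(found_raw)         # False
print(found_normalized)  # True
```

Real hyphenation patterns are matched differently from a plain
substring search, but the mismatch between code point sequences is the
same.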

I can also easily imagine that our patterns don't work with NFD input
in Hyphenator.js. I'm not sure how the patterns in Firefox or OpenOffice
deal with normalization. I have never tested that.

But in my opinion the engine *should* be capable of doing normalization.
Otherwise you can easily end up with an exponential problem. A pattern
with 3 accented letters can easily result in 8 or even more duplicated
patterns to cover all possible combinations of composed-or-decomposed
characters.
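
The count of 8 is just 2^3: each of the three accented letters can
independently be composed or decomposed. A rough Python sketch that
enumerates the spellings (the helper name normalization_variants and the
sample word are invented for illustration, and only the fully composed
and fully decomposed form of each letter are considered):

```python
import itertools
import unicodedata

def normalization_variants(text):
    """Return all spellings of text where each character is
    independently either composed (NFC) or decomposed (NFD)."""
    choices = []
    for ch in unicodedata.normalize("NFC", text):
        # A set, so unaccented letters contribute only one form.
        forms = {ch, unicodedata.normalize("NFD", ch)}
        choices.append(sorted(forms))
    return {"".join(combo) for combo in itertools.product(*choices)}

# Sample word "krsna" with dots below: three accented letters.
word = "k\u1e5b\u1e63\u1e47a"
variants = normalization_variants(word)

print(len(variants))  # 8
```

Letters carrying two diacritics have partially composed forms as well,
which this sketch ignores; that is where the "or even more" comes from.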

Arthur had some plans to cover normalization in hyph-utf8, but I
already hate the idea of the duplicated apostrophe, let alone all the
duplications just for the sake of "stupid engines that don't
understand Unicode" :).

Mojca