[XeTeX] Hyphenation in Transliterated Sanskrit

In 1993 you invited me to give a talk about hyphenation at RHBNC.  I started
out my lecture by demolishing the old chestnut that British is hyphenated
etymologically while American isn't.  Reality is much more blurry.

Hugh Williamson got it right, as so often:

The customs of word-division derive partly from etymology,
partly from meaning, partly from pronunciation, and partly from
tradition. Effective communication depends upon conventions, in
word-division as elsewhere, and the best conventions are those the
reader is likely to expect. The first part of a divided word should
not mislead the reader about the pronunciation or meaning of the
second part.
Word-division for the benefit of the reader, however, is best
determined by a reader’s perceptions; different customs apply to
different words, and a few simple rules are not enough to find the
right place.
-- Methods of Book Design, pp. 48, 89.

You are perfectly right, though, that a single set of patterns couldn't
support British and American hyphenation at once.  Their hyphenation points
differ in approximately 30% of cases, that is, for words that are spelt the
same.


> Jonathan Kew wrote:
> > On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:
> >
> >> Arthur had some plans to cover normalization in hyph-utf8, but I
> >> already hate the idea of duplicated apostrophe,
> >
> > That's a bit different, and hard to see how we could avoid it except via
> > special-case code somewhere that "knows" to treat U+0027 and U+2019 as
> > equivalent for certain purposes, even though they are NOT canonically
> > equivalent characters and would not be touched by normalization.
> >
> > IMO, the "duplicated apostrophe" case is something we have to live with
> > because there are, in effect, two different orthographic conventions in use,
> > and we want both to be supported. They're alternate spellings of the word,
> > and so require separate patterns, just like we'd require for "colour" and
> > "color", if we were trying to support both British and American conventions
> > in a single set of patterns.
> It may be that you are intentionally putting up a straw-man argument here,
> but if you are not, may I comment that "trying to support both British and
> American conventions in a single set of patterns" would (IMHO) be
> impossible, since British English hyphenation is based primarily on
> etymology whilst American is based on syllable boundaries.  I wish
> I understood more about the "duplicate apostrophe" problem, in order
> to be able to offer a more directly relevant (and constructive) comment:
> Google throws up nothing relevant.
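The point in the thread that U+0027 and U+2019 are not canonically equivalent can be checked directly.  Here is a minimal Python sketch that assumes only the standard unicodedata module.  It shows that none of the four normalization forms turns U+2019 into U+0027, and then shows the kind of pattern duplication that follows from that.  The helper with_both_apostrophes and the sample pattern strings are hypothetical, made up for illustration.  They do not describe how hyph-utf8 actually builds its pattern files.

```python
import unicodedata

ASCII_APOSTROPHE = "\u0027"    # U+0027 APOSTROPHE
RIGHT_SINGLE_QUOTE = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK


def normal_forms(s: str) -> dict:
    """Return s under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, s)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}


def with_both_apostrophes(patterns: list) -> list:
    """Hypothetical helper: keep every pattern and, for any pattern that
    contains U+0027, also add a copy spelt with U+2019 instead."""
    out = []
    for p in patterns:
        out.append(p)
        if ASCII_APOSTROPHE in p:
            out.append(p.replace(ASCII_APOSTROPHE, RIGHT_SINGLE_QUOTE))
    return out


if __name__ == "__main__":
    # U+2019 has no decomposition mapping, so every normalization form
    # leaves it unchanged and it never becomes U+0027.
    for form, value in normal_forms(RIGHT_SINGLE_QUOTE).items():
        print(form, value == ASCII_APOSTROPHE)  # prints False for each form
    # Made-up pattern strings, for illustration only.
    print(with_both_apostrophes(["ab1c", "d'1"]))
```

The two characters stay distinct after any normalization.  So a pattern set that is meant to accept both spellings has to carry both variants, which is the duplication Jonathan describes.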
