Dear Phil,<br><br>You should know better. :-)<br><br>In 1993 you invited me to give a talk about hyphenation at RHBNC. I started out my lecture by demolishing the old chestnut that British is hyphenated etymologically while American isn't. Reality is much more blurry.<br>
<br>Hugh Williamson got it right, as so often:<br><br><div style="margin-left: 40px;">The customs of word-division derive partly from etymology,<br>partly from meaning, partly from pronunciation, and partly from<br>tradition. Effective communication depends upon conventions, in<br>
word-division as elsewhere, and the best conventions are those the<br>reader is likely to expect. The first part of a divided word should<br>not mislead the reader about the pronunciation or meaning of the<br>second part.<br>
Word-division for the benefit of the reader, however, is best<br>determined by a reader’s perceptions; different customs apply to<br>different words, and a few simple rules are not enough to find the<br>right place.<br>-- Methods of Book Design, pp. 48, 89.<br>
</div><br><br>You are perfectly right, though, that a single set of patterns couldn't support British and American hyphenation at once. Even among words that are spelt the same in both, the hyphenation points differ in approximately 30% of cases.<br>
<br>Dominik<br>
<br><br><div class="gmail_quote">On 12 September 2011 12:09, Philip TAYLOR (Webmaster, Ret'd) <span dir="ltr">&lt;<a href="mailto:P.Taylor@rhul.ac.uk">P.Taylor@rhul.ac.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im"><br>
Jonathan Kew wrote:<br>
> On 12 Sep 2011, at 08:59, Mojca Miklavec wrote:<br>
><br>
</div><div class="im">>> Arthur had some plans to cover normalization in hyph-utf8, but I<br>
>> already hate the idea of duplicated apostrophe,<br>
><br>
> That's a bit different, and hard to see how we could avoid it except via special-case code somewhere that "knows" to treat U+0027 and U+2019 as equivalent for certain purposes, even though they are NOT canonically equivalent characters and would not be touched by normalization.<br>
><br>
> IMO, the "duplicated apostrophe" case is something we have to live with because there are, in effect, two different orthographic conventions in use, and we want both to be supported. They're alternate spellings of the word, and so require separate patterns - just like we'd require for "colour" and "color", if we were trying to support both British and American conventions in a single set of patterns.<br>
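<br>
A minimal Python sketch of the point about U+0027 and U+2019, assuming only the standard unicodedata module. It checks that none of the four Unicode normalization forms changes either character, so normalization alone cannot merge the two spellings. The fold_apostrophes helper is hypothetical: it only illustrates the kind of special-case step described above and is not part of hyph-utf8 or any TeX engine.<br>

```python
import unicodedata

TYPEWRITER_APOSTROPHE = "\u0027"   # APOSTROPHE
TYPOGRAPHIC_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def survives_normalization(ch):
    """Return True if no Unicode normalization form changes ch."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, ch) == ch for form in forms)


def fold_apostrophes(word):
    """Hypothetical special case: map U+2019 to U+0027 before pattern lookup."""
    return word.replace(TYPOGRAPHIC_APOSTROPHE, TYPEWRITER_APOSTROPHE)


if __name__ == "__main__":
    for ch in (TYPEWRITER_APOSTROPHE, TYPOGRAPHIC_APOSTROPHE):
        print("U+%04X unchanged by normalization: %s"
              % (ord(ch), survives_normalization(ch)))
    print(fold_apostrophes("l\u2019eau") == "l'eau")
```

Folding like this would let one set of patterns serve both spellings, at the cost of exactly the special-case code mentioned above. Without it, each spelling needs its own patterns.<br>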
<br>
</div>It may be that you are intentionally putting up a straw-man argument here,<br>
but if you are not, may I comment that "trying to support both British and<br>
American conventions in a single set of patterns" would (IMHO) be<br>
impossible, since British English hyphenation is based primarily on<br>
etymology whilst American is based on syllable boundaries. I wish<br>
I understood more about the "duplicate apostrophe" problem, in order<br>
to be able to offer a more directly relevant (and constructive) comment :<br>
Google throws up nothing relevant.<br>
<font color="#888888"><br>
Philip Taylor<br>
</font><div><div></div><div class="h5"><br>
<br>
--------------------------------------------------<br>
Subscriptions, Archive, and List information, etc.:<br>
<a href="http://tug.org/mailman/listinfo/xetex" target="_blank">http://tug.org/mailman/listinfo/xetex</a><br>
</div></div></blockquote></div><br>