[XeTeX] Strange hyphenation with polyglossia in French

Philip Taylor (Webmaster, Ret'd) P.Taylor at Rhul.Ac.Uk
Thu Oct 21 12:29:04 CEST 2010

Tobias Schoel wrote:

> That's difficult, because languages and scripts are evolving. Is there
> a difference between Montenegrin and Serbian? Will there be a difference
> between German German and Swiss German? (The standardizations of both
> languages are nearly identical, but there is an important typographical
> difference: ß.)
> The cedilla/comma below shows the real problem: there is no fixed way of
> writing a letter/sign/glyph (else there wouldn't be different fonts), but
> the Unicode model glyph=f(meaning)=F(codepoint) doesn't work that way
> all the time. The relation glyph <-> meaning is more difficult and
> depends on the people.
> So setting up different planes for different languages might be helpful,
> but its positive impact won't be so great, I think. But who knows all
> the problems arising from that?

Thank you for your comments, Toscho : I suppose that my underlying
thesis is that Unicode is a very well-intentioned mistake.  I am
convinced that if the originators of Unicode were to sit around
a table today, they would not come up with the same model as that
with which we have presently to cope.  I also fully accept your
introductory remark, that languages and scripts are constantly
evolving, but I think that they evolve at a sufficiently leisurely
pace that it would not be unduly onerous for those responsible
for maintaining the standard to ensure that it is at all times
reasonably up-to-date.

As to why different planes for different languages (or dialects),
there are many reasons, of which (for me) the two most important
are : (1) all characters required for a single language would form
a contiguous cluster within the character set; and (2) any text encoded
using this system would automatically carry with it implicit <language>
(or <language:dialect>) tags for every stretch of text, no matter
how long or how short.
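
To make point (2) concrete, here is a minimal sketch in Python of how
implicit language tags could be recovered from such an encoding.  Everything
in it is an assumption made for illustration: the block size PLANE_SIZE,
the plane assignments in PLANES, and the helpers encode() and
language_runs() are hypothetical, and no such allocation exists in Unicode.
For simplicity it reuses each character's existing Unicode value as its
offset within a plane, so it does not demonstrate the contiguous clustering
of point (1).

```python
from itertools import groupby

# Hypothetical size of each per-language block of code values.
PLANE_SIZE = 0x10000

# Hypothetical plane assignments, one per language or dialect.
# The position in the list is the plane number.
PLANES = ["fr", "de-DE", "de-CH"]


def encode(text, language):
    """Map each character into the plane assigned to `language`."""
    base = PLANES.index(language) * PLANE_SIZE
    codes = []
    for ch in text:
        if ord(ch) >= PLANE_SIZE:
            raise ValueError(f"{ch!r} does not fit in one plane of this sketch")
        codes.append(base + ord(ch))
    return codes


def language_runs(codes):
    """Recover (language, text) runs purely from the code values."""
    runs = []
    for plane, group in groupby(codes, key=lambda c: c // PLANE_SIZE):
        text = "".join(chr(c % PLANE_SIZE) for c in group)
        runs.append((PLANES[plane], text))
    return runs


mixed = (
    encode("Straße ", "de-DE")
    + encode("Strasse ", "de-CH")
    + encode("rue", "fr")
)
print(language_runs(mixed))
```

No separate markup is needed here: the language of each stretch of text
follows from integer division of its code values by the plane size, however
short the stretch may be.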

** Phil.

