[XeTeX] Strange hyphenation with polyglossia in French

Philip Taylor (Webmaster, Ret'd) P.Taylor at Rhul.Ac.Uk
Wed Oct 20 10:44:36 CEST 2010

Khaled Hosny wrote:

> Unicode is full of "compatibility with legacy encodings" nonsense; IMO
> it should just be ignored. AFAIK, comma forms were added in Unicode
> 3.0.0, and that was more than 10 years ago; if we continue to support
> the old broken practice, it will never vanish.

Nor will it vanish just because the TeX community no longer support it.
Unicode is, in many ways, a complete and utter mess [1], but if XeTeX and
LuaTeX are to be based on it, then they must be based on it and not
on some variant that we feel is preferable (for whatever reason).

> Anyway, I was actually concerned
> about fonts that have a loca(lised) feature to map cedilla forms to
> comma forms, a practice that does no one any favours.

If you are transcribing an extant document, and the author used
the comma form, then the transcription should also use the comma
form, otherwise it ceases to be a transcription and becomes
something else.

All IMHO, of course.
** Phil.

[1] IMVVVHO, a successor to Unicode should have one plane per
written language (and perhaps even per dialect thereof), so
that a document written using this encoding will automatically
carry the appropriate language semantics without requiring
explicit tagging.  A font implementing such a system need
be very little larger than existing Unicode fonts, since
glyphs could be recycled across languages where appropriate,
but the underlying encoding should keep the languages
entirely separate.

