[XeTeX] [OT] character encoding ideas [was: Re: Strange hyphenation with polyglossia in French]

Jonathan Kew jfkthame at googlemail.com
Thu Oct 21 13:02:40 CEST 2010

On 21 Oct 2010, at 11:29, Philip Taylor (Webmaster, Ret'd) wrote:
> As to why different planes for different languages (or dialects),
> there are many reasons, of which (for me) the two most important
> are : (1) all characters required for a single language would form
> a contiguous cluster within the character set; and (2) any text encoded
> using this system would automatically carry with it implicit <language>
> (or <language:dialect>) tags for every stretch of text, no matter
> how long or how short.

Sorry, Phil, but I don't think such a scheme would be even remotely workable. For one thing, the sheer number of such "planes" that would have to be defined and supported is daunting (have you browsed http://www.ethnologue.com/ lately? And that's just for the living languages...). For another, it would be utterly impossible to reach any consensus regarding where the dividing lines should be drawn, and we'd have a massive increase in acrimonious political debates regarding linguistic and cultural identity.

Unicode may be far from perfect, representing as it does a compromise between many often-conflicting requirements, but it's a more reasonable approach than that.


