[XeTeX] anti-xunicode ;-)

Adam Twardoch list.adam at twardoch.com
Sun Jul 23 15:09:56 CEST 2006


Ralf Stubner wrote:
> I am not sure if it were legal to
> reorder such a decomposed sequence.
>   
Unicode assigns different combining classes to different diacritical 
marks, and prescribes a canonical order of marks. For example, the 
canonical order for the Yoruba character we’ve been discussing is
\u0045\u0323\u0301 and not \u0045\u0301\u0323.
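
To check the combining classes involved, here is a small sketch using 
Python's standard unicodedata module (my choice of tool for illustration; 
it has nothing to do with XeTeX itself). The lower class sorts first in 
canonical order:

```python
import unicodedata

# The two marks from the Yoruba example: dot below and acute accent.
marks = ("\u0323", "\u0301")

# Map each mark to its canonical combining class (ccc).
classes = {f"U+{ord(ch):04X}": unicodedata.combining(ch) for ch in marks}

for ch in marks:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={classes[f'U+{ord(ch):04X}']}")
# U+0323 COMBINING DOT BELOW: ccc=220
# U+0301 COMBINING ACUTE ACCENT: ccc=230
```

Since 220 is less than 230, the dot below precedes the acute in 
canonical order.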

However, both sequences are canonically equivalent, and Unicode 
recommends: "Rendering systems should handle any of the canonically 
equivalent orders of combining marks."

(See sections 3.11 and 5.13 of the Unicode Standard; PDFs are available 
from http://www.unicode.org/versions/Unicode4.1.0/ ).

A rendering system can reorder marks to their canonical form or can just 
attempt to render them as they are. Well-made fonts should not rely on 
the application doing the reordering, and should have provisions for all 
mark combinations necessary. Note that for marks that are either all 
above or all below, the sequence in which they are typed is significant. 
For example, E followed by acute followed by grave should be rendered so 
that the grave is above the acute, but E followed by grave followed by 
acute should be rendered so that the acute is above the grave. So these 
two sequences are not canonically equivalent. But combinations of marks 
that attach in different positions, such as one below and one above, as 
in the example I used, are. \u0045\u0323\u0301 and \u0045\u0301\u0323 
should be processed and rendered the same.
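
Both cases can be checked by comparing NFD forms, since two strings are 
canonically equivalent exactly when their canonical decompositions 
match. A minimal sketch in Python; the helper name 
canonically_equivalent is my own, not a library function:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings by their NFD (canonical decomposition) forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Marks in different positions (dot below + acute): order does not matter.
below_above = "\u0045\u0323\u0301"
above_below = "\u0045\u0301\u0323"

# Marks in the same position (acute + grave, both above): order matters.
acute_grave = "\u0045\u0301\u0300"
grave_acute = "\u0045\u0300\u0301"

print(canonically_equivalent(below_above, above_below))  # True
print(canonically_equivalent(acute_grave, grave_acute))  # False
```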

Of course, XeTeX would do well to reorder marks canonically. 
As I’ve written, *well-made* fonts should not rely on marks being 
canonically ordered, but some fonts will only contain rendering rules 
for canonically ordered marks. Canonical reordering surely would 
minimize the risk of bad renderings.
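
To illustrate what such a reordering step involves, here is a rough 
sketch in Python. It is only an illustration of the canonical ordering 
rule, not a description of how XeTeX does or should implement it, and 
the function name canonical_reorder is made up for this example. It 
assumes the input is already decomposed and does no decomposition of 
its own:

```python
import unicodedata

def canonical_reorder(text: str) -> str:
    """Stable-sort each run of combining marks by combining class.

    Characters with combining class 0 (base letters and other starters)
    end a run and are never moved.
    """
    out = []
    run = []  # pending marks with a nonzero combining class
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # sorted() is stable, so marks of equal class keep typed order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# Acute typed before dot below gets swapped into canonical order.
print(canonical_reorder("\u0045\u0301\u0323") == "\u0045\u0323\u0301")  # True
# Acute and grave share a class, so their typed order is preserved.
print(canonical_reorder("\u0045\u0301\u0300") == "\u0045\u0301\u0300")  # True
```

The stable sort is the important part: it is what keeps acute-then-grave 
distinct from grave-then-acute while still bringing below-marks ahead of 
above-marks.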

Regards,
Adam


-- 

Adam Twardoch
http://www.twardoch.com/



