[XeTeX] XeTeX and diacritics

Mon Mar 24 21:12:10 CET 2008

Ross Moore wrote:
>
> By the way, that  junicode  font looks nice.
> Its documentation has some advice, and I don't know whether
> it is standard or not.
> e.g.
>
> Characters with diacritics.
> Both Unicode and MUFI contain large numbers of characters with
> diacritics. Make it a habit never to use these “precomposed”
> characters directly; rather use the “plain” character followed by
> a character from the Unicode “Combining Diacritics” range. (This
> works with Word for Windows when Uniscribe is enabled, and also with
> other OpenType-aware applications.) In almost all cases the
> application will either substitute the correct precomposed character
> or position the diacritic correctly.
>
>
> Presumably this is based upon an expectation that the combining
> characters are likely to work more often, rather than expecting
> a font to have all the precomposed ones available.
> Yet JK's remark concerning Gentium contradicts this.
>
> Furthermore, I set up  xunicode.sty  to use precomposed characters
> when there is a Unicode code-point allocated, with combining
> sequences as a fallback, especially with the standard accents
> (i.e., those that occur in the older Latin-based font encodings).
>
> So my question is:
>   Is there really any advantage in following the above advice?
>
> Another question is:
>   Does it matter what is done at the input or macro levels?
>   Does XeTeX produce the same output in the PDF whether a precomposed
>   character or combining sequence is used, when there is a choice?
>
>
>   
The advice in the Junicode documentation is specifically for Junicode
users, and especially for medievalists who need some of the hundreds of
precomposed characters in the Medieval Unicode Font Initiative
(http://gandalf.aksis.uib.no/mufi/) that are encoded in the PUA
(Unicode's Private Use Area). I strongly oppose including PUA characters
directly in documents: to keep documents portable I suggest instead
using sequences of letter + combining diacritic (+ combining
diacritic . . .).  Junicode uses the OpenType ccmp feature to substitute
the precomposed character when possible, and uses anchors to position
the diacritic as a fallback. Both of these methods, of course, work
brilliantly with XeTeX. One of Junicode's goals is to support all of
MUFI while making it unnecessary to actually insert any PUA characters
into a document.
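
For anyone who wants to check an existing document for PUA characters,
here is a minimal sketch in Python using only the standard unicodedata
module. The helper name find_pua and the sample strings are made up for
illustration, and U+E000 is just an arbitrary PUA code point, not a
particular MUFI character.

```python
import unicodedata


def find_pua(text):
    """Return (index, code point) pairs for private-use characters.

    Private-use code points have the Unicode general category "Co".
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# Portable form: base letter followed by combining diacritics.
# o + COMBINING MACRON + COMBINING ACUTE ACCENT
portable = "o\u0304\u0301"

# Non-portable form: a PUA code point (hypothetical stand-in only).
nonportable = "a\uE000b"

print(find_pua(portable))
print(find_pua(nonportable))
```

A PUA code point has no standard decomposition, so an application
without the right font cannot tell which letter and diacritics it
stands for; the combining sequence carries that information itself.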

For the common combinations in the Unicode Latin ranges I expect that
most people will just include the precomposed characters in their
documents, and these will be portable enough. But even there I can see
some advantages to using the combining diacritics.
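
To make the two input forms concrete, here is a small Python sketch
using the standard unicodedata module. It only illustrates Unicode
normalization (NFC and NFD) at the text level; it does not show what
XeTeX or a font's ccmp feature does with either form. The variable
names are made up for illustration.

```python
import unicodedata

# Two encodings of the same letter, e with acute accent.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # e + COMBINING ACUTE ACCENT

# As raw code point sequences they differ ...
assert precomposed != combining

# ... but they are canonically equivalent: NFC composes, NFD decomposes.
assert unicodedata.normalize("NFC", combining) == precomposed
assert unicodedata.normalize("NFD", precomposed) == combining

# Unicode has no precomposed q with macron, so NFC leaves the
# combining sequence as it is. Here the font has to position the mark.
no_precomposed = "q\u0304"  # q + COMBINING MACRON
assert unicodedata.normalize("NFC", no_precomposed) == no_precomposed
```

The last case is the one where font support matters most: there is no
precomposed code point to fall back on, so correct output depends on
the font positioning the combining mark.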

None of this works in Gentium because it has little or no OpenType 
support. How I wish it did!

Peter Baker