[XeTeX] anti-xunicode ;-)

Sun Jul 23 14:21:08 CEST 2006

Ross Moore <ross at ics.mq.edu.au> writes:

>   xunicode.sty  currently implements fall-backs to combining characters
> when the precomposed glyph isn't declared to be available.
>
> You can use the following LaTeX coding to test this feature.
>
> \documentclass{article}
> \usepackage{fontspec}
> \usepackage{xunicode}
> \setromanfont[Mapping=tex-text]{Lucida Grande}
> \begin{document}
> \"u
> \UndeclareUTFcomposite[U]{x00FC}{\"}{u}
> \"u
> \end{document}
>
> Here the first \"u gets set as  Ux00A8
> while the 2nd gives the combination  u Ux0308 .
>
>
>   xunicode.sty  could (and perhaps should) be modified to also include
> a check for the precomposed glyph, then use the fall-back if not found.

I don't think this is necessary, since XeTeX actually does this checking
already. For example, the eogonek in

\documentclass[a4paper]{article}
\usepackage{fontspec,xunicode}
\setromanfont[Mapping=tex-text]{Gentium}
\begin{document}
\k{e}
 \UndeclareUTFcomposite[U]{x0119}{\k}{e}
\k{e}
e\char"0328
\end{document}

uses the precomposed form in all three cases, even though Gentium does
not contain any smart features like mark or ccmp. ('mark' wouldn't help
much here anyway.) I am not sure, but maybe this means that many of the
'\DeclareUTFcomposite' declarations in xunicode.sty are not necessary.

BTW, XeTeX also seems to decompose characters while looking for the best
way to render them. Example: The current development version of FPL Neu
does not contain an Obreve U+014E. It does contain a combining breve
U+0306 with suitable 'mark' features, though. This is correctly used
regardless of whether I input Obreve as precomposed character or in
decomposed form. 
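
A minimal test file for this comparison might look as follows. It
assumes the font is installed under the name 'FPL Neu'; that name is a
guess, so substitute whatever name your system reports.

```latex
\documentclass{article}
\usepackage{fontspec}
\setromanfont{FPL Neu}% assumed font name, adjust as needed
\begin{document}
\char"014E \par   % Obreve, input as precomposed character
O\char"0306 \par  % O followed by combining breve (decomposed input)
\end{document}
```

If the behaviour is as described above, both lines should come out the
same.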

> But this thread is about having yet another kind of fall-back,
> whereby TeX positions the accent-glyph over a letter, as it
> used to do, and still does with OT1 encoding.
[...]
> The above examples show that XeLaTeX already has the ability to do
> what the OP requested; namely to have a sequence of fall-backs
> available for accented characters, according to what is available
> in the font, or font-encoding.

I am not sure. I think the OP wanted to use Unicode input, not TeX
commands. Adam has explained the different possibilities that exist when
one starts from decomposed input, which can be assumed without loss of
generality: 

a) look for a composed form
b) look for a matching 'ccmp' feature
c) look for an applicable 'mark' (or 'mkmk') feature
d) some fallback

I have the impression that XeTeX implements a)-c), which is really
great, since that way XeTeX tries to use everything that the font has to
offer. The current fallback d) is very simple, though: the combining
accent is just printed as it is in the font. Depending on the precise
position of this accent, this may or may not look good. If the font
doesn't even have combining accents (e.g. Minion Pro), the '.notdef' glyph
is used.

Here it would be useful if one were able to define a TeX command that
is used if none of a)-c) has succeeded in finding a suitable glyph. I think
the OP tried to implement something like this, but at too early a stage,
i.e., before b) and c) have been tried, since AFAIK only a) can be done
directly using TeX commands.
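
To illustrate what a test for a) looks like on the TeX side, here is a
rough sketch using the e-TeX primitive \iffontchar, which reports whether
the current font has a glyph for a given character code. The macro name
\myeogonek is made up for this example. The test knows nothing about
'ccmp' or 'mark', which is exactly why a fallback hooked in at this point
comes too early.

```latex
\documentclass{article}
\usepackage{fontspec}
\setromanfont{Gentium}
% \myeogonek is a hypothetical name, not part of any package.
\newcommand*\myeogonek{%
  \iffontchar\font"0119
    \char"0119 % step a): the font has the precomposed glyph
  \else
    e\char"0328 % otherwise emit the decomposed sequence and
                % leave the rest to XeTeX and the font
  \fi}
\begin{document}
\myeogonek
\end{document}
```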

cheerio
ralf

PS: I have only found one situation where XeTeX could be more intelligent
during a)-c): Consider a font with E, Egrave, combining grave, and
combining dot below but without any 'ccmp' or 'mark' feature. If I
input <E><gravecomb><dotbelowcomb>, then <Egrave> is used together with
the fallback for <dotbelowcomb>. However, if I input
<E><dotbelowcomb><gravecomb>, then the fallback is used for both
<dotbelowcomb> and <gravecomb>. I am not sure whether it would be legal to
reorder such a decomposed sequence.
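
On the reordering question: as far as Unicode itself is concerned, the
two sequences are canonically equivalent, because combining dot below
(U+0323, combining class 220) and combining grave (U+0300, combining
class 230) belong to different combining classes and so may be swapped.
A test file for the two input orders could look like this. Gentium is
used here only as a stand-in; whether a given version of it matches the
font described above (no 'ccmp' or 'mark') is an assumption.

```latex
\documentclass{article}
\usepackage{fontspec}
\setromanfont{Gentium}% stand-in font, see the assumption above
\begin{document}
E\char"0300\char"0323 \par  % E, combining grave, combining dot below
E\char"0323\char"0300 \par  % E, combining dot below, combining grave
\end{document}
```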