[XeTeX] anti-xunicode ;-)

Ross Moore ross at ics.mq.edu.au
Mon Jul 24 05:18:34 CEST 2006


Hi Ralf,

On 23/07/2006, at 10:21 PM, Ralf Stubner wrote:

> Ross Moore <ross at ics.mq.edu.au> writes:

>>   xunicode.sty  could (and perhaps should) be modified to also include
>> a check for the precomposed glyph, then use the fall-back if not found.
>
> I don't think this is necessary, since XeTeX actually does this checking
> already. For example, the eogonek in
>
> \documentclass[a4paper]{article}
> \usepackage{fontspec,xunicode}
> \setromanfont[Mapping=tex-text]{Gentium}
> \begin{document}
> \k{e}
>  \UndeclareUTFcomposite[U]{x0119}{\k}{e}
> \k{e}
> e\char"0328
> \end{document}
>
> uses the precomposed form in all three cases, even though Gentium does
> not contain any smart features like mark or ccmp. ('mark' wouldn't help
> much here anyway.)

This is a property of the font, surely,
so cannot be relied upon in general.

> I am not sure, but maybe this means that many of the
> '\DeclareUTFcomposite' declarations in xunicode.sty are not necessary.

I don't see this as a consequence at all.
This is the mechanism whereby LaTeX is told that a particular
combination of accent and base-letter is available as a
single character, within the declared encoding.

>
> BTW, XeTeX also seems to decompose characters while looking for the best
> way to render them. Example: The current development version of FPL Neu
> does not contain an Obreve U+014E. It does contain a combining breve
> U+0306 with suitable 'mark' features, though. This is correctly used
> regardless of whether I input Obreve as a precomposed character or in
> decomposed form.

Again, a property of the font surely?


>> The above examples show that XeLaTeX already has the ability to do
>> what the OP requested; namely to have a sequence of fall-backs
>> available for accented characters, according to what is available
>> in the font, or font-encoding.
>
> I am not sure. I think the OP wanted to use Unicode input, not TeX
> commands. Adam has explained the different possibilities that exist when
> one starts from decomposed input, which can be assumed without loss of
> generality:
>
> a) look for a composed form
> b) look for a matching 'ccmp' feature
> c) look for an applicable 'mark' (or 'mkmk') feature
> d) some fallback

Sure. But one needs a way of achieving this within the context
of multiple fonts in the same document, perhaps with multiple
different encodings, and only wanting the fallback to be used
in some (but not all) situations.

Thus with declarations such as:

\catcode `ḍ = \active
\newcommand{ḍ}{%
   \iffontchar\font"1E0D\char"1E0D%
   \else
     ... fall back expansion ...
   \fi}

the problem is to define the "fallback expansion" appropriately.


I'm asserting that, with appropriate commands in the header
to support a (pseudo-)encoding, 'UX' say, this should
be something like:

   {\changeencoding{UX}\d{d}}

to temporarily (note the extra braces) change the encoding,
so as to make use of TeX's default handling of accents by
a box-like construction.


Perhaps better is to first check for a combining character,
and use this --- if it actually produces acceptable results:

   \iffontchar\font"0323\relax d\char"0323\else
     {\changeencoding{UX}\d{d}}\fi
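Putting the three tiers together (precomposed glyph, then base letter
plus combining mark, then the box-built accent), a complete definition
might look like the following sketch. It assumes the \changeencoding
macro and the UX declaration of \d from the header declarations given
further below, and that the source file contains the precomposed U+1E0D
rather than d followed by U+0323. The name \dbelow is hypothetical.
Routing the active character through a robust command keeps the
\iffontchar tests from being expanded into the aux file.

```latex
% Hypothetical robust command: try each method in turn,
% testing the font that is current when it is typeset.
\DeclareRobustCommand{\dbelow}{%
  \iffontchar\font"1E0D \char"1E0D      % 1. precomposed glyph
  \else
    \iffontchar\font"0323 d\char"0323   % 2. d + combining dot below
    \else {\changeencoding{UX}\d{d}}% 3. box-built fall-back
    \fi
  \fi}
% Make the precomposed character in the source call it.
\catcode `ḍ = \active
\let ḍ\dbelow
```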



Alternatively, you might wish to force the fall-back method
with a particular character in a particular font; e.g.,
where ḍ *is* available, but you don't like its appearance.

Here's how to do it, without upsetting how  ḍ  is handled
as a normal character with other fonts in the same document.

{\catcode `ḍ = \active
  \newcommand{ḍ}{{\changeencoding{UX}\d{d}}}
   \global\let\composeddot ḍ
   \gdef\activateddot{\catcode `ḍ = \active\let ḍ\composeddot}
}

Now whenever you switch to that particular font, you also need
to call \activateddot ; e.g.

  \newcommand\selectGentium{%
    \setromanfont[Mapping=tex-text]{Gentium}%
    \activateddot}

and only use \selectGentium within a grouping or environment,
so that the \catcode change remains contained.
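Since an environment supplies its own group, one way to follow this
advice is to wrap \selectGentium in one. A minimal sketch; the
environment name gentiumtext is hypothetical, and the \newenvironment
line belongs in the preamble after \activateddot and \selectGentium
have been defined.

```latex
% Hypothetical environment: its implicit group keeps both the
% font change and the \catcode change local.
\newenvironment{gentiumtext}{\selectGentium}{}

% In the document body:
\begin{gentiumtext}
Here ḍ is built by the fall-back method.
\end{gentiumtext}
Out here ḍ is an ordinary character again.
```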


For a robust version of this use instead:

\DeclareRobustCommand{\ddotu}{{\changeencoding{UX}\d{d}}}
{\catcode `ḍ = \active
   \gdef\activateddotu{\catcode `ḍ = \active\let ḍ\ddotu}
}

so that, with \activateddotu in effect,  \section{All about ḍ.}
will write out the aux-file string as:  All about \ddotu .

Make sure that, when the table of contents is constructed, there is an
expansion for \ddotu  appropriate to the font being used there.
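One way to do this, sketched under the assumption that the precomposed
glyph is acceptable in the contents font whenever that font has it, is
to redefine \ddotu locally around \tableofcontents. It relies on the
same \changeencoding and UX declarations as above.

```latex
\begingroup
  % Local redefinition: affects only the table of contents.
  \DeclareRobustCommand{\ddotu}{%
    \iffontchar\font"1E0D \char"1E0D
    \else {\changeencoding{UX}\d{d}}%
    \fi}
  \tableofcontents
\endgroup
% From here on, \ddotu again forces the fall-back.
```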



The extra header declarations required to support this technique
were in a previous email, but I repeat them here for completeness.


% Define a macro to change encoding flag, without requiring
% the support of an  <encoding>enc.def  file.
\makeatletter
  \def\changeencoding#1{\def\cf@encoding{#1}}
\makeatother

% Declare an accent with UX encoding, to fall-back
% to using the  OT1 (or T1) \add@accent  method.
% Make sure this *precedes* loading any packages
% that use \DeclareTextAccent for the same accent;
%  e.g.  hyperref.sty  for PD1-encoding.

%  umlaut  using the "00A8 character
\let\realaccent\"
  \DeclareTextAccent{\"}{UX}{"00A8}
\let\"\realaccent

%  ogonek  using the "02DB character
\let\realaccent\k
  \DeclareTextAccent{\k}{UX}{"02DB}
\let\k\realaccent

%  dot-above  using the "02D9 character
\let\realaccent\.
  \DeclareTextAccent{\.}{UX}{"02D9}
\let\.\realaccent

%  dot-under  using the "002E character (full-stop)
%    or we could use the combining-char at "0323
%    for a lower, larger dot in some fonts
\let\realaccent\d
  \DeclareTextAccent{\d}{UX}{"002E}
  % this next line is because \accent  is *not* used for this
  \expandafter\let\csname UX\string\d\expandafter\endcsname\csname OT1\string\d\endcsname
\let\d\realaccent

% It seems to be immaterial whether these are loaded before or after:
\usepackage{fontspec}
\usepackage{xunicode}



> I have the impression that XeTeX implements a)-c), which is really
> great, since that way XeTeX tries to use everything that the font has to
> offer. The current fallback d) is very simple, though. The combining
> accent is just printed as it is in the font. Depending on the precise
> position of this accent, this may or may not look good.

TeX can do better than this, so why not let it?

> If the font
> doesn't even have combining accents (e.g. Minion Pro), the '.notdef' glyph
> is used.
>
> Here it would be useful if one were able to define a TeX command which
> is used if none of a)-c) has succeeded in finding a suitable glyph. I think
> the OP tried to implement something like this, but at too early a stage,
> i.e., before b) and c) have been tried, since AFAIK only a) can be done
> directly using TeX commands.

I think the above methods cater for all the possibilities,
and do so in the most compatible way with alternative uses
of the same characters in different fonts, within the same
LaTeX document.


>
> cheerio
> ralf


Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
------------------------------------------------------------------------



