[XeTeX] anti-xunicode ;-)

Tue Jul 25 00:45:44 CEST 2006

Hi Ross,

Ross Moore <ross at ics.mq.edu.au> writes:

>> uses the precomposed form in all three cases, even though Gentium does
>> not contain any smart features like 'mark' or 'ccmp'. ('mark' wouldn't
>> help here much anyway.)
>
> This is a property of the font, surely,
> so cannot be relied upon in general.

I think the only font property involved here is that the precomposed
glyph exists and is advertised via the cmap table. Gentium does not
contain any smart rendering features. I can't tell if that works when
fonts are accessed via ATSUI, though.
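The cmap check described above amounts to asking whether the composed (NFC) form of the input is a single code point that the font maps. Here is a rough Python sketch, assuming the cmap can be modelled as a plain set of code points (a hypothetical stand-in for a real cmap table, not how XeTeX actually queries fonts):

```python
import unicodedata
from typing import Set


def has_precomposed(cmap: Set[int], text: str) -> bool:
    """Return True if `text` composes (NFC) to one code point that the cmap maps."""
    composed = unicodedata.normalize("NFC", text)
    return len(composed) == 1 and ord(composed) in cmap


# Hypothetical cmap: plain e, combining acute U+0301 and precomposed e-acute U+00E9.
cmap = {0x0065, 0x0301, 0x00E9}
print(has_precomposed(cmap, "e\u0301"))              # True: e + U+0301 composes to U+00E9
print(has_precomposed({0x0065, 0x0301}, "e\u0301"))  # False: no U+00E9 in this cmap
```

No smart features enter this test at all, which is the point: decomposed input can still reach the precomposed glyph purely through normalization plus the cmap.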

>> I am not sure, but maybe this means that many of the
>> '\DeclareUTFcomposite' in xunicode.sty are not necessary.
>
> I don't see this as a consequence at all.
> This is the mechanism whereby LaTeX is told that a particular
> combination of accent and base-letter is available as a
> single character, within the declared encoding.

I am just not sure if LaTeX needs to know that a certain accented
character is available, since XeTeX seems to find that character when
presented with suitable decomposed input.

>> BTW, XeTeX also seems to decompose characters while looking for the best
>> way to render them. Example: The current development version of FPL Neu
>> does not contain an Obreve U+014E. It does contain a combining breve
>> U+0306 with suitable 'mark' features, though. This is correctly used
>> regardless of whether I input Obreve as a precomposed character or in
>> decomposed form.
>
> Again, a property of the font surely?

It is a property of the font that <O><brevecomb> is rendered correctly
even though there is no <Obreve> in the font. It is a property of XeTeX
that this decomposed form is tried even though the input contains
precomposed characters.
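The observed behaviour can be modelled roughly as: use the precomposed glyph if the cmap maps it, otherwise decompose (NFD) and use the components if all of them are mapped. This is only a Python sketch of that observation, with a hypothetical set-of-code-points cmap; it is not XeTeX's actual code:

```python
import unicodedata
from typing import List, Optional, Set


def glyph_sequence(cmap: Set[int], text: str) -> Optional[List[int]]:
    """Pick code points to render: precomposed if mapped, else the NFD components."""
    composed = unicodedata.normalize("NFC", text)
    if len(composed) == 1 and ord(composed) in cmap:
        return [ord(composed)]
    decomposed = unicodedata.normalize("NFD", text)
    if all(ord(c) in cmap for c in decomposed):
        return [ord(c) for c in decomposed]
    return None  # neither form is fully covered: a TeX-level fallback is needed


# Hypothetical cmap modelled on the FPL Neu case: O and combining breve, no U+014E.
fpl_like = {0x004F, 0x0306}
print(glyph_sequence(fpl_like, "\u014E"))   # [79, 774], i.e. U+004F U+0306
print(glyph_sequence(fpl_like, "O\u0306"))  # [79, 774], same result for decomposed input
```

Both inputs end up as <O><brevecomb>; whether the breve is then placed well depends on the font's 'mark' feature, which this sketch does not model.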

>> a) look for a composed form
>> b) look for a matching 'ccmp' feature
>> c) look for an applicable 'mark' (or 'mkmk') feature
>> d) some fallback
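The proposed order a) to d) can be sketched as a simple cascade. The `Font` class below is a hypothetical model (a cmap set, the sequences a 'ccmp' lookup would compose, and the bases carrying 'mark' anchors), not a real font-library API; real shaping engines apply these features in more detail:

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Font:
    """Hypothetical font model for illustration; not a real font-library API."""
    cmap: Set[int]
    # Decomposed sequences that a 'ccmp' lookup would turn into a single glyph.
    ccmp: Set[Tuple[int, ...]] = field(default_factory=set)
    # Base characters that carry 'mark' anchors for attaching combining marks.
    mark_bases: Set[int] = field(default_factory=set)


def strategy(font: Font, text: str) -> str:
    """Return which step (a-d) of the proposed order would handle `text`."""
    composed = unicodedata.normalize("NFC", text)
    if len(composed) == 1 and ord(composed) in font.cmap:
        return "a"  # precomposed glyph exists
    seq = tuple(ord(c) for c in unicodedata.normalize("NFD", text))
    if seq in font.ccmp:
        return "b"  # 'ccmp' composes the decomposed sequence
    if seq and all(cp in font.cmap for cp in seq) and seq[0] in font.mark_bases:
        return "c"  # marks positioned via 'mark' ('mkmk' for stacked marks)
    return "d"  # fallback, e.g. TeX-level accent placement


# A font with <d> and a combining dot above but no 'mark' anchor on <d>:
print(strategy(Font(cmap={0x64, 0x307}), "\u1E0B"))  # d
```

The last case is exactly the U+1E0B situation: both glyphs exist, but without knowledge of the smart features the cascade should drop to the fallback.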
>
> Sure. But one needs a way of achieving this within the context
> of multiple fonts in the same document, perhaps with multiple
> different encodings, and only wanting the fallback to be used
> in some (but not all) situations.

I think we are discussing different topics here. You are explaining what
is possible *now*, which is very valuable, while I am brainstorming
about possible extensions. Right now we have to make characters active
in order to provide suitable fallbacks. I see at least two problems with
this approach, though:

* With XeTeX one can test for a glyph for the precomposed character. One
  can also test for a combining accent. One cannot, however, test for the
  existence of suitable smart font features that make the latter work
  properly. Imagine the case of U+1E0B ḋ together with a font that
  contains a combining dot accent but no prebuilt ḋ. The ideal rendering
  for this character might have the dot to the left of the ascender of
  the d. Such behaviour can be implemented via a 'mark' feature, but we
  can't be sure the font provides it. Just letting XeTeX render
  <d><dotcomb> might then produce results where TeX's fallback of
  centering the dot above the <d> would be preferable.

* It does not extend to characters without a precomposed form in
  Unicode. For example, the E with dot below and acute above mentioned
  in this thread does not exist in Unicode in precomposed form. One has
  to input it via combining accents, and I currently don't see any way
  to define fallback operations on the TeX level for combining accents.
  I think this is the place where an additional hook would be useful,
  but that requires an extension of XeTeX.
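The second point can be checked directly: NFC composes E with dot below (U+1EB8), but the acute stays a separate combining character, so any hook keyed on single precomposed characters never sees this letter. A hook for combining accents would instead have to match mark sequences. The Python sketch below illustrates that idea with a hypothetical lookup table and a hypothetical macro name `\dotbelowacute`; nothing like this exists in XeTeX:

```python
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical table mapping a combining-mark sequence to a TeX-level fallback macro.
FALLBACKS: Dict[Tuple[str, ...], str] = {
    ("\u0323", "\u0301"): r"\dotbelowacute",  # hypothetical macro name
}


def fallback_for(text: str) -> Optional[str]:
    """Look up a fallback by the combining marks following the base character."""
    nfd = unicodedata.normalize("NFD", text)  # also puts marks in canonical order
    marks = tuple(c for c in nfd[1:] if unicodedata.combining(c))
    return FALLBACKS.get(marks)


seq = "E\u0323\u0301"  # E + combining dot below + combining acute
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", seq)])  # ['U+1EB8', 'U+0301']
print(fallback_for(seq))             # \dotbelowacute
print(fallback_for("\u1EB8\u0301"))  # \dotbelowacute (same NFD sequence)
```

Normalizing to NFD before the lookup means that precomposed, partly composed and differently ordered inputs all hit the same table entry.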

The TeX coding for such a fallback would probably be along the lines
you explained. I just think that such a fallback should be used as late
as possible.

cheerio
ralf