[XeTeX] anti-xunicode ;-)

Ross Moore ross at ics.mq.edu.au
Tue Jul 25 02:56:17 CEST 2006


Hi Ralf,

On 25/07/2006, at 8:45 AM, Ralf Stubner wrote:

> Hi Ross,
>
> Ross Moore <ross at ics.mq.edu.au> writes:
>
>>> uses the precomposed form in all three cases, even though Gentium does
>>> not contain any smart features like mark or ccmp. ('mark' wouldn't help
>>> here much anyway).
>>
>> This is a property of the font, surely,
>> so cannot be relied upon in general.
>
> I think the only font property involved here is that the precomposed
> glyph exists and is advertised via the cmap table. Gentium does not
> contain any smart rendering features. I can't tell if that works when
> fonts are accessed via ATSUI, though.

One thing that I noticed in doing this work is that it's
impossible to tell exactly what goes into the PDF file,
as the streams are encoded for compression.
Is there a way to turn off compression (on a Mac)?
JK?

  xetex --help   tells nothing about this.

>
>>> I am not sure, but maybe this means that many of the
>>> '\DeclareUTFcomposite' in xunicode.sty are not necessary.
>>
>> I don't see this as a consequence at all.
>> This is the mechanism whereby LaTeX is told that a particular
>> combination of accent and base-letter is available as a
>> single character, within the declared encoding.
>
> I am just not sure if LaTeX needs to know that a certain accented
> character is available, since XeTeX seems to find that character when
> presented with suitable decomposed input.

That's true when you know that you have a "smart"(-ish) font.

It's not so with old 7- or 8-bit fonts that you may be using
within the same document, and which may still be needed for
accents.
So you need the means to be able to tell the difference.
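
In LaTeX terms, that means is the font encoding: composites are
declared per encoding, so the same accent command can resolve
differently for an old 8-bit font and for a Unicode one. A minimal
sketch using only LaTeX kernel commands (preamble only); the macro
\MyDotAboveD is a made-up name for illustration, and xunicode's
\DeclareUTFcomposite plays the analogous role for its Unicode encoding:

```latex
% In T1, \'{e} is the single character in slot 233
% (this mirrors what the standard T1 encoding file already declares).
\DeclareTextComposite{\'}{T1}{e}{233}

% In OT1 there is no precomposed d-with-dot, so the accent is built.
% \MyDotAboveD is hypothetical; here it just reproduces the usual
% construction with the OT1 dot accent in slot 95.
\newcommand*\MyDotAboveD{{\accent 95 d}}
\DeclareTextCompositeCommand{\.}{OT1}{d}{\MyDotAboveD}
```

With declarations like these, \'{e} and \.{d} in the body are resolved
according to whichever encoding is current at that point.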

The over-riding point is that you should not have to change
the LaTeX input body-source because you choose to change fonts.
This should be handled by definitions and declarations in the
preamble, along with markup to say what font/style is desired
at appropriate places within the body.

In other words, "separation of form and content".
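
As a rough illustration of that separation (a sketch, assuming XeLaTeX
with fontspec and that Gentium is installed under that name), only the
preamble line naming the font should need to change:

```latex
\documentclass{article}
\usepackage{fontspec}    % font selection under XeLaTeX
% In the setup discussed in this thread, xunicode would be loaded here
% as well, to map accent commands onto Unicode characters.
\setmainfont{Gentium}    % assumption: swap this line to change fonts

\begin{document}
% The body uses accent commands only; it does not depend on the font.
Caf\'{e}, \u{O}, \.{d}
\end{document}
```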


> It is a property of the font that <O><brevecomb> is rendered correctly
> even though there is no <Obreve> in the font. It is a property of XeTeX
> that this decomposed form is tried even though the input contains
> precomposed characters.

No code in XeTeX itself actually does this.
This is done by whatever is the main font-rendering machinery for
your platform.

>>> a) look for a composed form
>>> b) look for a matching 'ccmp' feature
>>> c) look for an applicable 'mark' (or 'mkmk') feature
>>> d) some fallback
>>
>> Sure. But one needs a way of achieving this within the context
>> of multiple fonts in the same document, perhaps with multiple
>> different encodings, and only wanting the fallback to be used
>> in some (but not all) situations.
>
> I think we are discussing different topics here. You are explaining what
> is possible *now*, which is very valuable, while I am doing sort of a
> brainstorming about possible extensions.

Agreed, there is a difference.
However, we don't want people writing ad hoc macros that solve the
problem in a very limited way. Later they will try to extend these
macros into more complicated situations, fail at this, then ask
for help fixing them.

Better is to do it right, in the most general way, first off.


> Right now we have to make
> characters active in order to provide suitable fall backs. I see at
> least two problems with this approach, though:
>
> * With XeTeX one can test for a glyph for the precomposed character. One
>   can also test for a combining accent. One cannot test for the
>   existence of suitable smart font features that make the latter work
>   properly, though. Imagine the case of U+1E0B ḋ together with a font
>   that contains a combining dot accent but no prebuilt ḋ. The ideal
>   rendering for this character might have the dot to the left of the
>   ascender of the d. Such a behaviour can be implemented via a 'mark'
>   feature, but we can't be sure. Just letting XeTeX render <d><dotcomb>
>   might also produce results where TeX's fallback of centering the dot
>   above the <d> would be preferable.

Such things can only be determined by a human, having a look and
being dissatisfied with what the system can give automatically.

So yes, get around it with the active-character trick.
But make sure that the expansion of the macro it uses allows
for it to be used in more than just the simplest case.
Otherwise it's the same problem as mentioned above.
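
A sketch of what such a definition might look like, untested against
any particular font. It assumes the e-TeX primitives \iffontchar and
\protected (both available in XeTeX); \DotAboveFallback is a
hypothetical name, and its use of \accent with U+02D9 assumes the font
has that spacing dot accent:

```latex
% Hypothetical fallback, used only when the font lacks U+1E0B.
% Assumption: the current font has a spacing dot accent at U+02D9.
\newcommand*\DotAboveFallback[1]{{\accent"02D9 #1}}

% Define the active form of U+1E0B. The \lowercase trick builds the
% active token without needing the character to be active yet.
\begingroup
  \lccode`\~="1E0B
\lowercase{\endgroup
  \protected\def~}{%
    \iffontchar\font"1E0B
      \char"1E0B\relax       % precomposed glyph exists: use it
    \else
      \DotAboveFallback{d}%  % otherwise build the accent in TeX
    \fi}

\catcode"1E0B=\active  % simplest activation; scope it as needed
```

Making the macro \protected keeps it from expanding in \edef or \write,
so the character survives unchanged in moving arguments, and the glyph
test is made against whatever font is current when it is typeset.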


>
> * It is not extensible to characters without a precomposed form in
>   Unicode. For example, the E with dot below and acute above mentioned
>   in this thread does not exist in Unicode in precomposed form. One has
>   to input it via combining accents. And I currently don't see any way
>   to define fallback operations on the TeX-level for combining accents.
>   I think this is the place where an additional hook would be useful,
>   but that requires an extension of XeTeX.
>
> The TeX-coding for such a fall back would probably be along the lines
> you explained. I just think that such a fall back should be used as late
> as possible.

Precisely.
When you see that some characters are badly formed for some fonts,
but not others, then adopt a solution that will detect only the thing
that really needs to be fixed, leaving all other situations untouched.

E.g., you only want ḋ to be active when using the particular font
where it looks wrong. Don't make it active globally, but make it so
within critical environments only.
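
One way to confine the change, as a sketch only: give the active form a
meaning in the preamble, but switch the catcode inside an environment,
so the change ends with the environment's group. The names dotfix and
\ProblemFontDotD are hypothetical, and the \accent fallback assumes the
font has a dot accent at U+02D9:

```latex
% Hypothetical replacement; adjust to whatever looks right in the
% font that renders U+1E0B badly.
\newcommand*\ProblemFontDotD{{\accent"02D9 d}}

% Give the active form of U+1E0B a meaning without activating it.
\begingroup
  \lccode`\~="1E0B
\lowercase{\endgroup
  \protected\def~}{\ProblemFontDotD}

% Hypothetical environment: U+1E0B is active only inside it.
\newenvironment{dotfix}
  {\catcode"1E0B=\active}
  {}
```

Text between \begin{dotfix} and \end{dotfix} then gets the replacement,
while ḋ elsewhere is left to the font. This does not help if the text
has already been read as a macro argument before the catcode changes.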

>
> cheerio
> ralf


Hope this helps,

	Ross


------------------------------------------------------------------------
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
------------------------------------------------------------------------



