[XeTeX] a bug of xltxtra.sty

Sun Oct 12 22:06:16 CEST 2008

Hi Ulrike,

On 12/10/2008, at 11:47 PM, Ulrike Fischer wrote:

> Am Sun, 12 Oct 2008 11:50:35 +1100 schrieb Ross Moore:
>
>
>>> There is no simple solution to this problem: with xetex it is easy to
>>> use fonts and -- as it relies on unicode -- you no longer have to
>>> bother with all the encoding stuff but the price is that there
>>> aren't good fall-backs if glyphs are missing.
>
>>
>> Yes, this is correct to some extent: if you use fonts that don't have
>> the Unicode characters that you want, then you have to do something
>> about it.
>> But giving up "all the encoding stuff" is quite wrong.
>
> I didn't suggest to "give up" something. I only try to describe the
> situation: The standard LaTeX encodings describe concrete character sets
> and their order. And if you load a T1-font you can be quite sure that it
> contains all the glyphs of the T1-encoding. But with fontspec all fonts
> that you load through its interface are encoded in "EU1" encoding --

   Yes; this is true, and a bit unfortunate. The  fontspec  package
uses a macro  \zf@enc  which expands to either 'EU1' or 'U' according
to whether you use the  'lm-default'  or  'cm-default'  option,
with Latin Modern being the fall-back. There should be more options
than this...

> regardless of the glyphs you provide. So you no longer can use the
> current setting of the encoding to decide if a glyph is in the font or
> if you should switch to an appropriate accent command or even to another
> font.

  ... allowing a user to set arbitrary values according to which font
is actually to be used, and which characters it actually supports.

In  xunicode.sty  the value of  \UTFencname  is such a variable.
If this macro has a value before the package is loaded, then this
value is preserved. Only if the macro is undefined is the value set to 'U'.

The intention is that a font-package can set a value which *does*
represent an encoding in the proper sense; that is, a string which
is meant to be an identifier for the set of supported characters.

The default of 'U' is meant to indicate 'Unknown' or 'Unspecified',
rather than 'Unicode'.
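
As a minimal sketch of that mechanism: a preamble can give \UTFencname
a value before xunicode is loaded, so that the package keeps it. The
value 'EU1' is only an illustration, and the loading order shown is an
assumption based on the fontspec/xunicode setup discussed in this
thread; check it against the versions you have installed.

```latex
% Sketch only: give \UTFencname a value before xunicode is loaded,
% so that the package keeps it instead of falling back to 'U'.
% 'EU1' is an illustrative choice, not a recommendation.
\documentclass{article}
\newcommand*{\UTFencname}{EU1}
\usepackage{fontspec}
\usepackage{xunicode}
\begin{document}
Text with accents, e.g. \'e and \"o.
\end{document}
```

A font-package would do the same thing with its own identifier, chosen
to reflect the characters that the font really supports.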

>>> "You can try to define other commands to get the accents also with
>>> fonts which don't have the actual chars e.g. (the code isn't
>>> perfect, it should only show you some possibilities):
>>
>> Sorry, but going down this path just creates even more
>> incompatibilities.
>> You will surely regret this some time in the future, when you try to
>> reuse older documents that have been prepared this way.
>
> I'm certainly not happy with the code. But I don't see why defining new
> accent commands can lead to incompatibilities. I do find it more
> dangerous to redefine or disable existing commands as this can affect
> fonts where the original accent commands work without problems.

Correct. Redefining accents to use 2 characters rather than 1
also affects searching within the final PDF, and the result
of Copy/Paste of content from that PDF. So you really do not want
to do this unless there is no alternative.

> Also what are currently the alternatives?

I agree that there aren't many alternatives yet.

> I have seen this question in
> the past weeks three or four times. In each case another font and
> another accent command was involved. Font A has glyphs with accent a but
> not with accent b, Font B has glyphs with accent b but not with accent
> a.

What is needed is for those people who first have need of a particular
font to examine it carefully and note what is properly supported and
what is not. Record this information in a way that will be useful
to others --- simplest would be to record just the differences from
what would be expected from a font that properly implements all possible
Latin-based characters in the Unicode spec.
The user-level commands in  xunicode.sty  give a means to do this:

   \DeclareUTFcharacter
   \UndeclareUTFcharacter

   \DeclareUTFcomposite
   \UndeclareUTFcomposite

   \DeclareUTFmulticomposite

The  \Undeclare...  commands allow for overriding the corresponding
\Declare...  command, when the appropriate characters are in fact
not supported at the assigned Unicode code-point.
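
For illustration, declarations in the style used inside  xunicode.sty
look roughly like the following. The code-points are the real Unicode
assignments; the optional argument and the argument order are an
assumption and should be checked against the installed  xunicode.sty.

```latex
% Sketch, to be checked against the installed xunicode.sty.
% A single character: U+00A1 is produced by \textexclamdown.
\DeclareUTFcharacter[\UTFencname]{x00A1}{\textexclamdown}
% A letter-accent composite: U+022E is O with dot above, i.e. \.{O}.
\DeclareUTFcomposite[\UTFencname]{x022E}{\.}{O}
% Likewise U+01D1 is O with caron, i.e. \v{O}.
\DeclareUTFcomposite[\UTFencname]{x01D1}{\v}{O}
```

A font-specific file would then apply the matching  \Undeclare...
command to each entry that the font does not actually provide; the
argument list for those commands is not shown here.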

This thread has indicated the need for an extra command, such as
    \DeclareUTFnoncomposite
that allows a fallback to LaTeX's  \set@accent  command.
The need for this is because a font may have the proper "Combining
accent" characters, but these may not work correctly in some
letter combinations for which there is an assigned code-point.

This seems to be the case with \.O and \v{O} in Latin Modern,
and probably also for quite a few other characters within
the "Latin Extended-B" range of Unicode.

> Your code is certainly better than my crude versions, but it doesn't
> solve the fundamental problem: That you have to correct individual fonts
> in a quite specific way. If you add or correct accent commands with my
> simple commands or your more sophisticated ones and then decide to use
> another font you probably have to check all the corrections and change
> some of them.

Absolutely.
Until someone has examined the font in detail and provided an appropriate
listing of what is, or is not, correctly supported then this is necessary.

Of course in practice this only applies when you want to use "unusual"
letter--accent combinations, that were not specifically designed into
the chosen font.
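
One way to start such a listing, sketched here on the assumption that
the document is compiled with xelatex and the font is loaded through
fontspec, is to ask the engine whether the current font has a glyph at
a given code-point.  \iffontchar  is an e-TeX primitive that XeTeX
provides; the macro name  \checkglyph  is made up for this example.

```latex
% Compile with xelatex. \checkglyph is a hypothetical helper name.
\documentclass{article}
\usepackage{fontspec}
% #1 is a code-point given as four uppercase hex digits.
\newcommand*{\checkglyph}[1]{%
  U+#1: \iffontchar\font"#1 present\else missing\fi\par}
\begin{document}
\checkglyph{022E}% O with dot above, i.e. \.{O}
\checkglyph{01D1}% O with caron, i.e. \v{O}
\checkglyph{0307}% combining dot above
\checkglyph{030C}% combining caron
\end{document}
```

This reports only whether a glyph exists. It says nothing about whether
a combining accent is positioned correctly over a given letter, so the
printed result still has to be inspected by eye.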

>
> It also doesn't solve all problems to attach the changes to a font
> command like \rmfamily. At first you can't rely on users or packages to
> really use this command to switch to the font,

Agreed, it is better to use a different encoding string. But this only
works if there is a command to switch encodings when the font-face is
changed.  Exactly the same problem --- it cannot be avoided.

> and second it could also
> be that the corrections are e.g. needed only for the small caps shape or
> the bold series.

A different encoding string should be attached to these stylistic variants,
when the range of characters supported is different.

>
> In the long run it will probably be necessary to do font specific
> corrections, e.g. through some configuration file for each font or font
> family.

Indeed.
Now that people are using fonts designed for use independently of TeX,
it is perfectly natural that some font-specific configuration will
be needed. Who will create these configuration files? Surely it must be
someone who has a specific need to have the font supported.

> -- 
> Ulrike Fischer

Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross at maths.mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------