[XeTeX] xunicode.sty bug

Jonathan Kew jonathan_kew at sil.org
Tue Jul 18 11:49:26 CEST 2006


On 18 Jul 2006, at 3:01 am, Ross Moore wrote:

> Hi Toralf,
>
> On 18/07/2006, at 1:38 AM, Toralf Senger wrote:
>
>> Hi,
>>
>> There is a tiny bug in xunicode.sty: \textminus is wrongly mapped
>> to the hyphen-minus glyph. The 'minus' of \textminus must have the
>> width of 'plus'.
>>
>> Before Ross issues an updated version of xunicode.sty, you can
>> patch it yourself by replacing
>>
>> \DeclareUTFcharacter[\UTFencname]{x00AD}{\textminus}
>> by
>> \DeclareUTFcharacter[\UTFencname]{x2212}{\textminus}
>
> Yep; I accept that that's a mistake.

Yes, U+2212 would be the correct choice.

You might want to consider adding a "fallback" mechanism to  
xunicode.sty's mappings (not just for this example, but for general  
use). If the current font doesn't support U+2212, it might be nice to  
use U+002D as a fallback.

If the expansion of \textminus led to something like

   \iffontchar\font"2212 \char"2212 \else \char"2D \fi

then I think it ought to work OK. Don't know exactly how you'd want  
to specify that, though (if you think it's worth doing at all).
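
One way the fallback idea above might be packaged, as a minimal sketch
only: \charwithfallback is a hypothetical helper name, not something
xunicode.sty provides, and both arguments are assumed to be bare hex
digits. Note that e-TeX's \iffontchar takes the font first and the
character number second.

   % Hypothetical helper (not part of xunicode.sty).
   % #1 = preferred code point, #2 = fallback code point, both in hex.
   \newcommand*\charwithfallback[2]{%
     \iffontchar\font"#1 % does the current font have the glyph?
       \char"#1 %
     \else
       \char"#2 %
     \fi}

   % Assumed usage: U+2212 when available, otherwise U+002D.
   \DeclareRobustCommand\textminus{\charwithfallback{2212}{002D}}

The test runs at typesetting time, so the choice follows whatever font
is current where \textminus is used. How such fallbacks would be
declared inside xunicode.sty itself is left open here.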

> I've always been confused about the roles of the
> various dash/hyphen/minus characters:
>
> Ux002D  hyphen-minus   (the ASCII  -  character)

Ambiguous semantics, typically rendered with an "average" width; not  
preferred in Unicode text, but of course very common because most  
legacy standards don't distinguish varieties of dash.

> Ux00AD  soft hyphen

This is the Unicode character that means essentially the same as  
TeX's "\-". A non-printing layout control that indicates a potential  
break point, not a visible character in its own right. If the line  
actually breaks there, the appropriate visible manifestation is  
script/language-dependent; a common default would be to insert U+2010  
before the break, but this is not universally correct.

> Ux2010  hyphen

The preferred character for a hyphen in text.

> Ux2011  non-breaking hyphen

Similar, but does not permit a line-break afterwards.

> Ux2012  figure dash      how wide is this ???

Same width as numerals (if monospaced). Same ambiguous semantics as  
hyphen-minus.

> Ux2013  endash           as from  --
> Ux2014  emdash           as from  ---
> Ux2015  horizontal bar   how wide is this ???

Typically, longer than an em-dash. Used to introduce speech in some  
typographic traditions (not normally used in English).

> Ux2212  minus
>
> UxFE58  small emdash
> UxFE63  small hyphen-minus
> UxFF0D  full-width hyphen-minus

These are for compatibility with older CJK standards, and can  
probably be ignored for most purposes.
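
Since the actual widths depend on the font, a quick way to answer the
"how wide is this" questions for a given font is to measure the glyphs.
The following is only a sketch: \dashwd and \showwidth are made-up
names, it assumes XeLaTeX with fontspec and its default font, and a
glyph missing from the font will simply report 0.0pt.

   \documentclass{article}
   \usepackage{fontspec}
   \newlength\dashwd
   % #1 = code point in hex; prints the glyph's width in the current font
   \newcommand*\showwidth[1]{%
     \settowidth\dashwd{\char"#1 }%
     U+#1: \the\dashwd\par}
   \begin{document}
   \showwidth{002D} % hyphen-minus
   \showwidth{2010} % hyphen
   \showwidth{2012} % figure dash
   \showwidth{2013} % en dash
   \showwidth{2014} % em dash
   \showwidth{2015} % horizontal bar
   \showwidth{2212} % minus
   \showwidth{002B} % plus, for comparison with minus
   \end{document}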

>
> In particular, just what distinguishes "soft" hyphen
> from the others.

See above.

>   Are there LaTeX macro-names for the
> different kinds of hyphen, in any commonly-used package?
>
>   amsmath  has a \nobreakdash  command that must be used
> *before* a hyphen or dash to suppress the possibility of
> a line-break. But it doesn't insert a character itself.
> Its definition could be rewritten to turn a Ux002D
> into Ux2011 .

I suppose in theory this should add U+2060 WORD JOINER after things  
like en- and em-dash. But I don't think there's much point, at least  
from a xetex point of view; it will respect the penalties that  
amsmath is presumably using. And if you were using a font that didn't  
include U+2060, you could get .notdef showing up instead.

>
> textcomp.sty  has two extra dashes which may correspond
> to the figure-dash and horiz-bar:
>    \textthreequartersemdash
>    \texttwelveudash
> If so, which is which, and why ?

AFAICT, figure dash would probably be shorter than either of these,  
in most cases; and horizontal bar would be longer!

> Also, looking at section 7.5.4 (textcomp) of The LaTeX
> Companion, 2nd ed.  I see many more LaTeX commands that
> xunicode.sty does not yet support.
> e.g.  \capitalacute  (as distinct from \' ) and similar
> commands for other accents
>   --- for XeTeX I guess these should be the same, leaving it
> up to the font to decide on the rendering.

Probably. What is the difference supposed to be? (I don't have a  
LaTeX Companion.)

JK


