[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts

Ross Moore ross at ics.mq.edu.au
Wed Apr 5 01:36:22 CEST 2006


On 05/04/2006, at 1:49 AM, Robert Spence wrote:

> >> b) I can't seem to access LATIN SMALL LETTER U WITH DIAERESIS AND
> >> CARON (Unicode 01DA) via the current shortcut \v{\"u} (which puts
> >> the caron _after_ the dieresised u), and am too lazy to type
> >> \textdieresiscaron{u} each time, but I discovered that if I put
> >>   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
> >> in the preamble (or even in the body) of my document then I can use
> >> the shortcut \v{v} without any problems.  (Am I missing something
> >> with \v{\"u}?  I tried just about every other variation I could
> >> think of, but to no avail.)

 > Interesting. \v{\"u} works fine for me with some fonts -- e.g.,
 > Lucida Grande, Charis SIL -- but not others; it seems that it depends

Yes, it works for me too, with  Lucida Grande.
But  \"{\v u} doesn't work so well -- perhaps to be expected, since
the order of accents does matter.

 > on the level of Unicode support in the font. In particular, it
 > doesn't work with the Latin characters in OS X Chinese fonts(!).


> > I'm not sure *why* it's behaving this way.... maybe Ross would
> > understand exactly what characters xunicode is trying to access in
> > each case.

These lines tell you what characters are used with an accent:

\DeclareEncodedCompositeCharacter{\UTFencname}{\"}{0308}{00A8}  % Combining diaeresis

\DeclareEncodedCompositeCharacter{\UTFencname}{\v}{030C}{02C7}  % Combining caron

so that  \"u  gives the same as   u\char"0308
and \v{\"u}  might be expected to give  u\char"0308\char"030C .

The 2nd group in each case (i.e., {00A8} and {02C7}) tells what to use
in those rare cases where there is no argument to the accent command;
i.e., you just want the accent as a stand-alone character.


However, there is also this line:

    \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{\"u}

which results in  \char"01DA  overriding the generic treatment of \v{\"..}
when the letter is a 'u'.
(With Lucida Grande, the result of the two methods looks identical to me.)
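
To compare the methods side by side with a particular font, a small
test file along the following lines should do. This is only a sketch:
it assumes fontspec is used for font selection, and the font name is
just an example to be swapped for the one under test.

```latex
% Compile with xelatex. Sketch only; the font choice is an assumption.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\setmainfont{Charis SIL}   % replace with the font being tested
\begin{document}
% 1. the macro form, which the composite declaration maps to "01DA
\v{\"u}

% 2. the base letter followed by the two combining accents
u\char"0308\char"030C

% 3. the precomposed code-point requested directly
\char"01DA
\end{document}
```

If line 1 and line 3 differ, or line 2 shows a misplaced or missing
caron, that points to the font's coverage rather than to the macros.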



Does this answer your question adequately ?

BTW, \char"0308  seems to be in quite a lot of the Mac fonts,
but  \char"030C  is not in many at all;
and  \char"01DA  seems to be quite well supported too.


>
> Does it perhaps have something to do with surrogates?  I don't know  
> enough about Unicode, I'm afraid.

Neither do I.
I just looked for what seemed to be the best way to implement
the intention of existing LaTeX commands from various standard
packages, using the "Character Palette" to see what sensible
options are available.

It's quite possible that  xunicode.sty  has some errors, where
I didn't make the best choice. It's up to the experts in appropriate
fields to inform me (or Jonathan) of any such mistakes.


> > It might make good sense to have a little "unicode-pinyin" package
> > that gives you more convenient ways to access characters used in
> > Pinyin transcription, such as using v as a shorthand for u-dieresis.
> > It just doesn't belong in xunicode.sty, IMO.
>
> Point taken.  I was thinking of writing something like that, a kind  
> of patch to run after loading xunicode.sty---just hadn't got around  
> to it, and still don't understand all the implications of what's in  
> xunicode.sty anyway.

Yes. Any variation that reflects a particular usage that
need not be 'Universal' should be done by loading a package
*after*  xunicode.sty  itself.

By all means, use the same commands that  xunicode.sty  uses
to declare the associations between macros and Unicode points.

Examples of the main commands are:

\DeclareUTFcharacter[\UTFencname]{x01CC}{\nj}
\DeclareUTFcomposite[\UTFencname]{x01CD}{\v}{A}
\DeclareEncodedCompositeCharacter{\UTFencname}{\u}{0306}{02D8}  % Combining breve
\DeclareEncodedCompositeAccents{\UTFencname}{\texthookcircum}{0309}{0302}

Note that these commands set up associations that depend on the encoding,
via the  [\UTFencname]  optional parameter.


There are also  \UndeclareUTFcharacter  and  \UndeclareUTFcomposite ,
which can be used to cancel declarations when the code-points are not
supported in the font being used.

e.g.
\UndeclareUTFcomposite[Pin]{x01DA}{\v}{\"u}

would allow  u\char"0308\char"030C  to be used instead of  \char"01DA ,
when using encoding  Pin .


More generally,
    \UndeclareUTFcharacter[Pin]{x01CC}{\nj}

would mean that (Xe)LaTeX might throw up a warning message when \nj
is used with encoding 'Pin' , rather than inserting  \char"01CC
assuming (wrongly) that there is support in the font for it.
Without the  \Undeclare...  the only way to know that there's a possible
problem is to notice characters missing from the final PDF output.


Implicit here is that, if you are going to make non-universal declarations,
then it's a good idea to:
   1.  *change the encoding* first
   2.  load the  xunicode.sty  package
   3.  make your changes in the encoding

e.g.
  \newcommand{\UTFencname}{Pin}
  \usepackage{xunicode}
  \usepackage{unicode-pinyin}
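
As a rough idea of what such a  unicode-pinyin.sty  could contain, here
is a minimal sketch. The package is hypothetical; only the \v{v} line
has been reported to work (above), and the other three tone marks are
declared by analogy, so they would need testing.

```latex
% unicode-pinyin.sty -- hypothetical sketch, not an existing package.
% Assumes xunicode.sty has already been loaded, so that
% \DeclareUTFcomposite and \UTFencname are defined.
\ProvidesPackage{unicode-pinyin}

% Use "v" as shorthand for u-with-diaeresis under each tone mark.
\DeclareUTFcomposite[\UTFencname]{x01D6}{\=}{v}  % diaeresis and macron
\DeclareUTFcomposite[\UTFencname]{x01D8}{\'}{v}  % diaeresis and acute
\DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}  % diaeresis and caron
\DeclareUTFcomposite[\UTFencname]{x01DC}{\`}{v}  % diaeresis and grave
```

With this loaded after  xunicode.sty , something like  n\v{v}  would be
expected to give the same result as  n\char"01DA .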


>
> Thanks again for your time,


Hope this helps,

	Ross

>
> --- Robert Spence
> Applied Linguistics
> Saarland University
> Germany



------------------------------------------------------------------------
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
------------------------------------------------------------------------



