[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts

Robert Spence spence at saar.de
Tue Apr 4 05:54:26 CEST 2006


Dear XeTeXnicians,

These are two "sorcerer's apprentice" type questions; I've only been  
using XeTeX for a few days (---but what a wonderful few days they've  
been!---), and I do hope I'm not guilty here of "rushing in where  
angels fear to tread"...

QUESTION 1) Pinyin Keyboarding Shortcuts (cf. the thread "[XeTeX]  
Chinese: vertical typesetting, pinyin tones, and Japanese macrons?"  
from mid-July 2005)

Might it perhaps be an idea in a future version of xunicode.sty to  
change line 1131 of version 0.5 [2005/02/26] from
   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{\"u}
to
   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
?

REASONING:
a) I find the easiest way of getting tone marks for pinyin is to use  
old-style keyboarding shortcuts like \=a \'a \v{a} \`a with an  
appropriate local font setting, e.g.
    \newfontinstance\pinyin{STHeiti}% gives a nicer lowercase g than STKaiti
    \pinyin
(and I guess the line
    \defaultfontfeatures{Mapping=tex-text}
in the preamble is also pulling its weight here...).

b) I can't seem to access LATIN SMALL LETTER U WITH DIAERESIS AND  
CARON (Unicode 01DA) via the current shortcut \v{\"u} (which puts the  
caron _after_ the dieresised u), and am too lazy to type  
\textdieresiscaron{u} each time, but I discovered that if I put
   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
in the preamble (or even in the body) of my document then I can use  
the shortcut \v{v} without any problems.  (Am I missing something  
with \v{\"u}?  I tried just about every other variation I could think  
of, but to no avail.)

c) The Chinese themselves routinely use a lowercase v to stand for a  
lowercase u with dieresis, for example in internet addresses; it  
makes sense, because v is the only letter of the Roman alphabet they  
don't need, and they only have one sound for which they don't have a  
simple Roman letter without (non-tone) diacritic available; the fact  
that v comes directly after u in the Roman alphabet is an added  
bonus, because the letter you're trying to typeset is conceptually  
like "u, _plus_ something"; and in any case, it was only very  
recently (by Chinese time-measuring standards) that u and v stopped  
being just variant forms of the same letter...

d) v is used for u-with-dieresis in the macros in Werner Lemberg's
pinyin.sty
(/usr/local/teTeX/share/texmf.local/tex/latex/cjk/texinput/pinyin.sty),
allowing e.g. \nv3 to be used to get n plus
u-with-dieresis-and-caron in LaTeX --- although whether the result is
acceptable depends on the font you're using; as Werner Lemberg says
at lines 223--224 of pinyin.sty [ Version 4.6.0 (11-Aug-2005) ]:
   % the previous definitions are almost trivial. The only tricky macro is the
   %     following one.
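
For anyone wanting to try the setup described in a) and b), here is a minimal sketch of a complete test file. It assumes XeLaTeX with fontspec and xunicode 0.5 as named above, and that the STHeiti font is installed; the sample syllables are purely illustrative. \newfontinstance is the command used in this message (newer fontspec releases call it \newfontfamily).

```latex
% Minimal sketch, not a definitive setup: assumes XeLaTeX, fontspec,
% xunicode 0.5, and an installed STHeiti font.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\defaultfontfeatures{Mapping=tex-text}
\newfontinstance\pinyin{STHeiti}% local font for pinyin, as in a)

% The proposed mapping from QUESTION 1: \v{v} gives U+01DA
% (LATIN SMALL LETTER U WITH DIAERESIS AND CARON).
\DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}

\begin{document}
% Ordinary tone-mark shortcuts next to the new \v{v} shortcut.
{\pinyin m\=a m\'a m\v{a} m\`a \quad n\v{v} \quad l\v{v}}
\end{document}
```

If the declaration is accepted, n\v{v} and l\v{v} should come out with the caron stacked above the dieresis rather than placed after the letter.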


QUESTION 2) TIPA Keyboarding Shortcuts (I think this relates to a  
thread initiated by Ross Moore in late July 2004:
[XeTeX] New feature request for XeTeX
where there was a discussion about "active characters versus encoding  
mappings", which I only vaguely understand the implications of):

Is it acceptable or advisable, as a temporary workaround to a problem  
I encountered, to make the following changes to a working copy of  
xunicode.sty (version 0.5 [2005/02/26]) in my "home" texmf tree, in  
order to get all the TIPA shortcuts working?  The following lines  
appeared in Terminal when I ran the "diff" command (which I confess  
to never having used before in my life until a few moments ago!) on  
the original file and my altered version of it:

702,703c702,703
<  \def 2{\textezh}%
<  \def 3{\textvarepsilon}%
---
>  \def 2{\textturnv}%
>  \def 3{\textrevepsilon}%NOT VAR
1374c1374
< %\DeclareUTFcharacter[\UTFencname]{x028A}{\textscupsilon}  % TIPA-U
---
> \DeclareUTFcharacter[\UTFencname]{x028A}{\textscupsilon}  % TIPA-U
1391c1391
< \DeclareUTFcharacter[\UTFencname]{x0292}{\ezh}          % TIPA-Z
---
> \DeclareUTFcharacter[\UTFencname]{x0292}{\textezh}          % TIPA-Z
REASONING:
With these changes in place I can (I think) access all the TIPA
characters I need, in the argument of a \textipa{...} command [*but
see footnote], via the "active characters" strategy, using the
old-fashioned keyboarding habits that phoneticians are used to having to
resort to when sending emails. I was a bit worried about the fact
that the uppercase U as an active character had been commented out at  
line 1374 --- I thought it might have been done to avoid a nasty  
clash in some potential situation where U was already active and was  
being used for some other (more important) purpose.
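
As a rough check of the four shortcuts touched by the diff, a test file along the following lines might be used. This is only a sketch: it presupposes that the altered working copy of xunicode.sty is the one XeLaTeX finds, and that Gentium is installed under that name; the sample strings are made up for illustration.

```latex
% Sketch only: presupposes the altered working copy of xunicode.sty
% (version 0.5) from the diff above is the one on the TeX search path.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\setmainfont{Gentium}% assumed font name; any font with IPA glyphs should do

\begin{document}
\textipa{k2p}  % 2 is meant to give \textturnv
\textipa{b3d}  % 3 is meant to give \textrevepsilon
\textipa{pUt}  % U is meant to give \textscupsilon (U+028A)
\textipa{ruZ}  % Z is meant to give \textezh (U+0292)
\end{document}
```

Comparing the output of the same file against the unaltered xunicode.sty should show which of the four shortcuts actually change.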

*footnote:
One thing I've come to appreciate over the past few days of XeTeXing  
is that doing something like {\anyoldcommand ...} is often more  
robust than doing the corresponding \textanyoldcommand{...}. I  
discovered this while I was trying to find a way of typesetting  
phonetic transcriptions in colo(u)r---for teaching purposes, as  
otherwise the phonetic symbols seemed to blend in just a bit too  
seamlessly with the surrounding text when using the beautiful Gentium  
font. I had defined a colour called WSPRgreen (are those Will
Robertson's initials, by any chance?), then found I had to rename it
to wsprgreen if I wanted to use it _inside_ the argument of a
\textipa{...} command, because all uppercase characters were active!  And
when I tried
    \textcolor{WSPRgreen}{\textipa{...}}
it (of course?) caused me to lose the phonetic encoding, so I had to  
learn to write
    {\color{WSPRgreen} \textipa{...}}
instead. All highly educational. Just a thought: on page 13 of  
tipaman.pdf, Fukui Rei defines three ways of telling LaTeX that you  
want phonetics:
    \textipa{...}
    {\tipaencoding ...}
    \begin{IPA} ... \end{IPA}
Of these, only the first is implemented in xunicode.sty; would there
be any point in trying to implement either the
"argumentless-command-within-a-group" or the "environment" solution,
instead of or in addition to the "command-with-argument" one? (Way out
of my depth here, but FWIW...)

I hope the kind of old-fashioned keyboarding habits underlying both  
of my questions aren't too much of an annoyance to the project  
developers.  In the short time since I started using XeTeX I've  
realized that it's better to avoid anything that even remotely  
involves the fontenc, inputenc, and babel packages, and just type  
into your document the unicode characters you want to typeset,  
changing the keyboard layout as necessary and using the Keyboard  
Viewer to help train new keyboarding habits.  So far I've found I can  
do this well enough for switching between English, German, French,  
Russian, Hebrew, and Greek (although the LGR shortcuts described in  
9.4.2 of The LaTeX Companion, 2nd ed., were nicer), and for Chinese  
characters it's fairly easy to use the ITABC input method, but there  
doesn't seem to be a phonetics keyboard available with Mac OS X  
10.4.5 (and in any case, the solution would probably need to be more  
like one of the Chinese Input Methods, where pressing one or more  
keys gets you a list of relevant characters and you select the one  
you want).

My sincere thanks to everyone involved in the XeTeX project.  I  
really appreciate what you're doing, and will try to contribute in  
whatever ways I can.  Please bear with me until I have a bit more of  
a grasp of it all!  (BTW: the ability that fontspec.sty gives you to  
play around with so many---usually fairly tasteless---combinations of  
all those beautiful OSX system fonts makes me appreciate what an  
excellent job GTA did in selecting the default combinations for  
gtamacfonts, despite what one may or may not think about the best  
weight for Gill Sans with Hoefler Text in koma-script-style section  
headings... ;-)

-- Robert Spence
Applied Linguistics
Saarland University
Germany


