[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts

Robert Spence spence at saar.de
Tue Apr 4 17:49:17 CEST 2006

Dear Jonathan,

Thanks for your prompt reply, and sorry to distract you from more  
interesting tasks such as typesetting Chaldean ;-)

 > On 4 Apr 2006, at 4:54 am, Robert Spence wrote:
 >> Might it perhaps be an idea in a future version of xunicode.sty to
 >> change line 1131 of version 0.5 [2005/02/26] from
 >>   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{\"u}
 >> to
 >>   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
 >> ?

 > I don't think this would be appropriate in a general package such as
 > xunicode; the sequence \v{v} would normally be expected to produce a
 > v with caron. Mapping it to u-dieresis-caron might be a useful
 > convenience for Pinyin, but it's specific to that particular usage,
 > and doesn't belong in a generic package.

You're right, of course;  it was a silly idea of mine.  I'd never  
actually seen a font that has a glyph for v with caron in it, but of  
course that's not the point.  There's bound to be some obscure  
context in which someone would need to put a caron on a v --- maybe  
there's a tonal language somewhere that the SIL literacy people  
haven't got around to working on yet that can have voiced fricatives  
as tone-carrying syllable nuclei, for example.

 >> b) I can't seem to access LATIN SMALL LETTER U WITH DIAERESIS AND
 >> CARON (Unicode 01DA) via the current shortcut \v{\"u} (which puts
 >> the caron _after_ the dieresised u), and am too lazy to type
 >> \textdieresiscaron{u} each time, but I discovered that if I put
 >>   \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
 >> in the preamble (or even in the body) of my document then I can use
 >> the shortcut \v{v} without any problems.  (Am I missing something
 >> with \v{\"u}?  I tried just about every other variation I could
 >> think of, but to no avail.)

 > Interesting. \v{\"u} works fine for me with some fonts -- e.g.,
 > Lucida Grande, Charis SIL -- but not others; it seems that it depends
 > on the level of Unicode support in the font. In particular, it
 > doesn't work with the Latin characters in OS X Chinese fonts(!).

 > Have you considered using a Latin font with good Unicode support for
 > your Pinyin? It looks like that would work.

Thanks for the tip.  I'll check it out.  I'm struggling to try to  
find an appropriate compromise between having lots of typefaces  
available (to make it clear what each portion of typeset text is  
actually "doing" --- classic ontological problem of linguists,  
writers of documentation for computer programs, etc.) and conforming  
to the traditional aesthetic wisdom of not mixing too many fonts in  
one document.  I was a bit frightened at first by the wide
line-spacing of Charis SIL, which seemed to make it somewhat impractical
for inline font-switching.  Then while trying to find a quick and  
nasty way of transcribing English intonation patterns (misusing  
accent and other diacritics on the vowels of syllables that were  
carrying the nucleus of an intonation contour, instead of using a  
separate symbolic or iconic representation), I realized that the font  
I was working with didn't contain all the combinations of vowel  
letters and diacritics I needed, so I started using Charis in  
minipage environments instead.

One of my purposes in experimenting with pinyin is to persuade my  
Chinese teacher to switch to XeTeX.  She works with Mac OS X, has had  
no end of problems with glyphs "disappearing" due to different  
versions of Word, has done old-fashioned ("real") typesetting with a  
high level of proficiency, and is currently working with a kind of  
cut-and-paste solution for pinyin that ends up mixing different OS X  
system fonts in an impossible way.  But her course notes are so good  
that they really deserve to be properly typeset and published as a book.

 > I'm not sure *why* it's behaving this way.... maybe Ross would
 > understand exactly what characters xunicode is trying to access in
 > each case.

Does it perhaps have something to do with surrogates?  I don't know  
enough about Unicode, I'm afraid.
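
For what it's worth, U+01DA lies in the Basic Multilingual Plane, so
UTF-16 surrogate pairs should not be involved; a more likely suspect is
whether the font is being asked for the precomposed character or for a
base letter plus combining marks.  A small test file along the following
lines would show which encodings of the same character a given font can
render.  It is only a sketch: it assumes fontspec is available and that
the two fonts named earlier in this thread are installed, and \showforms
is a made-up helper name.

```latex
% Sketch only: compile with xelatex.
\documentclass{article}
\usepackage{fontspec}

% \showforms is a hypothetical helper, defined here purely for illustration.
% It prints three encodings of "u with diaeresis and caron" side by side.
\newcommand{\showforms}{%
  \char"01DA\ %            precomposed U+01DA
  \char"00FC\char"030C\ %  U+00FC (u with diaeresis) + U+030C (combining caron)
  u\char"0308\char"030C}%  u + U+0308 (combining diaeresis) + U+030C

\begin{document}
% Assumption: both fonts are installed on the system.
{\fontspec{Charis SIL}\showforms}\par
{\fontspec{Lucida Grande}\showforms}
\end{document}
```

If the three forms come out differently within one font, that would at
least narrow down which sequence \v{\"u} ends up requesting.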

 > It might make good sense to have a little "unicode-pinyin" package
 > that gives you more convenient ways to access characters used in
 > Pinyin transcription, such as using v as a shorthand for u-dieresis.
 > It just doesn't belong in xunicode.sty, IMO.

Point taken.  I was thinking of writing something like that, a kind  
of patch to run after loading xunicode.sty---just hadn't got around  
to it, and still don't understand all the implications of what's in  
xunicode.sty anyway.
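
As a rough sketch of what such a patch might look like, the override
could simply live in the document preamble rather than in xunicode.sty.
This is untested beyond the one declaration quoted above; the font
choice and the overall file layout are assumptions for illustration.

```latex
% Sketch only: compile with xelatex.
% Assumes xunicode.sty 0.5 as discussed in this thread,
% and that Charis SIL is installed.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\setmainfont{Charis SIL}

% Document-local override, copied from the declaration quoted earlier:
% \v{v} is mapped to U+01DA (u with diaeresis and caron).
% Caveat: \v{v} presumably can no longer be used for a v with caron
% anywhere in this document.
\DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}

\begin{document}
n\v{v} % intended result: the pinyin syllable with u-diaeresis-caron
\end{document}
```

Keeping the declaration in the document (or in a small separate package)
leaves the generic meaning of \v{v} in xunicode.sty untouched for
everyone else.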

 >> (...) there doesn't seem to be a phonetics keyboard available
 >> with Mac OS X 10.4.5 (and in any case, the solution would probably
 >> need to be more like one of the Chinese Input Methods, where
 >> pressing one or more keys gets you a list of relevant characters
 >> and you select the one you want).

 > There are some IPA keyboard layouts around, though OS X doesn't ship
 > with one as standard; if you want to look at some options, check
 > http://scripts.sil.org/InputResources for links (see under "Mac
 > Unicode keyboard layouts").

Thanks for the pointer---I'll check them out.  (SIL comes to the  
rescue yet again!  BTW, it was one of your people who first  
introduced me to linguistics, way back in 1975, and looking back, it  
was the best introduction one could possibly have had at the time.   
It's sad that Kenneth Pike's Tagmemics appears to be dying out as a  

In connection with direct Unicode keyboarding I still have a general  
philosophical problem about not having some kind of easily accessible  
"metadata" in publicly distributed files that would make it easier  
for corpus linguists to write search routines to extract just the  
"relevant" bits of text for their purposes.  I've always conceived of  
this in terms of classical markup---like having standardized
TEI-style XML tags (or the braces that delimit the argument of, e.g.,
a \foreignlanguage{german}{...} command from babel.sty) that would
make it easier for machines to "ignore the next bit, because it's not
in the language we're collecting data about".  I haven't had time to
read whatever has been written about this already in the list  
archive, so I won't waste time by trying to go into this point any  
further here.  But if there's one thing I've learnt in the course of  
trying to become more of an "algorithmic" type of thinker, it's that  
it's usually better, right at the start of a project, to define not  
just more variables than you think you'll need, but also more  
"layers" of variables.  It seems to me that there are _other_  
purposes---besides just applying the right set of hyphenation  
patterns, or lining up diacritics the proper Vietnamese way, for  
example---for which it might be nice to have a more explicit approach  
to language-switching.  Maybe most of this could be done in future by  
using the language attributes of fonts, but it still seems to me that  
there's a fundamental conceptual difference involved---like the one  
that tripped me up with the "meaning" of caron and v.  I guess the  
human mind hasn't had time to recover yet from the degree of  
multimodal complexity that came with the invention of writing. I  
mean, mapping a visual semiotic system onto an auditory one was a  
pretty complicated and daring thing to do, when you come to think  
about it...
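
To make the markup idea concrete, here is a minimal sketch (the example
sentence is invented for illustration) in which the German material is
explicitly delimited, so that a search routine could skip or extract
everything inside the argument of \foreignlanguage:

```latex
% Sketch only; assumes babel with the english and german options is installed.
\documentclass{article}
\usepackage[german,english]{babel} % the last option (english) is the main language
\begin{document}
The word \foreignlanguage{german}{Zeitgeist} is a loan from German.
\end{document}
```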

Thanks again for your time,

--- Robert Spence
Applied Linguistics
Saarland University
