[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts
Robert Spence
spence at saar.de
Tue Apr 4 17:49:17 CEST 2006
Dear Jonathan,
Thanks for your prompt reply, and sorry to distract you from more
interesting tasks such as typesetting Chaldean ;-)
> On 4 Apr 2006, at 4:54 am, Robert Spence wrote:
>>
>> Might it perhaps be an idea in a future version of xunicode.sty to
>> change line 1131 of version 0.5 [2005/02/26] from
>> \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{\"u}
>> to
>> \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
>> ?
> I don't think this would be appropriate in a general package such as
> xunicode; the sequence \v{v} would normally be expected to produce a
> v with caron. Mapping it to u-dieresis-caron might be a useful
> convenience for Pinyin, but it's specific to that particular usage,
> and doesn't belong in a generic package.
You're right, of course; it was a silly idea of mine. I'd never
actually seen a font that has a glyph for v with caron in it, but of
course that's not the point. There's bound to be some obscure
context in which someone would need to put a caron on a v --- maybe
there's a tonal language somewhere that the SIL literacy people
haven't got around to working on yet that can have voiced fricatives
as tone-carrying syllable nuclei, for example.
>> b) I can't seem to access LATIN SMALL LETTER U WITH DIAERESIS AND
>> CARON (Unicode 01DA) via the current shortcut \v{\"u} (which puts
>> the caron _after_ the dieresised u), and am too lazy to type
>> \textdieresiscaron{u} each time, but I discovered that if I put
>> \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
>> in the preamble (or even in the body) of my document then I can use
>> the shortcut \v{v} without any problems. (Am I missing something
>> with \v{\"u}? I tried just about every other variation I could
>> think of, but to no avail.)
> Interesting. \v{\"u} works fine for me with some fonts -- e.g.,
> Lucida Grande, Charis SIL -- but not others; it seems that it depends
> on the level of Unicode support in the font. In particular, it
> doesn't work with the Latin characters in OS X Chinese fonts(!).
> Have you considered using a Latin font with good Unicode support for
> your Pinyin? It looks like that would work.
Thanks for the tip. I'll check it out. I'm struggling to
find an appropriate compromise between having lots of typefaces
available (to make it clear what each portion of typeset text is
actually "doing" --- classic ontological problem of linguists,
writers of documentation for computer programs, etc.) and conforming
to the traditional aesthetic wisdom of not mixing too many fonts in
one document. I was a bit frightened at first by the wide line-
spacing of Charis SIL, which seemed to make it somewhat impractical
for inline font-switching. Then while trying to find a quick and
nasty way of transcribing English intonation patterns (misusing
accent and other diacritics on the vowels of syllables that were
carrying the nucleus of an intonation contour, instead of using a
separate symbolic or iconic representation), I realized that the font
I was working with didn't contain all the combinations of vowel
letters and diacritics I needed, so I started using Charis in
minipage environments instead.
One of my purposes in experimenting with pinyin is to persuade my
Chinese teacher to switch to XeTeX. She works with Mac OS X, has had
no end of problems with glyphs "disappearing" due to different
versions of Word, has done old-fashioned ("real") typesetting with a
high level of proficiency, and is currently working with a kind of
cut-and-paste solution for pinyin that ends up mixing different OS X
system fonts in an impossible way. But her course notes are so good
that they really deserve to be properly typeset and published as a book.
> I'm not sure *why* it's behaving this way.... maybe Ross would
> understand exactly what characters xunicode is trying to access in
> each case.
Does it perhaps have something to do with surrogates? I don't know
enough about Unicode, I'm afraid.
> It might make good sense to have a little "unicode-pinyin" package
> that gives you more convenient ways to access characters used in
> Pinyin transcription, such as using v as a shorthand for u-dieresis.
> It just doesn't belong in xunicode.sty, IMO.
Point taken. I was thinking of writing something like that, a kind
of patch to run after loading xunicode.sty---just hadn't got around
to it, and still don't understand all the implications of what's in
xunicode.sty anyway.
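For the record, a minimal sketch of what such a document-local patch might look like. It is untested beyond the one declaration quoted above. It assumes XeLaTeX with fontspec, xunicode.sty 0.5 loaded explicitly, and Charis SIL installed; the test syllable is just an illustration.

```latex
% Hypothetical test file (a sketch, not a finished package).
% Assumes XeLaTeX, fontspec, xunicode.sty 0.5, and Charis SIL installed.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\setmainfont{Charis SIL}

% Document-local pinyin shortcut: make \v{v} select U+01DA
% (LATIN SMALL LETTER U WITH DIAERESIS AND CARON).
% Side effect: \v{v} no longer means "v with caron" in this document,
% which is exactly why it does not belong in xunicode.sty itself.
\DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}

\begin{document}
n\v{v} % intended to give the pinyin syllable with u-dieresis-caron
\end{document}
```

Keeping the declaration in the document (or in a small private .sty loaded after xunicode) leaves the generic meaning of \v{v} untouched for everyone else.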
>> (...) there doesn't seem to be a phonetics keyboard available
>> with Mac OS X 10.4.5 (and in any case, the solution would probably
>> need to be more like one of the Chinese Input Methods, where
>> pressing one or more keys gets you a list of relevant characters
>> and you select the one you want).
> There are some IPA keyboard layouts around, though OS X doesn't ship
> with one as standard; if you want to look at some options, check
> http://scripts.sil.org/InputResources for links (see under "Mac
> Unicode keyboard layouts").
Thanks for the pointer---I'll check them out. (SIL comes to the
rescue yet again! BTW, it was one of your people who first
introduced me to linguistics, way back in 1975, and looking back, it
was the best introduction one could possibly have had at the time.
It's sad that Kenneth Pike's Tagmemics appears to be dying out as a
metalanguage...)
In connection with direct Unicode keyboarding I still have a general
philosophical problem about not having some kind of easily accessible
"metadata" in publicly distributed files that would make it easier
for corpus linguists to write search routines to extract just the
"relevant" bits of text for their purposes. I've always conceived of
this in terms of classical markup, like having standardized TEI-style
XML tags (or the braces that delimit the argument of, e.g., a
\foreignlanguage{german}{...} command from babel.sty) that would
make it easier for machines to "ignore the next bit, because it's not
in the language we're collecting data about". I haven't had time to
read whatever has been written about this already in the list
archive, so I won't waste time by trying to go into this point any
further here. But if there's one thing I've learnt in the course of
trying to become more of an "algorithmic" type of thinker, it's that
it's usually better, right at the start of a project, to define not
just more variables than you think you'll need, but also more
"layers" of variables. It seems to me that there are _other_
purposes---besides just applying the right set of hyphenation
patterns, or lining up diacritics the proper Vietnamese way, for
example---for which it might be nice to have a more explicit approach
to language-switching. Maybe most of this could be done in future by
using the language attributes of fonts, but it still seems to me that
there's a fundamental conceptual difference involved---like the one
that tripped me up with the "meaning" of caron and v. I guess the
human mind hasn't had time to recover yet from the degree of
multimodal complexity that came with the invention of writing. I
mean, mapping a visual semiotic system onto an auditory one was a
pretty complicated and daring thing to do, when you come to think
about it...
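To make the markup point above concrete, here is a minimal sketch of the kind of explicit language tagging I have in mind, using only babel's \foreignlanguage. The sample sentence is invented for illustration, and the sketch assumes babel's german and english options are available.

```latex
% Illustrative sketch: explicit language markup with babel.
% The sentence and the tagged word are made up for this example.
\documentclass{article}
\usepackage[german,english]{babel} % last option (english) is the main language
\begin{document}
English borrowed the word
\foreignlanguage{german}{Zeitgeist} % braces delimit the German span
a long time ago.
\end{document}
```

A search routine collecting English data could then skip everything inside the second argument of \foreignlanguage, independently of which font or hyphenation patterns happen to be in use.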
Thanks again for your time,
--- Robert Spence
Applied Linguistics
Saarland University
Germany