[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts

Thu Apr 6 23:03:25 CEST 2006

Dear Ross,

On 05 Apr 2006, at 01:36, Ross Moore wrote:

> It's quite possible that  xunicode.sty  has some errors, where
> I didn't make the best choice. It's up to the experts in appropriate
> fields to inform me (or Jonathan) of any such mistakes.

There might be a couple of mistakes in the implementation of the TIPA  
commands, but I haven't checked them all systematically yet, only the  
ones I need for doing basic transcriptions for English.  I haven't  
used TIPA a great deal since I stopped teaching phonetics classes  
regularly, but if I can make some time I'll go through everything in  
the tipa.sty package and try to get a better overview of what's  
currently implemented in xunicode.sty, and what might perhaps have to  
be done in an additional separate package, if it can be done easily  
at all.  The XeTeX user base is probably still fairly small, but I
assume it won't be long before someone does something similar to (and
hopefully compatible with) XeTeX on another platform.

> (Jonathan Kew:)
>> > It might make good sense to have a little "unicode-pinyin" package
>> > that gives you more convenient ways to access characters used in
>> > Pinyin transcription, such as using v as a shorthand for u-dieresis.
>> > It just doesn't belong in xunicode.sty, IMO.
>>
>> Point taken.  I was thinking of writing something like that, a  
>> kind of patch to run after loading xunicode.sty---just hadn't got  
>> around to it, and still don't understand all the implications of  
>> what's in xunicode.sty anyway.
>
> Yes. Any variation that reflects a particular usage that
> need not be 'Universal' should be done by loading a package
> *after*  xunicode.sty  itself.
>
> By all means, use the same commands that  xunicode.sty  uses
> to declare the associations between macros and Unicode points.
>
> Examples of the main commands are:
>
> \DeclareUTFcharacter[\UTFencname]{x01CC}{\nj}
> \DeclareUTFcomposite[\UTFencname]{x01CD}{\v}{A}
> \DeclareEncodedCompositeCharacter{\UTFencname}{\u}{0306}{02D8}  % Combining breve
> \DeclareEncodedCompositeAccents{\UTFencname}{\texthookcircum}{0309}{0302}
>
> Note that these commands set up associations that depend on the encoding,
> via the  \UTFencname  parameter (optional in the first two commands,
> mandatory in the last two).
>
>
> There is also  \UndeclareUTFcharacter  and  \UndeclareUTFcomposite ,
> which can be used to cancel declarations when the code-points are not
> supported in the font being used.
>
> e.g.
> \UndeclareUTFcomposite[Pin]{x01DA}{\v}{\"u}
>
> would allow  u\char"0308\char"030C  to be used instead of  \char"01DA ,
> when using encoding  Pin .
>
>
> More generally,
>    \UndeclareUTFcharacter[Pin]{x01CC}{\nj}
>
> would mean that (Xe)LaTeX might throw up a warning message when \nj
> is used with encoding 'Pin', rather than inserting  \char"01CC ,
> assuming (wrongly) that there is support in the font for it.
> Without the  \Undeclare...  the only way to know that there's a possible
> problem is to notice characters missing from the final PDF output.
>
>
> Implicit here is that, if you are going to make non-universal declarations,
> then it's a good idea to:
>   1.  *change the encoding* first
>   2.  load the  xunicode.sty  package
>   3.  make your changes in the encoding
>
> e.g.
>  \newcommand{\UTFencname}{Pin}
>  \usepackage{xunicode}
>  \usepackage{unicode-pinyin}

Ah, now I understand!  When I first saw the lines in xunicode.sty  
about changing the encoding I didn't have enough background knowledge  
to interpret them properly.
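
So, if I've understood you correctly, a test document for a font that
lacks the precomposed U+01DA would be set up roughly as below.  It's
only a sketch of my understanding and I haven't tried it: Pin is just
the example encoding name from your message, the font is left
unspecified, and I'm assuming (without having checked) that the text
font really ends up being used in the Pin encoding.

  \documentclass{article}
  % 1. change the encoding name before xunicode.sty is loaded
  \newcommand{\UTFencname}{Pin}
  \usepackage{fontspec}
  % \setmainfont{...}   % whichever font is missing U+01DA
  % 2. load xunicode.sty; its declarations now go into encoding Pin
  \usepackage{xunicode}
  % 3. local change in that encoding: use u + U+0308 + U+030C
  %    instead of the precomposed U+01DA
  \UndeclareUTFcomposite[Pin]{x01DA}{\v}{\"u}
  % assumption (unchecked): the text font is in encoding Pin here
  \begin{document}
  n\v{\"u}
  \end{document}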

This could be a real Pandora's box: you could end up with hundreds of
special, local encodings (like the Pin encoding you suggest for
unicode-pinyin).  Although I probably already have the skills I'd need
to reimplement Werner Lemberg's macros from pinyin.sty for Unicode
fonts (and it would be a lot easier with xunicode.sty syntax than the
way he had to do it), I think it would be better to get some feedback
from sinologists about their workflows first, and perhaps to check the
official standards with the Chinese Ministry of Education (or whoever
is ultimately responsible).
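
Just to get a feel for how much easier it would be, a rough first
sketch of the core declarations might look like this, using v as a
shorthand for u-dieresis under the four tone marks.  It's untested;
the file name unicode-pinyin.sty and the choice of shortcuts are only
my own assumptions (nothing official, and not a port of pinyin.sty);
and it relies on xunicode.sty having been loaded first with
\UTFencname set, as you describe.

  % unicode-pinyin.sty -- hypothetical sketch, untested
  \NeedsTeXFormat{LaTeX2e}
  \ProvidesPackage{unicode-pinyin}
  % refuse to run unless xunicode.sty is already loaded
  \@ifpackageloaded{xunicode}{}%
    {\PackageError{unicode-pinyin}{Load xunicode.sty first}%
      {This package only adds declarations on top of xunicode.sty.}}
  % v as u-dieresis with tone marks:
  % 1st (macron), 2nd (acute), 3rd (caron), 4th (grave)
  \DeclareUTFcomposite[\UTFencname]{x01D6}{\=}{v}
  \DeclareUTFcomposite[\UTFencname]{x01D8}{\'}{v}
  \DeclareUTFcomposite[\UTFencname]{x01DA}{\v}{v}
  \DeclareUTFcomposite[\UTFencname]{x01DC}{\`}{v}
  % the same for capital V
  \DeclareUTFcomposite[\UTFencname]{x01D5}{\=}{V}
  \DeclareUTFcomposite[\UTFencname]{x01D7}{\'}{V}
  \DeclareUTFcomposite[\UTFencname]{x01D9}{\v}{V}
  \DeclareUTFcomposite[\UTFencname]{x01DB}{\`}{V}
  \endinput

If it works as intended, n\v{v} should give the third-tone u-dieresis
(U+01DA) and l\`v the fourth-tone one (U+01DC).  One catch I can
already see: for a font without these precomposed characters, simply
undeclaring them would put the tone mark on a plain v rather than on a
u-dieresis, so the fallback would need separate handling.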

I'm pretty sure the Chinese government prefers the "upright italic"
(single-storey) shapes for lowercase a and g in pinyin, but I'm not
sure how strict
this is.  There's an English-to-Chinese dictionary from Singapore  
printed in a font that doesn't have upright italic glyph shapes, and  
the publishers were obviously so against the idea of using the normal  
roman a in pinyin transcriptions that they decided to use the _real_  
italic a, in the middle of text set in roman!  A typographic  
nightmare!  On the other hand, they don't make a fuss about the g at
all, and just use the roman double-storey shape (a pair of spectacles
rotated ninety degrees).  It may not be so much of an issue any more
now that the PRC has decided to keep simplified characters rather than
abolish characters altogether in favour of Pinyin romanization, but
given the sheer size of the speech community involved...

Thanks for your time,

-- Rob Spence