[XeTeX] xunicode.sty -- pinyin and TIPA shortcuts

Robert Spence spence at saar.de
Mon Apr 10 18:14:14 CEST 2006

Hi Ross,

You almost certainly had better things to do with your weekend than  
answering stupid beginner's questions like mine...  Sorry!

On 09 Apr 2006, at 03:47 , Ross Moore wrote:

> Since xunicode.sty already has:
>    \DeclareUTFcharacter[\UTFencname]{x0261}{\textscriptg}
> then all that is needed is:     \let\textg\textscriptg
> which can be done easily in a document preamble.
> Since tipa.sty has both:
>   \DeclareTextSymbolDefault\textscriptg{T3}
>   \DeclareTextSymbolDefault\textg{T3}              % Text G
> are you saying that they both actually represent the same concept,
> so are just variant print-forms ?

My fault---I'd completely failed to notice that xunicode.sty also had  
the \textscriptg command. That means xunicode.sty contains all the  
resources that tipa.sty did.  I think I must have been working on the  
basis of a slightly faulty memory of the tipa.sty commands.  Probably  
I misread page 34 of tipaman.pdf, and in any case I see in footnote 4  
on page 8 of that manual that the International Phonetic Association  
has apparently stopped worrying about the difference in the shapes of  
the glyphs anyway.  So it looks as though nowadays they _are_ just  
variant print-forms.  I think the whole problem only arose in the  
first place because Fukui Rei had to make a decision about how to  
design a new 8-bit encoding scheme for phonetics (T3) that would be  
as compatible as possible with T1.  (Now that we have Unicode, let's  
just hope we don't encounter any extra-terrestrial species who  
consider anything less than 128-bit or 256-bit encoding for fonts to  
be unspeakably primitive...)
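For completeness, the preamble fix Ross describes above would look  
something like this (just a sketch; it assumes xunicode.sty is loaded  
and defines \textscriptg, as quoted):

  % Preamble sketch: make \textg an alias for xunicode's \textscriptg
  \usepackage{xunicode}    % provides \textscriptg (U+0261)
  \let\textg\textscriptg   % tipa's \textg now prints the same glyph

With that in place, a legacy document that uses \textg should compile  
unchanged.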

>> And if you were inside the argument of a \textipa{...} command  
>> you'd have a choice: if you knew your font had a glyph of the  
>> right shape at x0067 you could just type g, otherwise you could  
>> type \textg as you'd be confident that you had a font containing  
>> all the glyphs for the phonetic characters up in that Unicode  
>> code-point range.

I should have said "\textscriptg" here, not "\textg".
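So, with that correction, the choice I meant would read like this (a  
sketch, assuming tipa-style markup and a text font whose g at x0067  
has the single-storey shape):

  % Two ways to get the voiced velar plosive inside \textipa{...}:
  \textipa{[g]}             % bare g -- fine if the font's g looks right
  \textipa{[\textscriptg]}  % explicit command -- safe with any font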

> Ouch. This is mixing visual markup in with logical markup.
> I know it has a practicality, but for electronic documents
> it's a possible source of confusion.

I know---it's a nasty hangover from pre-Unicode days.  And partly  
it's a design fault in the IPA, which tries to reuse glyphs likely to  
be already familiar to users of European languages.  The designers  
(in the late nineteenth century, I think) had a long feud about  
whether j should stand for the sound it stands for in German, or the  
sound it stands for in French, for example.  It might have been  
better if they'd just invented completely new glyphs, instead of  
trying to reuse all the lowercase letters of the Latin alphabet and  
then scrounge around for the other symbols they needed.  They were  
obviously thinking in terms of the visual distinctiveness of shapes,  
because they chose _both_ print variants of Latin lowercase a  
(single-decker and double-decker), assigning them to the recognizably  
different sounds you would come out with depending on where exactly  
the doctor put the paddlepop stick on your tongue when he told you to  
say "aaaah".  Definitely not the sort of thing it's worth wasting too  
much time on, but I guess from a mathematician's point of view it's  
related to the problem of mapping discontinuous and continuous  
functions onto each other in a minimally "lossy" way... (i.e.:  
when-where-how-why-and-for-whom are two phenomena merely "insignificant  
variants of the same thing", and when-where-how-why-and-for-whom are  
they "typical representatives of two conceptually distinct  
categories"? --- the central problem of linguistics and semiotics.)
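Those two print variants of a make a nice test case, in fact; in  
tipa-style markup they stay apart like this (a sketch, assuming  
\textscripta is declared alongside \textscriptg, which I haven't  
double-checked in xunicode.sty):

  % The two lowercase-a variants as distinct IPA symbols:
  a              % double-decker a: open front unrounded vowel
  \textscripta   % single-decker (script) a, U+0251: open back vowel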

>> No doubt there's a dangerous double bend section somewhere in the  
>> Dirty Tricks chapter of the TeXbook that would help... but it's  
>> certainly a bit beyond anything _I'd_ dare play around with in  
>> TeX.  (Maybe you or Will or Jonathan might be more daring,  
>> though... ;-)
> I'm sure I could devise a way around this, if necessary.
> But I don't see the necessity, in this case -- at least
> not for  xunicode.sty  itself.
> Certainly it could go into a specialised add-on package,
> when a particular author really needed it, or to cope with
> a large number of legacy documents.

I was only kidding!  The user base is probably far too small to worry  
about such things...

> I'm glad someone finds these things useful.
> Personally I have no specific use, so would not even
> be able to tell whether it was all done correctly.

I'm not sure that there _is_ a correct way of doing it, given that  
you're trying to satisfy so many conflicting demands at once here.   
But I'll certainly practise using the xunicode.sty macros, and  
comment on any issues that come up.

Probably I haven't yet really grasped the full implications of  
xunicode.sty at all, and I still don't understand the overall program  
design parameters for allowing old-style font encodings to be mixed  
together with a unified Unicode encoding in one and the same  
document.  XeTeX seems to already know about encodings like OT1 and  
OML and so on when it starts up (or at least, that's what I deduce  
from the error messages I get when I make a blooper with e.g.  
fontspec.sty's \setromanfont in the preamble).  And old-style command  
sequences like
{\fontsize{10.95}{12}\selectfont This is Computer Modern Roman.}
also seem to work just fine, but anything to do with encodings like  
T3 or LGR fails without fontenc.sty, and I even managed a couple of  
times to get XeTeX calling Metafont to try to generate pk fonts from  
non-existent .tfm files for Chinese fonts (!), so I'm trying to learn  
to work with just _one_ encoding now.
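The "one encoding" setup I'm aiming at now would presumably look  
something like this (a sketch only; the font name is just an example  
of a Unicode font with IPA coverage):

  % Staying in a single Unicode encoding under XeTeX:
  \documentclass{article}
  \usepackage{fontspec}       % Unicode font selection for XeTeX
  \usepackage{xunicode}       % maps old symbol commands to codepoints
  \setromanfont{Charis SIL}   % example font with full IPA coverage
  \begin{document}
  voiced velar plosive: \textscriptg
  \end{document}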

Thanks again for your time,

-- Rob Spence 
