[XeTeX] Re: Unicode/font mixing

Tue Jan 25 12:44:16 CET 2005

On 25 Jan 2005, at 11:08 am, Simon Spiegel wrote:

> Thank you for your answer. Maybe this is a naive question, but how are 
> people supposed to deal with this situation? I thought one of the main 
> points of Unicode is that you don't have to change encoding all the 
> time.

You don't have to change *encoding*; but you may still need to change 
*font*, as there is no font that supports every character in Unicode.

>  If I have to manually change the font every time I use a 
> non-Latin glyph, I don't see much advantage?

Yes, for non-Latin characters, you may well need to change fonts 
(depending on your choice of typefaces). Some typefaces may cover 
several scripts (e.g., Latin/Greek/Cyrillic), so if you have a document 
that mixes these scripts, it may be appropriate to choose such a 
typeface.

The "advantage" of Unicode isn't so much that it lets you forget about 
changing fonts as that it means your actual text data can be in a 
standard, documented, interoperable encoding, rather than in a mixture 
of more- or less-well-understood legacy encodings and/or special 
control sequences. In such a mixture, a given byte value means one 
thing in one word and an entirely different thing in another, so the 
data cannot be reliably understood without knowledge of the specific 
fonts used to render it. With Unicode, the meaning of the characters is 
unambiguous, even in the absence of any font that can render them!
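
To illustrate the ambiguity of legacy byte values, here is a minimal 
Python sketch. The byte 0xE4 and the three codecs are chosen only as an 
example, and the helper name `meanings` is made up for this purpose.

```python
import unicodedata

# Three legacy 8-bit encodings, picked purely for illustration.
LEGACY_CODECS = ("latin-1", "iso8859-7", "cp1251")

def meanings(byte_value):
    """Decode one byte under each legacy codec; return {codec: character}."""
    raw = bytes([byte_value])
    return {codec: raw.decode(codec) for codec in LEGACY_CODECS}

if __name__ == "__main__":
    for codec, ch in meanings(0xE4).items():
        # The same byte yields a different character under each codec,
        # whereas each resulting Unicode code point has exactly one name.
        print(f"{codec:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The same byte comes out as a Latin, a Greek, or a Cyrillic letter 
depending on which encoding is assumed, while each resulting Unicode 
code point carries a single, documented identity.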

>  Or do I just have to restrict myself to the few fonts which support 
> a great many Unicode glyphs if I want XeLaTeX to behave this way? One of 
> the reasons I'm asking this is because BibDesk is getting Unicode 
> support, which would allow me to enter non-Latin characters directly 
> into BibTeX files (BibTeX is kind of encoding agnostic). How are other 
> people handling this situation?

At the moment, I typically mark fragments in non-Latin languages using 
shorthand commands created to suit the occasion; e.g., {\ar some Arabic 
text} or {\dn Devanagari}. But I'm not a LaTeX user (much), nor a 
BibDesk user (at all), so I'm not the one to comment on the best 
approach there.

There have been requests in the past for a way to, in effect, declare 
several "current fonts" each covering a different Unicode range, so 
that mixed-script text wouldn't require explicit font changes. This is 
an interesting possibility, but coming up with a design that would 
reliably do "the right thing", especially with characters such as 
punctuation or numerals that may be "shared" between scripts, is not a 
trivial thing.
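
The difficulty with shared characters can be seen in a rough Python 
sketch of the naive approach. This is not a XeTeX feature; the range 
table, the font names, and the functions `font_for` and `split_runs` 
are all hypothetical, and the ranges cover only the basic blocks of 
each script.

```python
from itertools import groupby

# Hypothetical table mapping Unicode ranges to placeholder font names.
FONT_RANGES = [
    (0x0370, 0x03FF, "GreekFont"),
    (0x0600, 0x06FF, "ArabicFont"),
    (0x0900, 0x097F, "DevanagariFont"),
]
DEFAULT_FONT = "LatinFont"  # used for everything outside the table

def font_for(ch):
    """Pick a font for one character purely from its code point."""
    cp = ord(ch)
    for lo, hi, font in FONT_RANGES:
        if lo <= cp <= hi:
            return font
    return DEFAULT_FONT

def split_runs(text):
    """Group consecutive characters that map to the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=font_for)]

if __name__ == "__main__":
    for font, run in split_runs("ab αβ, γδ"):
        print(font, repr(run))
```

With the input "ab αβ, γδ", the comma and space between the two Greek 
words fall outside the Greek range, so this naive scheme sets them in 
the Latin font and splits the Greek text into two runs. Deciding which 
font such shared punctuation or numerals should take is exactly the 
part that needs a more careful design.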

JK