[XeTeX] XeTeX Digest, Vol 48, Issue 13

Sun Mar 9 14:04:28 CET 2008

On 9 Mar 2008, at 4:38 am, Barry MacKichan wrote:

> I'd like to express my support for this idea.
> Would it be worth considering an enhancement to fontspec so that we
> could write, e.g.,
> \setmainfont[UprightFont={Hebrew={Adobe Hebrew},CJK={NSimSun}}]{Adobe
> Garamond Pro}
> to choose 'Adobe Garamond Pro' as the main font, except in the Hebrew
> unicode area, in which case use 'Adobe Hebrew', and the CJK area, in
> which case use 'NSimSun'?
>
> I realize this is a significant amount of work, but it would free
> authors of multilingual works to choose the best available fonts for
> each language. The syntax would probably be different from my example,
> which I made up without much thought.

Hi Barry,

This is a suggestion/request that has come up several times, and I  
can certainly understand the attraction. Essentially, you're asking  
for a model with several simultaneous "current fonts" for different  
scripts, and an engine that chooses the appropriate one on a
per-character basis.

However, a general solution to this is trickier than people think,  
IMO. The main problem I see is how to (reliably) deal with the  
characters classified as "script=Common" in Unicode -- primarily  
punctuation and symbols (see http://www.unicode.org/Public/UNIDATA/Scripts.txt).
Many of these are used in conjunction with a number of
scripts (hence the "common" classification), but their exact style  
might need to change according to the script with which they're being  
used.

For example, Hebrew uses the same Unicode characters for parentheses,  
quote marks, question mark, etc., as Latin; but the actual font  
design might be different. So in a bilingual English + Hebrew  
document, punctuation that belongs to the English text should be  
rendered with the Latin-script font, but the same characters when  
used as part of a Hebrew run should be rendered with the Hebrew font.

In many cases, fairly simple heuristics could be used to choose a  
font for "common" characters based on the script of neighboring  
script-specific letters, but this will not always be right (and  
sometimes there may not be any "neighboring" letters available to the  
engine at the point where it needs to choose a font). So we could  
find ourselves in a situation where people assume they can mix  
scripts without providing markup, and the engine guesses right much  
of the time -- but sometimes makes inappropriate choices. Then people  
have to be alert to catch these in proof-reading, and know how to  
provide some kind of overriding hints for the edge cases. I'm not  
sure that is a good thing.
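To make the heuristic concrete, here is a rough sketch (mine, not anything XeTeX does) of the "look at the neighboring letters" approach. It assumes a tiny hand-picked script table (SCRIPT_RANGES) standing in for the real Scripts.txt data, and a hypothetical FONTS table using the fonts from Barry's example. A Common character takes the script of the nearest preceding script-specific letter, else the nearest following one, else the main font. The usage line shows the kind of wrong guess described above: the full stop ending an English sentence gets the Hebrew font because the last word was Hebrew.

```python
# ASSUMPTION: SCRIPT_RANGES is a tiny hand-picked subset of Unicode's
# Scripts.txt; anything not listed is treated as script=Common.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),   # A-Z
    (0x0061, 0x007A, "Latin"),   # a-z
    (0x0100, 0x024F, "Latin"),   # Latin Extended-A/B
    (0x0590, 0x05FF, "Hebrew"),  # Hebrew block
    (0x4E00, 0x9FFF, "Han"),     # CJK Unified Ideographs
]

# Hypothetical script -> font table, using the fonts from Barry's example.
FONTS = {"Latin": "Adobe Garamond Pro", "Hebrew": "Adobe Hebrew", "Han": "NSimSun"}


def script_of(ch: str) -> str:
    cp = ord(ch)
    for lo, hi, script in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return script
    return "Common"  # punctuation, digits, spaces, symbols, ...


def assign_fonts(text: str, default_script: str = "Latin") -> list[tuple[str, str]]:
    """Pick a font per character; Common characters borrow a neighbor's script."""
    scripts = [script_of(ch) for ch in text]
    resolved = []
    for i, script in enumerate(scripts):
        if script == "Common":
            before = [s for s in scripts[:i] if s != "Common"]
            after = [s for s in scripts[i + 1:] if s != "Common"]
            if before:
                script = before[-1]      # nearest preceding letter wins
            elif after:
                script = after[0]        # otherwise the nearest following one
            else:
                script = default_script  # no neighbor at all: use the main font
        resolved.append(script)
    return [(ch, FONTS[s]) for ch, s in zip(text, resolved)]


# English sentence ending in a Hebrew word: the full stop belongs to the
# English sentence, but the heuristic gives it the Hebrew font.
result = assign_fonts("he said \u05e9\u05dc\u05d5\u05dd.")
print(result[-1])  # ('.', 'Adobe Hebrew')
```

Fixing that case needs exactly the kind of overriding hint mentioned above, which is why I'm wary of relying on the guess.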

(I see this as closely related to the "font fallback" issue, where  
software automatically picks a different font if it encounters  
characters that can't be displayed in the font currently in use. This is  
great in something like a plain-text editor, or a web browser, whose  
task is primarily to present the text so that it can be read --  
although the page author may have requested particular fonts, the  
precise appearance is secondary. But a typesetting program is a  
different matter. In this context, I want to specify exactly what  
fonts will be used for each piece of text, and I don't want the  
program using best-guess heuristics to make substitutions for me.)
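For comparison, a minimal sketch of what font fallback amounts to, assuming made-up coverage sets (real software would consult each font's character map) and a hypothetical FALLBACK_ORDER list. The requested font is used whenever it has the glyph; otherwise the first font in the list that has it is substituted without asking. Note that punctuation around a Hebrew word stays in the requested Latin font, since that font can display it.

```python
# ASSUMPTION: tiny hand-written coverage sets; real software would read each
# font's character map (cmap) instead.
LATIN_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,;:!?()\"'")
HEBREW_LETTERS = {chr(cp) for cp in range(0x05D0, 0x05EB)}  # alef .. tav

FONT_COVERAGE = {
    "Adobe Garamond Pro": LATIN_CHARS,
    "Adobe Hebrew": HEBREW_LETTERS | set(" .,;:!?()"),
}
# Hypothetical order in which substitute fonts are tried.
FALLBACK_ORDER = ["Adobe Garamond Pro", "Adobe Hebrew"]


def pick_font(ch, requested="Adobe Garamond Pro"):
    """Use the requested font if it has the glyph, else silently substitute."""
    if ch in FONT_COVERAGE.get(requested, set()):
        return requested
    for font in FALLBACK_ORDER:
        if ch in FONT_COVERAGE[font]:
            return font
    return None  # no font has it: a "missing glyph" box would be shown


text = "Shalom (\u05e9\u05dc\u05d5\u05dd)"
print([pick_font(ch) for ch in text])
```

That is fine for getting text onto the screen, but the choice of substitute is made by the program, not by the typesetter.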

Having said all this, I am interested in this kind of idea, and may  
try to implement something one day. But I think there are significant  
questions of the design and desirable behavior that need careful  
consideration; it's not as straightforward as it seems at first glance.

Oh, another point is that authors need to remember that in general,  
multilingual documents still require markup to identify language --  
otherwise things like hyphenation won't be handled correctly. And  
with something like English + Hebrew, controls for text direction and  
the embedding of different-direction runs are needed for proper  
layout. So in many cases, some kind of markup would still be needed  
at language changes -- in which case it's trivial to link a font  
change with this at the macro level.
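As a rough illustration of why the markup makes the font question easy, here is a sketch (not real XeTeX or fontspec code) in which the text arrives as explicitly tagged runs. The LANGUAGES table and its fields are hypothetical; in a real document the same tag would also select hyphenation rules. Every character in a run, punctuation included, simply takes the run's font and direction, so there is nothing left to guess.

```python
# Hypothetical per-language settings; one language tag drives font and
# direction together (and, in practice, hyphenation as well).
LANGUAGES = {
    "en": {"font": "Adobe Garamond Pro", "direction": "LTR"},
    "he": {"font": "Adobe Hebrew", "direction": "RTL"},
    "zh": {"font": "NSimSun", "direction": "LTR"},
}


def typeset(runs):
    """runs: (lang, text) pairs, as produced by explicit language markup."""
    out = []
    for lang, text in runs:
        settings = LANGUAGES[lang]
        # Every character in the run, punctuation included, gets the run's font.
        for ch in text:
            out.append((ch, settings["font"], settings["direction"]))
    return out


runs = [("en", "he said "), ("he", "\u05e9\u05dc\u05d5\u05dd"), ("en", ".")]
glyphs = typeset(runs)
print(glyphs[-1])  # ('.', 'Adobe Garamond Pro', 'LTR')
```

Compared with the neighbor heuristic, the same sentence now gets its final full stop in the Latin font, because the markup says it belongs to the English text.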

JK