[XeTeX] XeTeX Digest, Vol 48, Issue 13
Jonathan Kew
jonathan_kew at sil.org
Sun Mar 9 14:04:28 CET 2008
On 9 Mar 2008, at 4:38 am, Barry MacKichan wrote:
> I'd like to express my support for this idea.
> Would it be worth considering an enhancement to fontspec so that we
> could write, e.g.,
> \setmainfont[UprightFont={Hebrew={Adobe Hebrew},CJK={NSimSun}}]{Adobe
> Garamond Pro}
> to choose 'Adobe Garamond Pro' as the main font, except in the Hebrew
> unicode area, in which case use 'Adobe Hebrew', and the CJK area, in
> which case use 'NSimSun'?
>
> I realize this is a significant amount of work, but it would free
> authors of multilingual works to choose the best available fonts for
> each language. The syntax would probably be different from my example,
> which I made up without much thought.
Hi Barry,
This is a suggestion/request that has come up several times, and I
can certainly understand the attraction. Essentially, you're asking
for a model with several simultaneous "current fonts" for different
scripts, and an engine that chooses the appropriate one on a per-
character basis.
However, a general solution to this is trickier than people think,
IMO. The main problem I see is how to (reliably) deal with the
characters classified as "script=Common" in Unicode -- primarily
punctuation and symbols (see
http://www.unicode.org/Public/UNIDATA/Scripts.txt). Many of these are
used in conjunction with a number of
scripts (hence the "common" classification), but their exact style
might need to change according to the script with which they're being
used.
For example, Hebrew uses the same Unicode characters for parentheses,
quote marks, question mark, etc., as Latin; but the actual font
design might be different. So in a bilingual English + Hebrew
document, punctuation that belongs to the English text should be
rendered with the Latin-script font, but the same characters when
used as part of a Hebrew run should be rendered with the Hebrew font.
In many cases, fairly simple heuristics could be used to choose a
font for "common" characters based on the script of neighboring
script-specific letters, but this will not always be right (and
sometimes there may not be any "neighboring" letters available to the
engine at the point where it needs to choose a font). So we could
find ourselves in a situation where people assume they can mix
scripts without providing markup, and the engine guesses right much
of the time -- but sometimes makes inappropriate choices. Then people
have to be alert to catch these in proof-reading, and know how to
provide some kind of overriding hints for the edge cases. I'm not
sure that is a good thing.
(I see this as closely related to the "font fallback" issue, where
software automatically picks a different font if it encounters
characters that can't be displayed in the default one in use. This is
great in something like a plain-text editor, or a web browser, whose
task is primarily to present the text so that it can be read --
although the page author may have requested particular fonts, the
precise appearance is secondary. But a typesetting program is a
different matter. In this context, I want to specify exactly what
fonts will be used for each piece of text, and I don't want the
program using best-guess heuristics to make substitutions for me.)
Having said all this, I am interested in this kind of idea, and may
try to implement something one day. But I think there are significant
questions of the design and desirable behavior that need careful
consideration; it's not as straightforward as it seems at first glance.
Oh, another point is that authors need to remember that in general,
multilingual documents still require markup to identify language --
otherwise things like hyphenation won't be handled correctly. And
with something like English + Hebrew, controls for text direction and
the embedding of different-direction runs are needed for proper
layout. So in many cases, some kind of markup would still be needed
at language changes -- in which case it's trivial to link a font
change with this at the macro level.
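
As a rough illustration of linking the font change to the language
markup, here is a minimal sketch for XeLaTeX with fontspec. The macro
name \texthebrew is defined here purely for illustration, and the font
names are just the ones from Barry's example; the sketch assumes both
fonts are installed and does nothing about hyphenation patterns.

```latex
\documentclass{article}
\usepackage{fontspec}

% Main (Latin) font, taken from the example in the quoted message.
\setmainfont{Adobe Garamond Pro}

% A second font family for Hebrew runs; Script=Hebrew asks for the
% font's Hebrew OpenType script features.
\newfontfamily\hebrewfont[Script=Hebrew]{Adobe Hebrew}

% Enable the \beginR/\endR direction primitives.
\TeXXeTstate=1

% Illustrative helper (name is an assumption, not an existing API):
% one piece of markup switches both the direction and the font.
% Everything in the argument, including "common" punctuation,
% is set in the Hebrew font because the author said so explicitly.
\newcommand{\texthebrew}[1]{%
  \leavevmode{\beginR\hebrewfont #1\endR}%
}

\begin{document}
English text (with Latin parentheses), then \texthebrew{שלום, עולם?}
and English again.
\end{document}
```

With this approach the choice of font for punctuation follows the
markup rather than a guess by the engine: the comma and question mark
inside \texthebrew come from the Hebrew font, while the parentheses in
the English text come from the main font.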
JK