[XeTeX] fontspec and hyperref

Michiel Kamermans pomax at nihongoresources.com
Thu Sep 24 06:49:18 CEST 2009

Jonathan Kew wrote:
> (It might be worth looking into the polyglossia package to manage 
> language/script switching, but even without that, I think this should 
> help.)

The problem is that I need to switch within a language/script. I need 
some way to indicate that Chinese glyph A is in extension block A and 
thus needs font X, while Chinese glyph B from extension block B needs 
font Y, simply because these blocks are so vast that there are no 
good-looking unified fonts for them.

So far, it seems a toss-up between two approaches. One is to use a 
script that checks which glyph is from which Unicode block and prepends 
the appropriate fontspec code where needed. The other is to rely on 
interchartoks, since I'm using CJK, and then explicitly add fontspec 
codes where I need to override the interchartoks behaviour.
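
The script approach could be sketched roughly as follows, in Python. The 
block ranges are the ones the Unicode standard assigns, but the macro 
names (\strokefont, \extafont, \extbfont, \rarefont) are hypothetical 
and assume matching font families have been defined with fontspec. The 
per-character override table is also an assumption, added for glyphs 
that sit in a common block but are missing from the main font.

```python
from itertools import groupby

# (first code point, last code point, font-switch macro).
# The ranges are Unicode block boundaries; the macro names are
# hypothetical and would need to be defined with fontspec.
BLOCK_FONTS = [
    (0x31C0, 0x31EF, r"\strokefont"),   # CJK Strokes
    (0x3400, 0x4DBF, r"\extafont"),     # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF, r"\extbfont"),   # CJK Unified Ideographs Extension B
]

# Hypothetical per-character overrides for rare glyphs that live in the
# main CJK block but are absent from the default font.
OVERRIDES = {"兦": r"\rarefont"}


def font_for(ch):
    """Return the font macro for one character, or None for the default font."""
    if ch in OVERRIDES:
        return OVERRIDES[ch]
    cp = ord(ch)
    for first, last, macro in BLOCK_FONTS:
        if first <= cp <= last:
            return macro
    return None


def tag_fonts(text):
    """Wrap each run of characters that needs a special font in a TeX group."""
    out = []
    for macro, run in groupby(text, key=font_for):
        chunk = "".join(run)
        out.append(chunk if macro is None else "{%s %s}" % (macro, chunk))
    return "".join(out)


if __name__ == "__main__":
    print(tag_fonts("㇄ top to bottom 兦, 山"))
```

Grouping consecutive characters keeps the number of font switches down, 
but this only produces the markup; it does nothing about how hyperref 
treats that markup.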

Sadly, the first approach just breaks hyperref. While your suggestion of 
defining new font families and then using their shorthand command works 
in terms of no longer having xetex generate a terminal error, hyperref 
still "throws away" the font change instructions, and so any text that 
lacks glyphs in Palatino Linotype ends up either missing or using 
'unknown' glyphs.

The second approach sadly also fails, because there doesn't seem to be a 
way to dynamically turn interchartoks processing on and off. That is a 
problem because the interchartoks behaviour appears to suppress fontspec 
instructions: if I issue a font change with fontspec, the interchartoks 
instructions seem to win over it, and text that was supposed to use font 
X according to the fontspec command actually ends up in the font that I 
told interchartoks to use when going from class 0/255 to 1/2/3, or vice 
versa. (If there is a way to dynamically toggle interchartoks behaviour, 
then this might be a solution for my problem, but I couldn't find any 
mention of it after googling for anything interchartoks related.)

It might sound like an obscure problem, but for CJK there's a very real 
issue in that even though the CJK languages all share the same Unicode 
code points, the glyphs can look radically different between the various 
languages. This makes it impossible to use, for instance, a more 
complete Chinese font when the text also contains Japanese, because a 
lot of the glyphs will be plain wrong. So, essentially, neither (plain) 
interchartoks, nor polyglossia, nor xeCJK is quite specific enough, from 
what I can tell from the documentation on each of them. None of them 
seems to offer me the ability to say "for glyph ..., use font ..." or 
even a more general "for Unicode block ..., use font ...". Being able to 
specify behaviour only per language, or per script, is simply not good 
enough: Unicode (and consequently, fonts that implement parts of it) is 
divided into more specific categories, several of which can be used by a 
single script or language =(

To make this problem more obvious, a bit from the material I'm working on:

stroke drawing order examples

㇄ top to bottom, then left to right, as one stroke 兦, 山
㇅ left to right, then top to bottom, then left to right 凹
㇇ left to right, then a hook curving down left 水
㇆ left to right, then top to bottom with a serif to the upper left 刀, 方
𠃍 left to right, then top to bottom 囗
乚 top to bottom, then left to right with a serif upward at the end 礼
乁 top left to right, then curving down right with an upward serif at 
the end 虱,丮

In Unicode-aware email clients, the above data shouldn't really pose any 
problems except perhaps the character 囗.

This seems reasonably innocent data, but I'm racking my brain on how to 
typeset this in a way that doesn't lead to, at best, a character not 
being drawn and, at worst, xetex throwing a terminal error.

On the first line, the first character is from the Unicode CJK STROKES 
block, the second character is from CJK UNIFIED IDEOGRAPHS, but is rare, 
and thus not found in most Japanese fonts (thus requiring an explicit 
font change), and the third is also from the CJK UNIFIED IDEOGRAPHS 
block, but common and found in any Japanese font. The example character 
for the fifth stroke is one from CJK UNIFIED IDEOGRAPHS EXTENSION B, and 
isn't even found in most "complete" Japanese, Chinese or even "Unicode" 
fonts like Code2000/1/2 or Arial Unicode, instead being only found in 
special fonts that implement this particular extension block, because 
it's huge (containing close to 43,000 glyphs). However, *all* of these 
glyphs would fall under the "CJK", "Chinese", as well as "Japanese" 
language/script headers... so this really is a very serious problem for me.

If there are alternative ways to do what I would like to do, then I'll 
gladly use those instead, but by now I'm kind of running out of ideas on 
how to get around the lack of font linking (of the explicit, 
user-indicated link order type), where glyphs that are missing in one 
font are substituted by glyphs from another font.

A last-ditch attempt would be to create a huge list of 
XeTeXinterchartoks definitions for all the various characters that fall 
under CJK, but with over 50,000 characters to tag, this would be 
madness O_o
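
If it came to that, the list would at least not have to be typed by 
hand. A minimal Python sketch that generates one \XeTeXcharclass 
assignment per code point in a block follows. The class number 4 is an 
arbitrary assumption (any class not already in use would do), and the 
\XeTeXinterchartoks rules that attach a font switch to that class would 
still have to be written separately.

```python
def charclass_lines(first, last, char_class):
    """Return one \\XeTeXcharclass assignment per code point in first..last."""
    return [
        '\\XeTeXcharclass "%04X=%d' % (cp, char_class)
        for cp in range(first, last + 1)
    ]


if __name__ == "__main__":
    # CJK Strokes block, assigned to the (assumed free) class 4.
    for line in charclass_lines(0x31C0, 0x31EF, 4):
        print(line)
```

The output can be saved to a file and pulled in with \input, so the 
document itself stays readable.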

- Mike
