[XeTeX] turn off special characters in PDF

Alexey Kryukov anagnost at yandex.ru
Tue Dec 31 07:20:00 CET 2013


On Mon, 30 Dec 2013 10:45:39 +1100
Ross Moore wrote:

> I've played a lot with this kind of thing, and think that this
> is the wrong approach. One should use /ActualText to provide
> the correct Unicode replacement, when one exists. Thus one
> can extract textual information reliably, even when the PDF
> uses legacy fonts that may not contain a /ToUnicode resource,
> or if that resource is inadequate in special situations.

Well, the /ActualText approach looks like an overcomplication to me. I
think it is intended for very special cases, such as handling the 'ck'
cluster under the old German hyphenation rules (where 'ck' is broken
as 'k-k'). For typical ligatures it is sufficient to produce a ToUnicode
CMap entry mapping the ligature glyph to its source characters. That's
what xetex (or rather xdvipdfmx) actually does... unless, as Khaled has
correctly pointed out, the font maps its substitution glyphs to the
Private Use Area (PUA) or has no glyph names.
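To make the comparison concrete, here is a minimal Python sketch (not xdvipdfmx code) that emits both kinds of mapping as PDF text: a ToUnicode "bfchar" section that maps ligature glyphs back to their source characters, and an /ActualText marked-content span for a case like the hyphenated 'ck'. The glyph IDs 0x0123/0x0124, the helper names, and the content-stream operators in the usage line are hypothetical; real glyph IDs come from the font being embedded.

```python
def utf16be_hex(text: str) -> str:
    """Encode text as UTF-16BE and return uppercase hex (CMap/PDF text-string style)."""
    return text.encode("utf-16-be").hex().upper()


def tounicode_bfchar(mapping: dict[int, str]) -> str:
    """Build one beginbfchar/endbfchar section mapping 2-byte glyph IDs to Unicode text.

    A ligature glyph maps to several characters, e.g. an 'fi' glyph -> "fi".
    Note: the PDF spec allows at most 100 entries per bfchar section;
    this sketch does not split larger mappings.
    """
    lines = [f"{len(mapping)} beginbfchar"]
    for gid, text in sorted(mapping.items()):
        lines.append(f"<{gid:04X}> <{utf16be_hex(text)}>")
    lines.append("endbfchar")
    return "\n".join(lines)


def actualtext_span(replacement: str, content: str) -> str:
    """Wrap content-stream operators in a /Span carrying /ActualText replacement text.

    The replacement is written as a UTF-16BE hex string with a byte-order mark.
    """
    return f"/Span << /ActualText <FEFF{utf16be_hex(replacement)}> >> BDC\n{content}\nEMC"


if __name__ == "__main__":
    # Hypothetical glyph IDs for the 'fi' and 'ffi' ligatures.
    print(tounicode_bfchar({0x0123: "fi", 0x0124: "ffi"}))
    # 'ck' displayed as "k-" at a line end and "k" on the next line.
    print(actualtext_span("ck", "(k-) Tj T* (k) Tj"))
```

The ToUnicode entry is attached once per font, whereas /ActualText has to be inserted around each affected run in the page content, which is why it is the heavier mechanism of the two.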

And I don't fully understand your remark regarding legacy fonts that may
not contain a /ToUnicode resource, since it's up to the PDF generation
software (xdvipdfmx in our case) to produce such a resource.

-- 
Regards,
Alexey Kryukov <anagnost at yandex dot ru>

Moscow State University
Faculty of History

