[XeTeX] Stylistic sets: search & copy-paste

Alexey Kryukov anagnost at yandex.ru
Tue Jan 12 12:35:29 CET 2010


On Tue, 12 Jan 2010 10:54:18 +0100
Peter Dyballa wrote:

> This would just contain that some many regions
> are unchanged Unicode but this, that, and some other sequence of
> code points encode for other characters.

AFAIK that's not the way a PDF CMap works. To get a searchable PDF,
you need an explicitly specified Unicode codepoint for every glyph
present in the font. So if you replace the CMap automatically
generated by XeTeX/xdvipdfmx with your own custom version (containing
just a few mappings), you'll just break everything.

Moreover, PDF CMaps (just like TTF cmap tables) are completely
font-specific: they map glyph IDs to Unicode codepoints. That's not a
problem for pdfTeX, which normally deals with just a few standard 8-bit
encodings, but it would be a problem for XeTeX, as every Unicode font
has its own glyph set (and Unicode fonts are just too large to prepare
the appropriate mapping manually).
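
To illustrate why such a mapping is tied to one font, here is a minimal
Python sketch that writes the text of a ToUnicode-style CMap from a
glyph-ID-to-text table. The function name build_tounicode_cmap and both
glyph tables are hypothetical, made up for this example; this is not
the code xdvipdfmx actually uses, only an approximation of the kind of
data it has to emit.

```python
def build_tounicode_cmap(gid_to_text):
    """Return the text of a minimal ToUnicode CMap for {glyph_id: str}."""

    def hex_utf16(text):
        # Destination values are UTF-16BE code units written as hex digits.
        return text.encode("utf-16-be").hex().upper()

    entries = sorted(gid_to_text.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # CMap files conventionally hold at most 100 entries per bfchar block.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for gid, text in chunk:
            lines.append(f"<{gid:04X}> <{hex_utf16(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Hypothetical glyph tables: the same glyph ID means different things
# in different fonts, so a CMap written for one is useless for the other.
font_a = {36: "A", 37: "B", 812: "A"}   # 812: an unencoded stylistic alternate
font_b = {36: "\u03b1", 500: "ffi"}     # 500: a ligature mapped to three letters

print(build_tounicode_cmap(font_a))
```

Note that every glyph that can appear in the output needs its own entry,
including alternates and ligatures that have no codepoint of their own,
which is why a hand-written table with only a few mappings is not enough.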

For this reason, using the cmap package with XeTeX seems impractical (in
fact it simply would not work, as cmap.sty requires pdfTeX and assumes
fontenc is loaded), so I don't understand why you recommend it.

-- 
Regards,
Alexey Kryukov <anagnost at yandex dot ru>

Moscow State University
Historical Faculty

