[XeTeX] Non-searchable ligatures in PDF produced with xdvipdfmx
gabriel.sztorc at gmail.com
Sat Mar 3 00:09:36 CET 2012
I'm trying to produce a PDF using the Adobe Caslon Pro font. The PDF looks fine but ligatures can't be searched or copy-pasted. This problem disappears when using xdv2pdf (it breaks other things however). Could this be a bug in xdvipdfmx? Any ideas on how to make it work?
Alternatively, how can I make it work with xdv2pdf? The file is supposed to have an embedded .eps image, but it doesn't show up in PDFs created with xdv2pdf.
I include some diagnostic information:
xdvipdfmx prints the following relevant-looking messages when run with the
pdf_font>> Input encoding "Identity-H" requires at least 2 bytes.
pdf_font>> The -m <00> option will be assumed for "ACaslonPro-Regular/H/65536/0/0".
pdf_font>> Type0 font "ACaslonPro-Regular/H/65536/0/0" cmap_id=<Identity-H,0> opened at font_id=<ACaslonPro-Regular/H/65536/0/0,0>.
><ACaslonPro-Regular(Adobe Caslon Pro:Regular)@24.79pt
fontmap: ACaslonPro-Regular/H/65536/0/0 -> ACaslonPro-Regular/H/65536/0/0(Identity-H)[map:<00>]
pdf_font>> Type0 font "ACaslonPro-Regular/H/65536/0/0" (cmap_id=0) found at font_id=0.
><ACaslonPro-Italic(Adobe Caslon Pro:Italic)@14.35pt<NATIVE-FONTMAP:ACaslonPro-Italic/H/65536/0/0>
fontmap: ACaslonPro-Italic/H/65536/0/0 -> ACaslonPro-Italic/H/65536/0/0(Identity-H)
(the above messages are repeated for every variant of the font)
And then there's:
otf_cmap>> Creating ToUnicode CMap for "ACaslonPro-Regular/H/65536/0/0"...
and quite a lot of messages of the form "No Unicode mapping available: GID=XXX, name=(none)"
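For illustration, here is a minimal sketch of what a single ToUnicode CMap entry for a ligature glyph would look like. It assumes that with Identity-H the 2-byte character code is effectively the glyph index, and that the ligature is mapped to several code points encoded as UTF-16BE. The function name bfchar_entry and the glyph number used are hypothetical; this is not what xdvipdfmx actually emits.

```python
def bfchar_entry(gid, text):
    """Format one ToUnicode 'bfchar' line mapping a 2-byte glyph code to text.

    gid  -- glyph index (hypothetical value, for illustration only)
    text -- the characters a search or copy-paste should yield for that glyph
    """
    # The destination is the UTF-16BE encoding of the text, written as hex.
    dest = text.encode("utf-16-be").hex().upper()
    return "<%04X> <%s>" % (gid, dest)


# A ligature glyph maps to more than one code point, e.g. "f" followed by "i".
print(bfchar_entry(0x0123, "fi"))
```

If a glyph has no such entry (which may be what the "No Unicode mapping available" messages indicate), a viewer has nothing to translate that glyph back to text with.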
I used qpdf to convert the PDFs created by xdvipdfmx and xdv2pdf to a human-readable (well, kind of) format and compared them. In the file created by xdv2pdf, most font objects contain the entry '/Encoding /MacRomanEncoding'; only two have an '/Encoding' entry that points to an object, and they are also the only two that contain a '/ToUnicode' entry. In the file created by xdvipdfmx, the font objects have '/Encoding /Identity-H' and they all contain '/ToUnicode' entries. I don't know how the PDF format works, so that doesn't tell me all that much, but it seems relevant.
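The comparison described above can be roughly automated. The following is a sketch only: it assumes the PDF has already been converted to qpdf's QDF form (for example with `qpdf --qdf --object-streams=disable in.pdf out.qdf`; the exact options used originally are not stated), and it relies on a simple regular expression rather than a real PDF parser. The name summarize_fonts is made up for this example.

```python
import re

# Matches one indirect object in QDF text: "N G obj ... endobj".
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\b(.*?)\bendobj", re.S)


def summarize_fonts(qdf_bytes):
    """Return {object number: (encoding, has_tounicode)} for font dictionaries."""
    fonts = {}
    for m in OBJ_RE.finditer(qdf_bytes):
        body = m.group(3)
        # Keep only font dictionaries; /FontDescriptor objects are skipped.
        if not re.search(rb"/Type\s*/Font\b", body):
            continue
        # /Encoding is either a name (/Identity-H) or an indirect reference.
        enc = re.search(rb"/Encoding\s+(/[^\s/<>\[\]]+|\d+ \d+ R)", body)
        fonts[int(m.group(1))] = (
            enc.group(1).decode("ascii") if enc else None,
            b"/ToUnicode" in body,
        )
    return fonts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for num, (enc, has_tu) in sorted(summarize_fonts(f.read()).items()):
            print(num, enc, "ToUnicode" if has_tu else "no ToUnicode")
```

Running it on both converted files and comparing the output side by side should show which fonts lack a '/ToUnicode' entry.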