[pdftex] ToUnicode map and virtual fonts

Werner LEMBERG wl at gnu.org
Thu Oct 5 21:04:55 CEST 2006

> ,--------
> | if the font is Zapf Dingbats (PostScript FontName
> | ZapfDingbats), and the component is in the ZapfDingbats
> | list, then map it to the corresponding character in that
> | list.
> `--------

Ah, yes, I forgot this one.
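For reference, the quoted rule is one case of the per-component step in the AGL specification's glyph-name-to-Unicode algorithm. Below is a minimal Python sketch of that algorithm. It assumes tiny hand-copied excerpts in place of the full glyphlist.txt and zapfdingbats.txt files, and the names `AGL', `ZAPF_DINGBATS' and `glyph_to_unicode' are illustrative only. They are not taken from pdftex.

```python
import re

# Hypothetical tiny excerpts; a real implementation would load the full
# AGL (glyphlist.txt) and ITC Zapf Dingbats (zapfdingbats.txt) lists.
AGL = {"f": "\u0066", "i": "\u0069", "ff": "\ufb00", "fi": "\ufb01", "ffi": "\ufb03"}
ZAPF_DINGBATS = {"a1": "\u2701"}

UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")  # "uni" + groups of 4 hex digits
U_RE = re.compile(r"u([0-9A-F]{4,6})")         # "u" + 4 to 6 hex digits


def _map_component(comp: str, font_name: str) -> str:
    """Map one underscore-separated component of a glyph name."""
    if font_name == "ZapfDingbats" and comp in ZAPF_DINGBATS:
        return ZAPF_DINGBATS[comp]
    if comp in AGL:
        return AGL[comp]
    m = UNI_RE.fullmatch(comp)
    if m:
        hexes = m.group(1)
        values = [int(hexes[i:i + 4], 16) for i in range(0, len(hexes), 4)]
        # Each 4-digit group must be a BMP value outside the surrogate range.
        if all(not 0xD800 <= v <= 0xDFFF for v in values):
            return "".join(chr(v) for v in values)
        return ""
    m = U_RE.fullmatch(comp)
    if m:
        v = int(m.group(1), 16)
        if v <= 0xD7FF or 0xE000 <= v <= 0x10FFFF:
            return chr(v)
    return ""  # no mapping for this component


def glyph_to_unicode(glyph_name: str, font_name: str = "") -> str:
    """Map a glyph name to a Unicode string, or '' if nothing matches."""
    base = glyph_name.split(".", 1)[0]  # drop suffixes such as ".alt"
    return "".join(_map_component(c, font_name) for c in base.split("_"))
```

With these rules a ligature glyph named `f_f_i' already yields the decomposed string "ffi", while one named `ffi' yields U+FB03. That difference bears on the ligature question below.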

> and entries in texglyphlist.txt (from lcdf-typetools)?

Hmm, I don't have this.

> And at least some entries in AGL must be overridden to make
> ligatures searchable. The AGL says that:
> ,--------
> | ff;FB00
> | ffi;FB03
> | ffl;FB04
> | fi;FB01
> `--------
> but I think they must be written as
> ,--------
> | \pdfglyphtounicode{ff}{00660066}
> | \pdfglyphtounicode{ffi}{006600660069}
> | \pdfglyphtounicode{ffl}{00660066006C}
> | \pdfglyphtounicode{fi}{00660069}
> `--------

Hmm.  There are a lot more such beasts in Unicode; just do

  grep '<compat>' UnicodeData.txt | less

to see the complete list of more than 600 entries.  In my opinion we
should stay with FB00 -- decomposing it to 00660066 loses more
information than necessary; I think it is the job of the PDF reader to
decompose this further in case it is necessary.
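Python's standard `unicodedata' module exposes the same decomposition field as UnicodeData.txt, so the trade-off can be checked directly. U+FB00 still records that the glyph is an ff ligature, and its compatibility decomposition 0066 0066 stays recoverable by whoever extracts the text later. NFKC normalization is one way a PDF reader could do that, though not necessarily what any given reader does. The helper `to_pdf_hex' is hypothetical. It writes UTF-16BE hex in the style of the \pdfglyphtounicode arguments quoted above, and `compat_entries' mirrors the grep.

```python
import unicodedata


def to_pdf_hex(s: str) -> str:
    """Format a string as uppercase UTF-16BE hex, e.g. 'ff' -> '00660066'."""
    return s.encode("utf-16-be").hex().upper()


def compat_entries():
    """Yield (code point, decomposition) pairs tagged <compat>, like the grep."""
    for cp in range(0x110000):
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith("<compat>"):
            yield cp, decomp


lig = "\ufb00"                                          # LATIN SMALL LIGATURE FF
print(unicodedata.decomposition(lig))                   # <compat> 0066 0066
print(to_pdf_hex(lig))                                  # FB00
print(to_pdf_hex(unicodedata.normalize("NFKC", lig)))   # 00660066
print(sum(1 for _ in compat_entries()))  # depends on Python's Unicode version
```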

> > Only glyph names in OpenType fonts based on CFF and CID are worth
> > supporting, I think.
> Well, I am a little bit surprised to hear that from you, since I
> think CJK fonts are mostly in TrueType format.

Normally, those CJK fonts don't contain glyph names at all!  For this
case, pdftex should use the TrueType `cmap' table to reconstruct the
Unicode value -- at least for CJK fonts this works in most cases.
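One way to do that reconstruction is to invert the `cmap' table. The Python sketch below assumes the font's `cmap' has already been read into a dict from code point to glyph name. The fontTools library's `TTFont.getBestCmap()' returns such a dict. Several code points may share one glyph, so the sketch keeps the lowest. That choice is an arbitrary policy, and `invert_cmap' is a hypothetical name.

```python
from typing import Dict


def invert_cmap(cmap: Dict[int, str]) -> Dict[str, str]:
    """Build a glyph-name -> Unicode-string table from a cmap dict.

    When several code points share one glyph, the lowest code point wins
    (an arbitrary but deterministic choice).
    """
    glyph_to_uni: Dict[str, str] = {}
    for cp in sorted(cmap):
        glyph_to_uni.setdefault(cmap[cp], chr(cp))
    return glyph_to_uni
```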

> Anyway, being a lazy person, I will leave out TrueType support for
> now, unless someone asks for it.

The code for this would be quite complicated if you want to do it
`right', namely, to keep track of the used OpenType tables (for
example, handling the `vert' feature in the GSUB table to map vertical
presentation form glyphs back to Unicode).
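The following Python sketch illustrates only the `vert' part. It makes two assumptions. First, the single substitutions of the `vert' lookups have already been pulled out of GSUB into a dict from horizontal to vertical glyph. Second, a glyph-to-Unicode table, for example from an inverted `cmap', already exists. Both inputs and the name `add_vertical_forms' are hypothetical. The sketch also ignores chained lookups and every other GSUB lookup type, and that is where most of the real complexity lies.

```python
from typing import Dict


def add_vertical_forms(glyph_to_uni: Dict[str, str],
                       vert_subst: Dict[str, str]) -> Dict[str, str]:
    """Give vertical-form glyphs the Unicode value of their horizontal source.

    vert_subst maps a horizontal glyph to its vertical replacement, as a
    single substitution in a GSUB 'vert' lookup would. Glyphs that already
    have their own value (e.g. from the cmap) are left unchanged, and the
    input table is not modified.
    """
    result = dict(glyph_to_uni)
    for horizontal, vertical in vert_subst.items():
        if vertical not in result and horizontal in result:
            result[vertical] = result[horizontal]
    return result
```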

