[pdftex] ToUnicode map and virtual fonts
Werner LEMBERG
wl at gnu.org
Thu Oct 5 21:04:55 CEST 2006
> ,--------
> | if the font is Zapf Dingbats (PostScript FontName
> | ZapfDingbats), and the component is in the ZapfDingbats
> | list, then map it to the corresponding character in that
> | list.
> `--------
Ah, yes, I forgot this one.
> and entries in texglyphlist.txt (from lcdf-typetools)?
Hmm, I don't have this.
> And at least some entries in AGL must be overwritten to make
> ligatures searchable. The AGL says that:
>
> ,--------
> | ff;FB00
> | ffi;FB03
> | ffl;FB04
> | fi;FB01
> `--------
>
> but I think they must be written as
>
> ,--------
> | \pdfglyphtounicode{ff}{00660066}
> | \pdfglyphtounicode{ffi}{006600660069}
> | \pdfglyphtounicode{ffl}{00660066006C}
> | \pdfglyphtounicode{fi}{00660069}
> `--------
Hmm. There are a lot more such beasts in Unicode; just do
  grep '<compat>' UnicodeData.txt | less
to see the complete list of more than 600 entries. In my opinion we
should stay with FB00 -- decomposing it to 00660066 loses more
information than necessary; I think it is the job of the PDF reader to
decompose this further in case it is necessary.
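
To illustrate why nothing is lost by keeping FB00: the decomposition
can always be recovered from the compatibility mapping in
UnicodeData.txt, while the reverse direction is not possible. Here is
a minimal sketch using Python's standard unicodedata module; the
helper names compat_decomposition and to_hex_utf16 are made up for
this example, and the hex formatting only covers BMP characters.

```python
import unicodedata


def compat_decomposition(ch):
    """Return the <compat> decomposition of ch as a list of code points.

    Returns None if ch has no decomposition tagged <compat>.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0] != "<compat>":
        return None
    return [int(cp, 16) for cp in fields[1:]]


def to_hex_utf16(code_points):
    """Format BMP code points as concatenated 4-digit hex values."""
    return "".join(f"{cp:04X}" for cp in code_points)


# The four ligatures quoted from the AGL above.
for name, cp in [("ff", 0xFB00), ("ffi", 0xFB03),
                 ("ffl", 0xFB04), ("fi", 0xFB01)]:
    print(name, f"{cp:04X}", to_hex_utf16(compat_decomposition(chr(cp))))
```

A reader that receives FB00 can do this lookup itself if it wants to
match a search for `ff'; a reader that receives 00660066 can no
longer tell that the glyph was a ligature.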
> > Only glyph names in OpenType fonts based on CFF and CID are worth
> > supporting, I think.
>
> well, I am a little bit surprised to hear that from you since I
> think CJK fonts are mostly in TrueType format.
Normally, those CJK fonts don't contain glyph names at all! For this
case, pdftex should use the TrueType `cmap' table to reconstruct the
Unicode value -- at least for CJK fonts this works in most cases.
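
Roughly, the idea is to invert the `cmap' mapping (character code to
glyph index). A minimal sketch in Python, assuming the table has
already been parsed into a dictionary; invert_cmap and the toy data
are hypothetical, and picking the lowest code point when several map
to the same glyph is just one possible policy.

```python
def invert_cmap(cmap):
    """Invert a code point -> glyph index mapping.

    If several code points share a glyph, the lowest code point wins
    (an arbitrary policy chosen for this sketch).
    """
    reverse = {}
    for code_point, glyph_id in sorted(cmap.items()):
        reverse.setdefault(glyph_id, code_point)
    return reverse


# Toy data: U+0020 and U+00A0 share glyph 3, as often happens.
toy_cmap = {0x0020: 3, 0x00A0: 3, 0x4E00: 17, 0x4E8C: 18}
reverse = invert_cmap(toy_cmap)
print({gid: f"{cp:04X}" for gid, cp in reverse.items()})
```

Glyphs that are not reachable through `cmap' at all get no entry this
way, which is why it only works in most cases.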
> Anyway, being a lazy person I will leave out TrueType support for
> now, unless someone asks for it.
The code for this would be quite complicated in case you want to do it
`right', namely, to keep track of the used OpenType tables (for
example, handling the `vert' feature in the GSUB table to map vertical
representation form glyphs back to Unicode).
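
For the `vert' case the lookup has to go through the substitution
backwards. A minimal sketch in Python, assuming the `vert' feature
consists of simple one-to-one substitutions that have been parsed
into a dictionary; glyph_to_unicode, the glyph indices and the data
layout are all hypothetical.

```python
def glyph_to_unicode(glyph_id, reverse_cmap, vert_subst):
    """Map a glyph index back to a Unicode code point.

    reverse_cmap: glyph index -> code point (inverted `cmap' table)
    vert_subst:   horizontal glyph -> vertical glyph (from `vert')
    Returns None if no mapping can be found.
    """
    # Glyphs reachable directly through `cmap'.
    if glyph_id in reverse_cmap:
        return reverse_cmap[glyph_id]
    # Otherwise look for a horizontal glyph that `vert' replaces
    # with this glyph, and use that glyph's code point.
    for horiz, vert in vert_subst.items():
        if vert == glyph_id and horiz in reverse_cmap:
            return reverse_cmap[horiz]
    return None


# Toy data: glyph 40 is U+3001, glyph 900 is its vertical variant.
reverse_cmap = {40: 0x3001}
vert_subst = {40: 900}
print(f"{glyph_to_unicode(900, reverse_cmap, vert_subst):04X}")
```

A real implementation would also have to respect script and language
selection and the other GSUB lookup types, which is where it gets
complicated.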
Werner