documentation of \pdfgentounicode

Wed May 27 12:54:43 CEST 2020

When using \pdfgentounicode=1 without loading glyphtounicode.tex or
setting up some mapping with `\pdfglyphtounicode` one gets a warning

`pdfTeX warning: pdflatex-dev.exe: no GlyphToUnicode entry has been
inserted yet`

and no cmap resource with ToUnicode entries in the pdf. That is what
I expected from the documentation.
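
For comparison, a minimal sketch of the usual setup that avoids the
warning, assuming glyphtounicode.tex can be found in the TeX tree:

```latex
% Load the standard glyph-name database first, then enable
% generation of ToUnicode entries.
\input glyphtounicode
\pdfgentounicode=1
```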

Not quite expected was what happened when I added one nonsense
mapping:

\pdfglyphtounicode{xxxxx}{00B2}

In this case one gets a cmap resource *and* this resource is
populated with lots of entries that pdftex seems to guess from the
glyph names in the font.

Looking at texk/web2c/pdftexdir/tounicode.c I found that the rules
seem to be the following (comments taken from the source):

 /* s is a multiple value of form "uniXXXX" */
 /* s matched an entry with numeric value in the
    database, or a value derived from "uXXXX" */

That is quite nice and means that for example the libertine math
font gets quite good tounicode values as it uses the uXXXX-syntax. 
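
As a rough illustration of these rules, and only a sketch: the glyph
names below are hypothetical and not taken from any particular font.
The idea is that names following the uniXXXX or uXXXX convention
already carry their code point, while any other name needs an explicit
entry.

```latex
% Hypothetical glyph names, for illustration only.
%
% "uni00B2" and "u1D44E" encode their code points in the name, so
% according to the comments quoted above pdftex should derive
% U+00B2 and U+1D44E for them without an explicit mapping.
%
% A name that does not follow the convention has to be mapped by hand
% ("mysuperiortwo" is an invented name):
\pdfglyphtounicode{mysuperiortwo}{00B2}
\pdfgentounicode=1
```

Judging from the behaviour described above, the derivation from glyph
names seems to need at least one entry in the database, which would
explain why a single mapping is enough to trigger it.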

Imho this should be better documented.

\documentclass{article}

\pdfcompresslevel=0
\pdfobjcompresslevel=0

% uncomment the nonsense mapping to get a populated ToUnicode cmap
%\pdfglyphtounicode{xxxxx}{00B2}
\pdfgentounicode=1

\DeclareFontFamily{OML}{nxlmi}{}
\DeclareFontShape{OML}{nxlmi}{m}{it}{<->nxlmi037}{}
\DeclareSymbolFont{letters}{OML}{nxlmi}{m}{it}

\begin{document}\pagestyle{empty}
 $a$  
\end{document} 

-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/