# documentation of \pdfgentounicode

Ulrike Fischer news3 at nililand.de
Wed May 27 12:54:43 CEST 2020

When using \pdfgentounicode=1 without loading glyphtounicode.tex or
setting up some mapping with \pdfglyphtounicode, one gets a warning

pdfTeX warning: pdflatex-dev.exe: no GlyphToUnicode entry has been
inserted yet

and no cmap resource with ToUnicode entries in the pdf. That is what
I expected from the documentation.
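
For comparison, a minimal sketch of the usual setup, assuming
glyphtounicode.tex from the TeX distribution is found on the search
path. With the table loaded, the warning should not appear:

```latex
\documentclass{article}

% load the stock glyph name -> Unicode table
% (a long list of \pdfglyphtounicode entries)
\input{glyphtounicode}
% ask pdfTeX to write ToUnicode CMaps for the fonts it embeds
\pdfgentounicode=1

\begin{document}
text
\end{document}
```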

Not quite expected was what happened when I added one nonsense
mapping:

\pdfglyphtounicode{xxxxx}{00B2}

In this case one gets a cmap resource *and* this resource is
populated with lots of entries that pdftex seems to guess from the
font.

Looking at texk/web2c/pdftexdir/tounicode.c I found that the rules
seem to be the following (quoting the comments in the source):

/* s is a multiple value of form "uniXXXX" */
/* s matched an entry with numeric value in the
database, or a value derived from "uXXXX" */

That is quite nice and means that for example the libertine math
font gets quite good tounicode values as it uses the uXXXX-syntax.
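
To make the rule concrete, here is a small sketch. It assumes that the
rule quoted above is complete; the names uni00B2 and u00B2 are only
illustrations and not taken from a particular font. A glyph name that
follows the uniXXXX or uXXXX pattern should need no entry of its own,
while any other name still needs an explicit mapping:

```latex
% Assumption: at least one \pdfglyphtounicode entry exists, since
% otherwise no ToUnicode data seems to be written at all.
\pdfgentounicode=1

% "twosuperior" does not carry its code point in its name,
% so it needs an explicit entry:
\pdfglyphtounicode{twosuperior}{00B2}

% a ligature can map to several code points, separated by spaces:
\pdfglyphtounicode{ff}{0066 0066}

% Glyphs named e.g. "uni00B2" or "u00B2" would, by the rule quoted
% above, get 00B2 derived from the name, without any entry here.
```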

Imho this should be better documented.

\documentclass{article}

\pdfcompresslevel=0
\pdfobjcompresslevel=0

%\pdfglyphtounicode{xxxxx}{00B2}
\pdfgentounicode=1

\DeclareFontFamily{OML}{nxlmi}{}
\DeclareFontShape{OML}{nxlmi}{m}{it}{<->nxlmi037}{}
\DeclareSymbolFont{letters}{OML}{nxlmi}{m}{it}

\begin{document}\pagestyle{empty}
$a$
\end{document}

--
Ulrike Fischer
http://www.troubleshooting-tex.de/
