# [luatex] Problematic "tounicode" entries for ligatures in luatex-cache

Valery Yundin yuvalery at gmail.com
Tue Mar 10 00:34:27 CET 2015

Hi,

There seems to be a problem in how cache files for fonts are saved.

The font loader (?) writes "tounicode" entries as numbers if they do
not contain any of the hexadecimal digits A-F. If such an entry has
leading zeros, they are lost, because Lua ignores leading zeros in
numeric literals. This leads to wrong CMap entries in the PDF and to
glyphs that cannot be copied and pasted. (Apparently this problem does
not affect single code point glyphs, probably because the string format
has a minimum width of 4.)

For example, the Carlito font has a ligature for "ti", which doesn't
have a standard Unicode code point and has to be represented as the
pair "t" and "i".

LuaTeX generates the following cache file
$HOME/.cache/texmf/fonts/luatex-cache/generic/fonts/otf/carlito-regular.lua:
["tounicode"]={
...
[2210]=00740069,
...

Note the missing quotes around 00740069. When we use Carlito again,
LuaTeX reads this cache file and ignores the leading zeros, which
results in a wrong CMap entry in the PDF file:
<08A2> <740069>

By contrast, another non-standard ligature, "fj", has the mapping
["tounicode"]={
...
[85]="0066006A",
...

Note the quotes, which are there because of the hexadecimal digit "A".
The resulting CMap entry is correct:
<0055> <0066006A>
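
The difference between the two entries can be reproduced outside
LuaTeX. The following is a minimal Python sketch, not the font loader's
actual code: read_cache_value and to_cmap_hex are hypothetical helpers,
and the minimum width of 4 is the guess made above rather than
confirmed behaviour.

```python
def read_cache_value(token):
    """Mimic how a Lua table value is read back from the cache.

    A quoted token stays a string; an unquoted run of decimal digits
    becomes a number, which drops any leading zeros.
    """
    if token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    return int(token, 10)


def to_cmap_hex(value):
    """Turn a cached value back into the text used in the CMap.

    Assumption: numbers are printed with a minimum width of 4.
    """
    if isinstance(value, str):
        return value
    return "%04d" % value


ti = to_cmap_hex(read_cache_value('00740069'))    # unquoted, as in the cache
fj = to_cmap_hex(read_cache_value('"0066006A"'))  # quoted because of "A"
t = to_cmap_hex(read_cache_value('0074'))         # single code point
print(ti, fj, t)
```

Under these assumptions the "ti" value loses its two leading zeros,
while the quoted "fj" value and the single code point survive the
round trip.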

If I fix the cache file by hand and put quotes around 00740069 for
the "ti" ligature, the CMap in the PDF is correct for it too. Perhaps
it would make sense to write all "tounicode" values as strings.
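
A sketch of that suggestion, again in Python rather than in the
loader's own Lua: serialize_unsafe imitates the behaviour observed in
the cache file, and serialize_safe always quotes the value. Both
function names are made up for illustration and do not correspond to
anything in the real font loader.

```python
import re


def serialize_unsafe(index, tounicode):
    """Imitate the observed cache output: digits-only values go unquoted."""
    if re.fullmatch(r"[0-9]+", tounicode):
        return "[%d]=%s," % (index, tounicode)
    return '[%d]="%s",' % (index, tounicode)


def serialize_safe(index, tounicode):
    """Always write the value as a quoted string, keeping leading zeros."""
    return '[%d]="%s",' % (index, tounicode)


print(serialize_unsafe(2210, "00740069"))  # bare number, zeros lost on reload
print(serialize_unsafe(85, "0066006A"))    # quoted only because of the "A"
print(serialize_safe(2210, "00740069"))    # quoted regardless of content
```

Quoting unconditionally removes the dependence on whether a value
happens to contain one of the digits A-F.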

Here is an example for lualatex:

\documentclass{article}
\usepackage{fontspec}
\setmainfont{Carlito}
\defaultfontfeatures{Ligatures=TeX}
\begin{document}
Beautiful fjord
\end{document}

The first run (with an empty font cache) produces a correct PDF file,
while all subsequent runs mess up the CMap entry for the "ti" ligature.
You can check this easily by running pdftotext on the resulting PDF.

With best regards, Valery Yundin.