[luatex] Wrong "tounicode" values (FFFD) with luaotfload-3.0 (Was: Oldstyle numerals differ in PDFs created with lualatex from TL2018 and 2019)

Nikola Lečić nikola.lecic at anthesphoria.net
Sat Oct 26 17:47:35 CEST 2019


On Fri, 25 Oct 2019 20:54:53 +0300
Nikola Lečić <nikola.lecic at anthesphoria.net> wrote:

> On Thu, 24 Oct 2019 01:57:04 +0300
> Nikola Lečić <nikola.lecic at anthesphoria.net> wrote:
> ... in other words, why 'abc123' generated by luatex-1.10.0 (TL2019)
> looks like this:
> 
> 6 beginbfchar
> <0042> <0061>
> <0043> <0062>
> <0044> <0063>
> <0874> <FFFD>
> <0875> <FFFD>
> <0876> <FFFD>
> endbfchar
> 
> and luatex-1.07.0 (TL2018) used to look like this:
> 
> 6 beginbfchar
> <0042> <0061>
> <0043> <0062>
> <0044> <0063>
> <0874> <F731>
> <0875> <F732>
> <0876> <F733>
> endbfchar
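
For anyone who wants to compare such tables mechanically, here is a
minimal sketch in Python (not part of luatex or luaotfload; the function
names are made up for illustration). It assumes the ToUnicode CMap has
already been extracted from the PDF as uncompressed text, and it only
looks at bfchar pairs, not bfrange entries.

```python
import re

# One "<glyph code> <unicode>" pair as it appears between beginbfchar/endbfchar.
BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text):
    """Return {glyph code: tounicode hex string} for every pair found in cmap_text."""
    return {int(src, 16): dst.upper() for src, dst in BFCHAR_PAIR.findall(cmap_text)}


def changed_mappings(old_text, new_text):
    """Return {glyph code: (old value, new value)} for codes whose mapping differs."""
    old, new = parse_bfchar(old_text), parse_bfchar(new_text)
    return {
        code: (old.get(code), new.get(code))
        for code in sorted(old.keys() | new.keys())
        if old.get(code) != new.get(code)
    }


# The two tables quoted above, copied verbatim.
TL2018 = """
<0042> <0061>
<0043> <0062>
<0044> <0063>
<0874> <F731>
<0875> <F732>
<0876> <F733>
"""
TL2019 = """
<0042> <0061>
<0043> <0062>
<0044> <0063>
<0874> <FFFD>
<0875> <FFFD>
<0876> <FFFD>
"""

for code, (old, new) in changed_mappings(TL2018, TL2019).items():
    print(f"<{code:04X}>: {old} -> {new}")
```

For the tables above this reports only the three oldstyle digits; the
letters map identically in both versions.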

To reduce the problem even further: if you run this code (from the
luaotfload manual)

\input luaotfload.sty
\directlua {
    local dumpfile = "fontdump.lua"
    local dump_font = function (tfmdata)
        local data = table.serialize(tfmdata)
        io.savedata(dumpfile, data)
    end
    luatexbase.add_to_callback(
        "luaotfload.patch_font",
        dump_font,
        "my_private_callbacks.dump_font"
    )
}
\font \dumpme = name:CMUSerif-Roman
\bye

on both TL2018 and TL2019, it will show that luaotfload-3.0 (TL2019) creates
a huge number of wrong "tounicode" values; in my case, they are all
"FFFD". For example, for "oneoldstyle", the relevant part of the dump
looks like this:

  [63281]={
   ["height"]=296222.72,
   ["index"]=2164,
   ["tounicode"]="FFFD",
   ["unicode"]=63281,
   ["width"]=327680.0,
  },
 
while it should be

  [63281]={
   ["height"]=296222.72,
   ["index"]=2164,
   ["tounicode"]="F731",
   ["unicode"]=63281,
   ["width"]=327680,
  },
 
as it was with luaotfload-2.8.
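
To list every affected slot in the dump rather than checking entries by
hand, something like the following Python sketch could be used. It is
only an illustration: the function name is made up, and the regular
expression assumes that "tounicode" is a plain string and is immediately
followed by "unicode", as in the excerpts above.

```python
import re

# Matches a string-valued "tounicode" directly followed by the "unicode" slot.
ENTRY = re.compile(r'\["tounicode"\]="([0-9A-Fa-f]+)",\s*\["unicode"\]=(\d+)')


def suspicious_slots(dump_text):
    """Return slots whose tounicode is FFFD although the slot itself is not U+FFFD."""
    return [
        int(slot)
        for tounicode, slot in ENTRY.findall(dump_text)
        if tounicode.upper() == "FFFD" and int(slot) != 0xFFFD
    ]


# The TL2019 entry for "oneoldstyle" shown above, copied verbatim.
sample = '''
  [63281]={
   ["height"]=296222.72,
   ["index"]=2164,
   ["tounicode"]="FFFD",
   ["unicode"]=63281,
   ["width"]=327680.0,
  },
'''

print([f"U+{slot:04X}" for slot in suspicious_slots(sample)])
```

To run it on the real dump, read fontdump.lua into a string and pass
that to suspicious_slots() instead of the sample.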

I'd say the current behaviour is wrong, and it makes luatex in TL2019
unusable for many old documents, since it's not possible to copy
anything containing numbers from the resulting PDFs.

Could someone take a look, please?

-- 
Nikola Lečić = Никола Лечић  :  https://www.hse.ru/staff/ndlecic
fingerprint : FEF3 66AF C90E EDC3 D878  7CDC 956D F4AB A377 1C9B
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


