Paul Isambert zappathustra at free.fr
Sun Jan 15 10:28:00 CET 2012

Philip TAYLOR <P.Taylor at rhul.ac.uk> wrote:
> Seems a very useful utility, Reinhard, and one of
> which I was previously unaware, but why does it
> eat all the "Th" (but not "th") groups ?!

Probably because you've used a font with a "Th" ligature, and that
ligature isn't recognized. Indeed, with a document in CM, "Th" renders
as "Th", while the same document in Chaparral renders "Th" as some
impossible glyph.

It also renders "ff" as a ligature, unless you include (in the TeX
document with pdfTeX or LuaTeX):

  \pdfglyphtounicode{ff}{0066 0066}

in which case it properly analyzes the ligature.

So something along those lines should be tried with "Th", provided you
find the glyph's name (not "Th" in Chaparral):

  \pdfglyphtounicode{<name>}{0054 0068}
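
In a full pdfLaTeX document that might look like the sketch below. Two
things in it are assumptions on my part: the glyph name "T_h" (Adobe's
naming convention joins ligature components with an underscore, but
check what your font really calls the glyph), and that the mapping only
takes effect once \pdfgentounicode is switched on. Load Chaparral
however you normally do; I've left that out.

  \documentclass{article}
  % Ask pdfTeX to write ToUnicode mappings into the PDF.
  \pdfgentounicode=1
  % Standard glyph-name table shipped with pdfTeX.
  \input{glyphtounicode}
  % "T_h" is a guess at the glyph name: replace it with the real one.
  \pdfglyphtounicode{T_h}{0054 0068}
  \begin{document}
  The Thing
  \end{document}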

Hopefully it works. But that still has to be done before compilation;
or perhaps pdftotext has some option signalling that such a glyph must
be mapped to such character(s)?


