[XeTeX] Type0 fonts somehow not built correctly for Unicode text-extraction and Accessibility

Ross Moore
Tue Aug 7 01:07:50 CEST 2018

Hi all.

I think I’ve found a possible cause for this /ToUnicode  problem.
It’s with the way the  /CMapName  is constructed within the  CMap  resource itself,
at least when the font's name contains spaces.

See the attached image, where the window on the left is from a PDF constructed by XeLaTeX,
while the one on the right comes from the PDF/UA Association, and is properly valid.

Because the space character is normally a delimiter, assigning a value that contains spaces
to  /CMapName  is certainly invalid PostScript coding.  So presumably it’s wrong in PDF too.
Surely the space needs to be encoded as #20 here?
The ‘.’ and ‘,’ look questionable too, but I think these are actually OK.
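
To make the #20 point concrete, here is a minimal sketch in Python of how a name could be escaped, assuming the usual PDF rules for name objects: whitespace, delimiters, ‘#’ itself, and anything outside the printable range ‘!’ to ‘~’ are written as #xx. The function name and the sample font name are hypothetical, for illustration only; this is not what xdvipdfmx actually does internally.

```python
# Characters that delimit tokens in PDF syntax (whitespace is handled by the range test).
PDF_DELIMITERS = b"()<>[]{}/%"

def escape_pdf_name(raw: str) -> str:
    """Return `raw` written as a PDF name token, with the leading '/'.

    Hypothetical helper: bytes that are not regular characters are
    written as '#' followed by two hexadecimal digits.
    """
    out = ["/"]
    for byte in raw.encode("utf-8"):
        is_regular = (
            0x21 <= byte <= 0x7E          # printable, non-space ASCII
            and byte not in PDF_DELIMITERS
            and byte != 0x23              # '#' must itself be escaped
        )
        out.append(chr(byte) if is_regular else f"#{byte:02X}")
    return "".join(out)

# Hypothetical font-derived name containing spaces, '.' and ','.
print(escape_pdf_name("Some Font Name,Bold-v1.0"))
```

Under these assumptions the spaces come out as #20, while ‘.’ and ‘,’ pass through unchanged, which agrees with the guess that those two characters are OK.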

Changing the font to ‘Times’, the resulting PDF validates just fine.

Is it really a good idea to use the full path to the file, as the name here?
The PDF spec says it should be the name used in the file: viz.



(Required) The name of the CMap. It shall be the same as the value of CMapName in the CMap file.

BTW, there was also an issue with Ghostscript, concerning the way  CMapName  is constructed;
see  https://bugs.ghostscript.com/show_bug.cgi?id=690114 .
There it was the  //  at the start of the name that was questioned.
dvipdfmx  seems to be encoding the directory delimiter as a `-` now.

On 6 Aug 2018, at 8:10 am, Ross Moore wrote:

> There seems to be a subtle problem with the way subsetted Type0 fonts are built
> by xdvipdfmx with XeLaTeX jobs, for the purposes of finding the /ToUnicode  resource.
>
> So I cannot see why the /ToUnicode resource is not being found.

This error in naming is almost certainly the reason.





