[XeTeX] Type0 fonts somehow not built correctly for Unicode text-extraction and Accessibility

Ross Moore ross.moore at mq.edu.au
Tue Aug 7 01:07:50 CEST 2018

Hi all.

I think I’ve found a possible cause for this /ToUnicode problem.
It lies in the way the /CMapName is constructed within the CMap resource itself,
at least when the font's name contains spaces.

See the attached image, where the window on the left is from a PDF constructed by XeLaTeX,
while the one on the right comes from the PDF/UA Association, and is properly valid.


Because the space character is normally a delimiter, assigning a value containing
a space to /CMapName is certainly invalid PostScript. So presumably it’s wrong in PDF too.
Surely the space needs to be encoded as #20 here?
The ‘.’ and ‘,’ look questionable as well, but I think these are actually OK.
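
For illustration, here is a minimal Python sketch of the escaping rule for PDF name objects: bytes outside the printable range 0x21-0x7E, the delimiter characters, and ‘#’ itself are written as #xx. The function name and the sample font name are made up for this example, and whether the same escaping is appropriate inside the CMap stream (which uses PostScript syntax) is exactly the open question above.

```python
# Characters that PDF treats as delimiters inside a name object.
PDF_DELIMITERS = b"()<>[]{}/%"


def escape_pdf_name(raw: str) -> str:
    """Return `raw` written as a PDF name object, with a leading '/'.

    Hypothetical helper for illustration; not taken from xdvipdfmx.
    """
    out = ["/"]
    for byte in raw.encode("utf-8"):
        # Printable ASCII that is neither a delimiter nor '#' (0x23)
        # may be written as-is; everything else becomes #xx.
        if 0x21 <= byte <= 0x7E and byte not in PDF_DELIMITERS and byte != 0x23:
            out.append(chr(byte))
        else:
            out.append(f"#{byte:02X}")
    return "".join(out)


# The font name below is an invented example, not the one in the screenshot.
print(escape_pdf_name("Times New Roman,Bold-UTF16"))
# → /Times#20New#20Roman,Bold-UTF16
```

Under this rule the space is escaped, while ‘.’ and ‘,’ pass through unchanged.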

Changing the font to ‘Times’, the resulting PDF validates just fine.

Is it really a good idea to use the full path to the file as the name here?
The PDF spec says it should be the name used in the file, viz.

(Required) The name of the CMap. It shall be the same as the value of CMapName in the CMap file.
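
A rough way to check that requirement is sketched below in Python. It assumes the CMap stream has already been decompressed to text, and it uses a simplified regular expression rather than a real PostScript tokenizer; the function names and the sample strings are invented for this example.

```python
import re


def cmap_name_in_stream(cmap_text: str):
    """Return the name token that follows '/CMapName' in a CMap stream, or None."""
    # A PostScript name token ends at whitespace or a delimiter, so anything
    # after an unescaped space is not part of the name.
    match = re.search(r"/CMapName\s*/([^\s()<>\[\]{}/%]+)", cmap_text)
    return match.group(1) if match else None


def names_agree(dict_name: str, cmap_text: str) -> bool:
    """Compare the dictionary's CMapName with the one defined in the stream."""
    return cmap_name_in_stream(cmap_text) == dict_name


# Invented sample data: one name without spaces, one with.
good = "/CMapName /Times-UTF16 def"
bad = "/CMapName /Times New Roman-UTF16 def"

print(names_agree("Times-UTF16", good))           # → True
print(names_agree("Times New Roman-UTF16", bad))  # → False
```

In the second case the token read from the stream stops at the first space, so it can never equal the intended name.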

BTW, there was also an issue with Ghostscript concerning the way CMapName is constructed;
see https://bugs.ghostscript.com/show_bug.cgi?id=690114 .
There it was the // at the start of the name that was questioned.
dvipdfmx seems to be encoding the directory delimiter as a `-` now.

On 6 Aug 2018, at 8:10 am, Ross Moore <ross.moore at mq.edu.au> wrote:

There seems to be a subtle problem with the way subsetted Type0 fonts are built
by xdvipdfmx with XeLaTeX jobs, for the purposes of finding the /ToUnicode  resource.

So I cannot see why the /ToUnicode resource is not being found.

This error in naming is almost certainly the reason.



Dr Ross Moore


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-08-07 at 8.18.34 am.png
Type: image/png
Size: 226351 bytes
Desc: Screen Shot 2018-08-07 at 8.18.34 am.png
URL: <https://tug.org/pipermail/xetex/attachments/20180806/f1df42c6/attachment-0002.png>
