[XeTeX] Type0 fonts somehow not built correctly for Unicode text-extraction and Accessibility

Ross Moore ross.moore at mq.edu.au
Mon Aug 6 00:11:02 CEST 2018

There seems to be a subtle problem with the way subsetted Type0 fonts are built
by xdvipdfmx with XeLaTeX jobs, for the purposes of finding the /ToUnicode resource.

The main view is fine, but some basic checks for standards compliance fail.
See, for example, the attached screenshot.


Firstly, the CIDSet is not built correctly: it does not include all the glyphs that are used.
pdfTeX has a similar problem with regard to CharSet.
The issue seems to be that if an accented character is built internally from multiple glyphs,
then each of those glyphs should be included in the CIDSet, as well as the combined character.
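
To make the CIDSet point concrete, here is a minimal Python sketch of how a complete CIDSet bitmap could be assembled. It relies on the bit layout that ISO 32000-1 gives for CIDSet streams: one bit per CID, high-order bit first, with the most significant bit of the first byte standing for CID 0. The function names and the `components` mapping (composite CID to the CIDs of its parts) are assumptions made for this example; it is not a description of what xdvipdfmx actually does.

```python
def build_cidset(used_cids, components=None):
    """Return CIDSet stream bytes covering used CIDs and their component glyphs.

    `components` is a hypothetical mapping {composite CID: [component CIDs]}.
    """
    components = components or {}
    wanted = set()
    stack = list(used_cids)
    # Walk composites transitively so that parts of parts are included too.
    while stack:
        cid = stack.pop()
        if cid in wanted:
            continue
        wanted.add(cid)
        stack.extend(components.get(cid, ()))
    if not wanted:
        return b""
    bits = bytearray(max(wanted) // 8 + 1)
    for cid in wanted:
        # The most significant bit of byte 0 corresponds to CID 0.
        bits[cid // 8] |= 0x80 >> (cid % 8)
    return bytes(bits)


def cids_in(cidset):
    """Decode CIDSet stream bytes back into the set of CIDs they mark."""
    return {
        index * 8 + bit
        for index, byte in enumerate(cidset)
        for bit in range(8)
        if byte & (0x80 >> bit)
    }
```

With made-up CID numbers, `build_cidset([0, 10], {10: [3, 9]})` marks CIDs 0, 3, 9 and 10; comparing `cids_in(...)` of an existing CIDSet against such a set would show which component glyphs are missing.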

Acrobat’s Preflight has a filter to remove such incomplete CIDSets, so this isn’t a crucial deficiency.

Secondly, although clearly present, the /ToUnicode CMap resource is not being found.
The font seems to be named correctly here, according to:

page 279 of ISO 32000-1:2008

§ 9.7.6  Type 0 Font Dictionaries
§  General
A Type 0 font dictionary contains the entries listed in Table 121.

                            Table 121 – Entries in a Type 0 font dictionary

BaseFont  name    (Required) The name of the font.
  If the descendant is a Type 0 CIDFont, this name should be the concatenation of the CIDFont’s BaseFont name, a hyphen,
  and the CMap name given in the Encoding entry (or the CMapName entry in the CMap).
  If the descendant is a Type 2 CIDFont, this name should be the same as the CIDFont’s BaseFont name.

Since this is a Type 2 CIDFont, the second rule applies: the name should be the same as the CIDFont’s BaseFont name.
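
The naming rule quoted from Table 121 can be written as a small check. This is only a sketch in Python: the function name is invented for illustration, the subtype strings are the /Subtype values of the descendant CIDFont (`CIDFontType0`, `CIDFontType2`), and the font name in the usage note is made up.

```python
def expected_type0_basefont(cidfont_subtype, cidfont_basefont, cmap_name=None):
    """Return the BaseFont name a Type 0 font dictionary should carry.

    cidfont_subtype  -- /Subtype of the descendant CIDFont
    cidfont_basefont -- /BaseFont of the descendant CIDFont
    cmap_name        -- name from /Encoding (or CMapName), used for Type 0 CIDFonts
    """
    if cidfont_subtype == "CIDFontType2":
        # Type 2 CIDFont: same name as the CIDFont's BaseFont.
        return cidfont_basefont
    if cidfont_subtype == "CIDFontType0":
        if cmap_name is None:
            raise ValueError("a CMap name is needed for a Type 0 CIDFont")
        # Type 0 CIDFont: BaseFont name, a hyphen, then the CMap name.
        return f"{cidfont_basefont}-{cmap_name}"
    raise ValueError(f"unknown CIDFont subtype: {cidfont_subtype}")
```

For a hypothetical subset named `ABCDEF+SomeFont`, the call `expected_type0_basefont("CIDFontType2", "ABCDEF+SomeFont", "Identity-H")` returns the name unchanged, which is the case relevant here.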

And since it is a subset of the full font, the last sentence below is also applicable.

page 285 of ISO 32000-1:2008

§9.8.3 Font Descriptors for CIDFonts
§  General
In addition to the entries in Table 122, the FontDescriptor dictionaries of CIDFonts may contain the entries listed in Table 124.

           Table 124 – Additional font descriptor entries for CIDFonts

CIDSet   stream    (Optional) A stream identifying which CIDs are present in the CIDFont file.
 If this entry is present, the CIDFont shall contain only a subset of the glyphs in the character collection defined by the CIDSystemInfo dictionary.
 If it is absent, the only indication of a CIDFont subset shall be the subset tag in the FontName entry (see 9.6.4, "Font Subsets").
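
The subset tag mentioned in the last quoted sentence is, per 9.6.4, six uppercase letters followed by a plus sign, prefixed to the font's PostScript name. A minimal Python sketch for recognising it follows; the helper name and the sample font names are assumptions for illustration only.

```python
import re

# Six uppercase letters, a plus sign, then the underlying font name.
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name) for a subset font name, or (None, font_name)."""
    match = _SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name
```

For example, `split_subset_tag("ABCDEF+SomeFont")` gives `("ABCDEF", "SomeFont")`, while a name without a valid tag is returned untouched with `None` as the tag.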

So I cannot see why the /ToUnicode resource is not being found.

Would someone with more experience building fonts and subsetting please have a look at this issue?



Dr Ross Moore
Mathematics Dept | 12 Wally’s Walk, 734
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au



-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-08-06 at 7.42.13 am.png
Type: image/png
Size: 382426 bytes
Desc: Screen Shot 2018-08-06 at 7.42.13 am.png
URL: <https://tug.org/pipermail/xetex/attachments/20180805/e2761069/attachment-0002.png>