[Fontinst] A question about reglyphfont

Lars Hellström Lars.Hellstrom at residenset.net
Thu Oct 8 12:59:37 CEST 2009


Pierre MacKay wrote:
> I am following this discussion with great interest, but I wonder whether 
> the problems of using a font with the Adobe Expert Character set names 
> have been looked at.
> 
> Adobe seems (it is difficult to be sure of the causes) to have set up 
> Acrobat Reader 8 and 9 so that they trap names like Asmall  . . .  
> Zsmall, the old-style figures and the ff ligatures.  Unless I use the 
> on-line distiller at Acrobat.com, I get PDFs in which all characters 
> from the Expert character set are replaced by blank space.

What do you mean by "are replaced by blank space", exactly? Don't 
they show up when you view the document, are they missing when you 
print the document, or are they missing when you copy text from the 
document?

> Actually, 
> not all, because the accented glyphs in the range E0--FF come through.
> 
> It is, of course, possible to bypass the problem by using something 
> other than Reader 8 or 9.

/me typically uses Reader 5 (unless the document has compressed object 
streams), because the GUI quality seems (IMHO) to be a decreasing 
function of version number. ;-)

> Reader 6 and 7 did not have the problem, so 
> it is something introduced by Adobe in the later versions of Reader.  I 
> submitted a bug report over the problem when Reader 8 came out.  It was 
> acknowledged, and I was told that it would be corrected "in the next 
> major release."  It clearly has not been corrected.  One of the worst 
> aspects of this bug is that it destroys the archival value of all PDFs 
> distilled before the arrival of Reader 8.  (I don't know exactly when 
> the change was made in Acrobat Distiller, but I suspect that it was 
> contemporaneous with Reader 8).
> 
> A comparison of output from the online distiller at Adobe.com and output 
> from Ghostscript 8.63 shows that in the Adobe distiller, any font with 
> the names Asmall  . . .  Zsmall is treated to two consecutive 
> operations, the first of which is associated with "/ToUnicode."  I have 
> been unable to find out what /ToUnicode does.  Does it recode the entire 
> Adobe Expert Character set into a page in the Private use sector?

If the difference involves /ToUnicode, then only the Copy Text 
and Search operations should misbehave, right? (IMO, that wouldn't 
destroy the archival value of PDFs, but nor would bugs specific to one 
PDF reader.)

FYI, the /ToUnicode entry in a PDF font dictionary sets up a mapping 
from slots in the font to Unicode code points; the PDF 1.5 specification 
describes this in Section 5.9 "Extraction of Text Content". Providing 
such a map explicitly is really the only general way to assign an 
interpretation to the text in a PDF, but originally Acrobat Reader also 
had heuristics for guessing an interpretation from the glyph names. It 
is possible that the change in AR8 you observed was merely a retirement 
of some of these heuristics, so that "Asmall" is no longer on the list 
of known names, even though "a" might still be.
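To make the name-guessing idea concrete, here is a minimal Python sketch of such a heuristic. The lookup table is a tiny hypothetical excerpt (not the full Adobe Glyph List), and the two rules it applies (drop a ".suffix", decode "uniXXXX") are modelled on the Adobe Glyph List conventions, not on whatever Acrobat Reader actually does internally. A name like "Asmall" simply falls through and gets no interpretation:

```python
import re
from typing import Optional

# Hypothetical, tiny excerpt of a glyph-name table; a real viewer
# would use something like the full Adobe Glyph List.
KNOWN_NAMES = {"A": 0x0041, "a": 0x0061, "zero": 0x0030, "ff": 0xFB00}


def guess_unicode(glyph_name: str) -> Optional[int]:
    """Guess a Unicode code point from a glyph name, or return None."""
    # Rule 1: ignore a variant suffix such as ".sc" in "a.sc".
    base = glyph_name.split(".", 1)[0]
    # Rule 2: "uniXXXX" names carry the code point explicitly.
    m = re.fullmatch(r"uni([0-9A-F]{4})", base)
    if m:
        return int(m.group(1), 16)
    # Otherwise the name must be on the list of known names.
    return KNOWN_NAMES.get(base)


for name in ["a", "a.sc", "uni20AC", "Asmall", "zerooldstyle"]:
    cp = guess_unicode(name)
    print(name, "->", "unknown" if cp is None else f"U+{cp:04X}")
```

With this table, "a" and "a.sc" both resolve to U+0061, while the Expert-set names "Asmall" and "zerooldstyle" come out as unknown; retiring or never having such entries is exactly what would leave them without a text interpretation.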

Fontinst has had the ability to generate /ToUnicode CMaps since v1.928 
(or thereabouts), through the \etxtocmap command. Getting PDF generators 
to put the CMap in at the right place is, however, not so straightforward: 
pdfTeX only gives such access to font dictionaries from the TeX side 
(whereas the map file would be more useful), and it only works for fonts 
that have been defined with \font (hence not for base fonts of virtual fonts). 
OTOH, recent pdfTeXes seem to have some built-in heuristics of their 
own for generating ToUnicode data; I haven't studied those in detail. 
Nor do I know what gs or dvipdfmx can currently do in this respect.
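For reference, the CMap that ends up attached to the font is a short PostScript-like resource. The following Python sketch builds a minimal /ToUnicode CMap for an 8-bit font from a slot-to-code-point dictionary. It is not fontinst's \etxtocmap implementation; the function name make_tounicode_cmap and the example slots are hypothetical:

```python
from typing import Dict


def make_tounicode_cmap(slot_to_cp: Dict[int, int]) -> str:
    """Build a minimal /ToUnicode CMap for a single-byte font (sketch)."""

    def utf16_hex(cp: int) -> str:
        # Destination values are UTF-16BE; code points beyond the BMP
        # automatically become surrogate pairs.
        return chr(cp).encode("utf-16-be").hex().upper()

    items = sorted(slot_to_cp.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",  # one-byte codes, slots 0-255
        "endcodespacerange",
    ]
    # A single beginbfchar/endbfchar block may hold at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for slot, cp in chunk:
            lines.append(f"<{slot:02X}> <{utf16_hex(cp)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Hypothetical example: slot 0x61 holds a small-capital A (mapped to
# plain "a" for text extraction) and slot 0x30 an oldstyle zero.
print(make_tounicode_cmap({0x61: 0x0061, 0x30: 0x0030}))
```

Mapping small capitals and oldstyle figures to their ordinary letters and digits is one reasonable choice for Copy Text and Search; the point is that the interpretation is stated explicitly instead of being guessed from glyph names.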

There is also the possibility of putting /ActualText data directly into 
the page content stream by using pdf: \specials. I've recently 
considered adding support for this to fontinst (the specials would be 
embedded into the VF; I have figured out how to do it elegantly), but 
that's probably only appropriate for faked glyphs (e.g. Euro from C and 
two rules). See also the accsupp LaTeX package.
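At the PDF level, /ActualText is attached to a marked-content sequence in the page content stream. The Python sketch below only builds that operator text; how it gets into the stream (pdf: specials, \pdfliteral, or specials embedded in a VF) is driver-specific and not shown. The helper name actualtext_wrap and the drawing operators for the faked Euro are hypothetical:

```python
def actualtext_wrap(replacement: str, content_ops: str) -> str:
    """Wrap page-content operators in an /ActualText marked-content sequence."""
    # PDF text strings may be UTF-16BE with a leading FE FF byte order mark;
    # writing it as a hex string avoids escaping parentheses or backslashes.
    hex_text = (b"\xfe\xff" + replacement.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC\n{content_ops}\nEMC"


# Hypothetical faked Euro: a "C" from font /F1 plus two filled rules.
faked_euro = "BT /F1 10 Tf 0 0 Td (C) Tj ET\n1 4 5 0.4 re f\n1 5.5 5 0.4 re f"
print(actualtext_wrap("\u20ac", faked_euro))
```

A viewer honouring /ActualText should then report the whole C-plus-rules construction as the single character U+20AC when text is copied or searched.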

Lars Hellström


