[pdftex] pdftex core dump when including certain pdf files
Ross Moore
ross.moore at mq.edu.au
Fri Jul 22 04:56:15 CEST 2016
Hi Werner, Reinhard, Norbert
On Jul 20, 2016, at 4:05 PM, Werner LEMBERG <wl at gnu.org> wrote:
Did you try to merge different font subsets into one (subsetted fonts
already exist in the files)
Yes, by including PDF snippets into a larger texinfo document created
by pdftex (or xetex). I tried then to execute
ps2pdf pdftex-output.pdf out.pdf
which indeed reduces the file size and the number of subsetted fonts
by 30%, but there is a problem that makes all links disappear... We
have to further investigate.
I’ve been thinking along the same lines, first trying to find a way
to generate PDFs with pdfTeX that have no fonts included.
I wasn’t able to achieve that.
However, I just ran a successful test with pdfTeX and GS as follows.
1. The main document has some text in some font (I used CMR12).
It imports several copies of the same image, which is a PDF
using the same font:
\includegraphics[… options…]{images/datadoc.pdf}
Its working directory has a local file pdftex.map,
containing a line to include the whole font, unsubsetted:
cmr12 CMR12 <<cmr12.pfb
2. The image source datadoc.tex is processed into datadoc.pdf
within the subdirectory images/
(so it finds the standard pdftex.map).
3. After generating the image, run:
ps2pdf -dEmbedAllFonts=false datadoc.pdf datadoc-nofonts.pdf
Note the significant reduction in file size:
-rw-r--r-- 1 ross staff 10692 Jul 22 11:33 datadoc.pdf
-rw-r--r-- 1 ross staff 3191 Jul 22 11:40 datadoc-nofonts.pdf
4. Go back to the source document and change the names of the images:
\includegraphics[… options…]{images/datadoc-nofonts.pdf}
pdfTeX processes it successfully.
The Preview window now shows the images' text in a default system font
(or in a font with a small file size supplied by GS).
5. Edit maindoc.pdf as follows.
Locate the XObject where the image file is included, viz.
1 0 obj
<<
/Type /XObject
/Subtype /Form
/FormType 1
/PTEX.FileName (./images/datadoc-nofonts.pdf)
/PTEX.PageNumber 1
/PTEX.InfoDict 12 0 R
/BBox [0 0 595.28 841.89]
/Resources <<
/ProcSet [ /PDF /Text ]
/ExtGState <<
/R7 13 0 R
>>/Font << /R8 14 0 R>>
>>
/Length 146
/Filter /FlateDecode
>>
stream
...
Note the line: /Font << /R8 14 0 R>>
We are going to change that object reference number.
Next, find the object corresponding to the same font as used in the main document,
viz.
5 0 obj
<<
/Font << /F16 8 0 R /F15 9 0 R >>
/XObject << /Fm1 2 0 R /Fm2 3 0 R /Fm3 4 0 R >>
/ProcSet [ /PDF /Text ]
>>
endobj
and
8 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /CMR12
/FontDescriptor 24 0 R
/FirstChar 11
/LastChar 119
/Widths 22 0 R
>>
endobj
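Go back and make the change, just 2 characters:
/Font << /R8 8  0 R>>
that is, '14' becomes '8 ' (with a trailing space), preserving byte lengths
so that the byte offsets recorded in the xref table remain valid.
Save the edited PDF.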
6. Close the Preview window of maindoc.pdf,
then open it again.
Perfect!! The images now show their text in the correct font.
Since I used the same image 3 times, a single edit coped with
all the XObject instances from \includegraphics.
With different images, each would need its own edit.
If the images use different fonts, you’d need an edit for each font.
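With many images (as with the LilyPond snippets), step 3 could be run over a whole directory. Below is a minimal Python sketch, not part of the tested workflow. It makes three assumptions: the image PDFs live in images/, ps2pdf is on the PATH, and each output gains the "-nofonts" suffix used in the example (datadoc.pdf becomes datadoc-nofonts.pdf). The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def nofonts_path(src: Path) -> Path:
    """images/datadoc.pdf -> images/datadoc-nofonts.pdf"""
    return src.with_name(src.stem + "-nofonts" + src.suffix)


def ps2pdf_command(src: Path) -> list[str]:
    """The step-3 call for one image: rewrite it without embedded fonts."""
    return ["ps2pdf", "-dEmbedAllFonts=false", str(src), str(nofonts_path(src))]


def strip_fonts(image_dir: Path) -> None:
    # sorted() materialises the file list first, so new outputs are not revisited
    for src in sorted(image_dir.glob("*.pdf")):
        if src.stem.endswith("-nofonts"):
            continue  # output of an earlier run; skip it
        subprocess.run(ps2pdf_command(src), check=True)
        before, after = src.stat().st_size, nofonts_path(src).stat().st_size
        print(f"{src.name}: {before} -> {after} bytes")


if __name__ == "__main__" and len(sys.argv) > 1:
    strip_fonts(Path(sys.argv[1]))
```

Usage (hypothetical script name): python strip_fonts.py images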
or did you try to convince Ghostscript to insert missing fonts (PDF
files contain only references to external fonts e.g., /FontName, but
no physical font, subsetted or not)?
I'm not there yet, but this is the ultimate goal.
Can you build a workflow using the trick described above,
in particular one that automates the edits of the font object references?
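As a starting point, the bookkeeping part of step 5 could be scripted. Below is a minimal Python sketch; it is regex-based, not a real PDF parser. For each XObject that pdfTeX imported (those carrying /PTEX.FileName), it lists which /Font reference should be redirected to which font object of the main document. It rests on three assumptions:

- maindoc.pdf is uncompressed (\pdfobjcompresslevel 0).
- Fonts are matched by /BaseFont.
- The main document's font is the one whose /FontDescriptor has a /FontFile entry. That should hold when the map line uses << to embed the whole font.

The names dictionaries and plan are hypothetical.

```python
import re
import sys

# Simplified, regex-based reading of an *uncompressed* PDF; not a real parser.
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+0\s+obj(.*?)endobj", re.S)
FONT_TYPE_RE = re.compile(rb"/Type\s*/Font(?![A-Za-z])")  # excludes /FontDescriptor
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")
DESC_RE = re.compile(rb"/FontDescriptor\s+(\d+)\s+0\s+R")
FONTDICT_RE = re.compile(rb"/Font\s*<<(.*?)>>", re.S)
REF_RE = re.compile(rb"/([^\s/<>\[\]()]+)\s+(\d+)\s+0\s+R")


def dictionaries(pdf: bytes) -> dict[int, bytes]:
    """Object number -> text of the object up to any 'stream' keyword."""
    return {int(m.group(1)): m.group(2).split(b"stream", 1)[0]
            for m in OBJ_RE.finditer(pdf)}


def plan(pdf: bytes) -> list[tuple[int, str, int, int]]:
    """Return (xobject, resource name, old font obj, new font obj) tuples."""
    objs = dictionaries(pdf)
    basefont = {}  # font object number -> /BaseFont name
    for num, d in objs.items():
        m = BASEFONT_RE.search(d)
        if FONT_TYPE_RE.search(d) and m:
            basefont[num] = m.group(1).decode()
    embedded = {}  # /BaseFont name -> font object whose descriptor embeds a font file
    for num, name in basefont.items():
        desc = DESC_RE.search(objs[num])
        if desc and b"/FontFile" in objs.get(int(desc.group(1)), b""):
            embedded[name] = num
    edits = []
    for num, d in objs.items():
        if b"/PTEX.FileName" not in d:  # only XObjects imported by pdfTeX
            continue
        for fontdict in FONTDICT_RE.finditer(d):
            for name, ref in REF_RE.findall(fontdict.group(1)):
                old = int(ref)
                new = embedded.get(basefont.get(old, ""))
                if new is not None and new != old:
                    edits.append((num, name.decode(), old, new))
    return edits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for xobj, name, old, new in plan(f.read()):
            print(f"object {xobj}: /{name} {old} 0 R -> {new} 0 R")
```

For the objects shown in step 5 it should report /R8 14 0 R -> 8 0 R. That assumes object 14's descriptor has no /FontFile, as expected after ps2pdf -dEmbedAllFonts=false.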
Some further notes.
I kept the content streams uncompressed, to be able to
search in the PDF when needed:
\pdfcompresslevel 0
\pdfobjcompresslevel 0
I'm not sure this will always be necessary,
but compression can be applied later anyway.
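One reason the uncompressed setting may matter for automation: with \pdfobjcompresslevel above 0, pdfTeX stores ordinary objects, such as /Font dictionaries, inside compressed object streams (/Type /ObjStm). A plain-text search then cannot find them. A small Python check could guard any editing step; it is a sketch under that assumption, and the function name is hypothetical.

```python
import re
import sys

# /Type /ObjStm marks a compressed object stream (PDF 1.5+). Object streams
# cannot be nested inside other object streams, so this dictionary is always
# visible as plain text. The \b stops longer names that merely start with
# "ObjStm" from matching.
OBJSTM_RE = re.compile(rb"/Type\s*/ObjStm\b")


def uses_object_streams(pdf: bytes) -> bool:
    """True if some objects are hidden inside compressed object streams."""
    return OBJSTM_RE.search(pdf) is not None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        if uses_object_streams(f.read()):
            sys.exit(f"{sys.argv[1]}: uses object streams; rerun with \\pdfobjcompresslevel 0")
        print(f"{sys.argv[1]}: no object streams; objects are searchable as text")
```

Once the edits are done, compression could be applied afterwards with a tool such as qpdf (its --object-streams=generate option), assuming it is available.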
The main document had a 2nd font for the page numbering.
The image had a 2nd /ExtGState resource dictionary: /R7 13 0 R
13 0 obj
<<
/Type /ExtGState
/BM /Normal
/OPM 1
/TK true
>>
endobj
I think this is to ensure that the image completely overprints what is on
the page beneath it. In my test there was nothing there, but you might have
a background image or pattern, or some such.
I used a Type1 font here.
I'm not sure whether other kinds of fonts will behave differently.
Can pdfTeX directly use OTF fonts?
Doesn’t it have to break them up into subsets of fewer than 256 characters?
As for hyperlinks: I didn’t have any in my image PDF.
In your case, is the URL associated with the image as a whole,
or can you have multiple links in an image PDF?
Presumably the latter, as the former is easily handled
in the main document.
In the latter case Ghostscript must be able to find the fonts.
Yes, I really hope that!
This method doesn’t require Ghostscript to find fonts at all.
All font-handling is done by pdfTeX and the manual edits.
But maybe you can develop a way to automate those edits?
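The edit itself can be done mechanically. Here is a minimal Python sketch, assuming an uncompressed PDF and that the old and new object numbers are already known (14 and 8 in the example above). It rewrites the reference only inside the /Font dictionary of XObjects carrying /PTEX.FileName, so the page's own font resources are left alone. It pads the new number with spaces, so the file keeps its exact byte length and the xref offsets stay valid, and it refuses a new number with more digits than the old one. The function name retarget is hypothetical.

```python
import re
import sys

# Regex-based and simplified: assumes an uncompressed PDF, not a general parser.
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+0\s+obj(.*?)endobj", re.S)
FONTDICT_RE = re.compile(rb"/Font\s*<<(.*?)>>", re.S)


def retarget(pdf: bytes, old: int, new: int) -> bytes:
    """Point '<name> old 0 R' at 'new' in imported XObjects' /Font dicts."""
    old_s, new_s = str(old).encode(), str(new).encode()
    if len(new_s) > len(old_s):
        raise ValueError("new object number has more digits; byte lengths would change")
    padded = new_s.ljust(len(old_s))  # b"8" -> b"8 " keeps the byte count
    ref_re = re.compile(rb"(?<!\d)" + old_s + rb"(?=\s+0\s+R)")
    out = bytearray(pdf)
    for obj in OBJ_RE.finditer(pdf):
        head = obj.group(2).split(b"stream", 1)[0]  # dictionary part only
        if b"/PTEX.FileName" not in head:  # only XObjects imported by pdfTeX
            continue
        for fontdict in FONTDICT_RE.finditer(head):
            base = obj.start(2) + fontdict.start(1)  # offset of dict contents in pdf
            for ref in ref_re.finditer(fontdict.group(1)):
                pos = base + ref.start()
                out[pos:pos + len(old_s)] = padded  # same-length splice
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 5:
    src, dst, old, new = sys.argv[1:]
    with open(src, "rb") as f:
        data = retarget(f.read(), int(old), int(new))
    with open(dst, "wb") as f:
        f.write(data)
```

Usage (hypothetical script name): python retarget.py maindoc.pdf maindoc-fixed.pdf 14 8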
[...] If you say
too much information is already lost during the subsetting process
I assume that your PDF files already contain font subsets.
Yes, this is the current setup.
But what I have in mind is that you create PDF files which don't
contain any fonts at all but only the information which font
(/FontName) should be used.
Exactly this is my plan also.
The above technique does exactly that, I’d say.
What would be nice is a way to create the images without any fonts
directly with pdfTeX, so as not to require GS at all.
Isn't this exactly what you want to achieve? Create zillions of
files which don't contain any fonts at all, merge them, and finally
insert the fonts in order to make the document portable?
Yes.
I must admit that I don't know anything about LilyPond except that
musicians like it. Presumably it creates PostScript code and
converts it to PDF. Right?
Yes. We have to modify lilypond to not embed the font resources into
the PS file but to collect them in a directory.
Werner
Hope this helps,
Ross
Dr Ross Moore
Mathematics Dept | Level 2, S2.638 AHH
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955 | F: +61 2 9850 8114
M: +61 407 288 255 | E: ross.moore at mq.edu.au
http://www.maths.mq.edu.au