[tex-live] Making texts externally replaceable in PDFs, e.g. with sed(1)
Osipov, Michael
michael.osipov at siemens.com
Fri Dec 14 18:03:01 CET 2018
Hi Phil,
On 2018-12-14 at 17:43, Philip Taylor wrote:
>
>
> Osipov, Michael wrote:
>> Hi folks,
>>
>> we are using XeTeX 3.14159265-2.6-0.99999 (TeX Live 2018) on Windows
>> and FreeBSD.
>>
>> After studying the PDF specification [1] and how XeLaTeX and xdvipdfmx
>> work with Unicode (from PDF samples), I believe that my request is
>> (virtually) impossible.
>> I'd be happy if someone could either confirm this or prove me wrong.
>>
>> Task: We are producing PDFs on our server (from LaTeX source) for the
>> client which takes the PDF and uploads it to another service which may
>> replace placeholders, e.g., %DOCID% with the actual document ID in the
>> target system. [Remainder snipped]
>
> Well. I used the following source :
>
>> Now is the time \%DOCID\% for all good men
>>
>> to come to the aid of the party.
>>
>> \end
>>
>
> which generated the attached PDF (Test.pdf). I then opened "Test.pdf"
> in Adobe Acrobat Pro DC, selected "Tools / Edit PDF", and replaced
> "%DOCID%" by "The quick brown fox jumps right over the lazy dog's
> back". The text re-flowed as one would hope. If Adobe Acrobat Pro DC
> can do it, then it clearly can be done; all that is needed is to write
> code to emulate Adobe Acrobat Pro DC's behaviour w.r.t. editing text.
Thanks for your quick reply, but neither of these will work; both suffer
from a conceptual misunderstanding.
Look closely at Test.pdf: it is compressed, so it cannot be processed
with sed(1). Even if you decompress it, it contains a Type 1 font which
has no /Encoding or /Ordering. /FontFile3 references 10 0 obj, which
contains the entire font. This does not resemble my Unicode case at all.
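A minimal Python sketch of what "decompress it" involves: inflate every Flate-encoded stream and only then search for the placeholder. It assumes plain /FlateDecode without predictors and ignores /Length, /DecodeParms and object streams; the tiny sample file is made up for illustration.

```python
import re
import zlib

# Data between 'stream' and 'endstream'; a trailing EOL before 'endstream'
# is harmless because decompressobj() keeps bytes after the zlib end in unused_data.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)

def inflate_streams(pdf: bytes) -> list:
    """Return the decoded data of every stream that zlib can inflate (sketch only)."""
    out = []
    for m in STREAM.finditer(pdf):
        try:
            out.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate-encoded (or another filter): skip it
    return out

# Hypothetical test file with one compressed content stream holding the placeholder.
content = b"BT /F1 9.9626 Tf [(time)-333(%DOCID%)-333(for)]TJ ET"
packed = zlib.compress(content)
pdf = (b"5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
       + packed + b"\nendstream\nendobj\n")

found = [s for s in inflate_streams(pdf) if b"%DOCID%" in s]
print(len(found))  # the placeholder is visible only after inflating
```

A byte-level tool like sed(1) sees only `packed`, which is why the search has to happen on the inflated data.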
The content is in 5 0 obj:
> q 1 0 0 1 72 769.89 cm BT /F1 9.9626 Tf 19.925 -9.963 Td[(No)27(w)-332(is)-333(the)-334(time)-333(%DOCID%)-333(for)-334(all)-333(go)-28(o)-28(d)-333(men)-333(to)-334(come)-333(to)-333(the)-334(aid)-333(of)-333(the)-334(part)27(y)83(.)]TJ 211.584 -654.747 Td[(1)]TJ ET Q
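To show what the stream above actually encodes, here is a rough Python sketch that reassembles the text of each TJ array. The numbers are kerning adjustments in thousandths of a text-space unit; treating anything below -200 as a word space is a guess, and nested parentheses and octal escapes are not handled. Note that %DOCID% is a single literal string here only because no kerning was applied inside it ("good" is split into (go)(o)(d)), and only because this font's one-byte codes coincide with ASCII for these characters; with XeTeX/xdvipdfmx and OpenType fonts the strings typically hold two-byte glyph IDs instead.

```python
import re

# A literal string without nested parentheses, or a number, inside a TJ array.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|-?(?:\d+\.?\d*|\.\d+)")
TJ_ARRAY = re.compile(r"\[(.*?)\]\s*TJ", re.S)

def tj_text(content: str, space_threshold: float = -200.0) -> list:
    """Reassemble the text shown by each TJ array (heuristic word spacing)."""
    lines = []
    for array in TJ_ARRAY.findall(content):
        parts = []
        for tok in TOKEN.findall(array):
            if tok.startswith("("):
                # Drop the delimiters and undo escaped \( \) and \\ only.
                parts.append(re.sub(r"\\([()\\])", r"\1", tok[1:-1]))
            elif float(tok) < space_threshold:
                parts.append(" ")  # large negative kern: assume a word space
        lines.append("".join(parts))
    return lines

# The content stream of 5 0 obj, quoted above.
sample = (
    "q 1 0 0 1 72 769.89 cm BT /F1 9.9626 Tf 19.925 -9.963 Td"
    "[(No)27(w)-332(is)-333(the)-334(time)-333(%DOCID%)-333(for)-334(all)-333"
    "(go)-28(o)-28(d)-333(men)-333(to)-334(come)-333(to)-333(the)-334(aid)-333"
    "(of)-333(the)-334(part)27(y)83(.)]TJ 211.584 -654.747 Td[(1)]TJ ET Q"
)
print(tj_text(sample))
```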
As for Adobe Acrobat Pro DC: it is a fully-fledged PDF suite which knows
the format best and operates on an abstract in-memory representation of
the PDF, while sed(1) operates on raw bytes.
The operations this suite performs can only be achieved with a library
like iText or PDFBox [1]. This isn't a route I really want to take,
especially because the post-processing on the target side is entirely
out of my hands.
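A small Python illustration of why byte-level editing is fragile even on an uncompressed file: the xref table records byte offsets of objects and each stream declares its /Length, so a replacement of a different length leaves both stale. The file below is hypothetical (it has no real xref table; the offsets are computed the way a writer would record them), and DOC-2018-000123 is an invented document ID.

```python
import re

def object_offsets(pdf: bytes) -> dict:
    """Map object number -> byte offset of its 'N 0 obj' header, as an xref table would."""
    return {int(m.group(1)): m.start() for m in re.finditer(rb"(\d+) 0 obj", pdf)}

# Hypothetical uncompressed file: content stream 5 0 obj followed by 6 0 obj.
content = b"BT /F1 9.9626 Tf [(time)-333(%DOCID%)-333(for)]TJ ET"
pdf = (b"5 0 obj\n<< /Length %d >>\nstream\n" % len(content) + content
       + b"\nendstream\nendobj\n6 0 obj\n<< /Type /Font >>\nendobj\n")
xref = object_offsets(pdf)  # offsets recorded when the file was written

# Blind byte substitution, which is all sed(1) can do.
edited = pdf.replace(b"%DOCID%", b"DOC-2018-000123")

stale = xref[6]
print(edited[stale:stale + 7])                 # no longer b"6 0 obj"
print(b"/Length %d" % len(content) in edited)  # the old /Length is still declared
```

Only a same-length replacement keeps the offsets and /Length valid, and even that requires the placeholder to appear literally in an uncompressed stream.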
Regards,
Michael
[1] https://stackoverflow.com/q/52027733/696632