[tex-live] Making texts externally replaceable in PDFs, e.g. with sed(1)

Osipov, Michael michael.osipov at siemens.com
Fri Dec 14 17:30:36 CET 2018


On 2018-12-14 at 16:50, [ext] Osipov, Michael wrote:
> Hi folks,
> 
> we are using XeTeX 3.14159265-2.6-0.99999 (TeX Live 2018) on Windows and 
> FreeBSD.
> 
> After studying the PDF specification [1] and how XeLaTeX and xdvipdfmx 
> work with Unicode (from PDF samples), I believe that my request is 
> (virtually) impossible.
> I'd be happy if someone could either confirm this or prove me wrong.
> 
> Task: We are producing PDFs on our server (from LaTeX source) for the 
> client which takes the PDF and uploads it to another service which may 
> replace placeholders, e.g., %DOCID% with the actual document ID in the 
> target system. So the PDF has to be uncompressed (xdvipdfmx -z 0) and 
> has to contain the literal strings "(%DOCID%)Tj" or "[(%DOCID%)]TJ", 
> as defined by the PDF spec.
> 
> XeLaTeX produces the following:
>> BT /F1 5.9776 Tf -40.819 -756.627 
>> Td[<00270052004e00580050004800510057005100580050005000480055>]TJ /F1 
>> 9.9626 Tf 0 -11.955 Td[<0008002700320026002c00270008>]TJ ET
> 
>> begincmap
>> /CMapName /C:-WINDOWS-fonts-siemens_global_roman.ttf,000-UTF16 def
>> /CMapType 2 def
>> /CIDSystemInfo <<
>>   /Registry (Adobe)
>>   /Ordering (UCS)
>>   /Supplement 0
>>>> def
>> 1 begincodespacerange
>> <0000> <FFFF>
>> endcodespacerange
>> 13 beginbfchar
>> <0008> <0025>
>> <0017> <0034>
>> <001B> <0038>
>> <002A> <0047>
>> <002C> <0049>
>> <002E> <004B>
>> <0032> <004F>
>> <0033> <0050>
>> <0039> <0056>
>> <005C> <0079>
>> <005D> <007A>
>> <008B> <00A9>
>> <00B3> <2014>
>> endbfchar
>> 5 beginbfrange
>> <0010> <0015> <002D>
>> <0024> <0028> <0041>
>> <0035> <0037> <0052>
>> <0044> <0053> <0061>
>> <0055> <0059> <0072>
>> endbfrange
>> endcmap
> 
> So it writes hexadecimal character codes (CIDs) for our TrueType font 
> Siemens Global, and the /ToUnicode CMap maps them back to Unicode code 
> points.
> 
> So for a sed(1)-based postprocessor it is virtually impossible to map 
> "<0008002700320026002c00270008>" to "%DOCID%" without analyzing the PDF 
> objects.
> 
> Asking XeLaTeX to produce
>> BT /F1 5.9776 Tf -40.819 -756.627 
>> Td[<00270052004e00580050004800510057005100580050005000480055>]TJ /F1 
>> 9.9626 Tf 0 -11.955 Td[(%DOCID%)]TJ ET
> 
> will not work because the /ToUnicode CMap has no mapping from the 
> literal "%" (etc.) to the corresponding Unicode code point. On top of 
> that, every character of the real document ID that replaces the 
> placeholder would need to be in the bfchar/bfrange listing.
> 
> Producing a suitable, equivalent PDF with the PDF XChange printer 
> driver embedded Siemens Global twice: once with Identity-H encoding 
> (subset) and once with WinAnsiEncoding (complete). Without the char 
> code to glyph mapping, the replacement seems to be possible. So the 
> approach has to be an 8-bit font encoding:
>> /Type /Font
>> /Subtype /TrueType
>> /BaseFont /SiemensSansGlobal-Regular
>> /FirstChar 32
>> /LastChar 220
>> /Encoding /WinAnsiEncoding
> 
> This is impossible with XeLaTeX because of its Unicode nature: it will 
> always use CID fonts with Identity-H and UCS ordering.
> 
> This will get even more complicated if glyph spacing is involved.
> 
> I'd be happy if someone could drop a comment or two on the issue.
> 
> Regards,
> 
> Michael
> 
> PS: I haven't yet looked into whether the pdfx package could solve the 
> issue with XeLaTeX. Plus, my knowledge of the PDF spec and LaTeX is 
> very limited.
> 
> [1] 
> https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
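
To illustrate the analysis quoted above, here is a minimal Python sketch 
that reads the bfchar/bfrange entries of the quoted /ToUnicode CMap and 
decodes the two hex strings from the content stream. The function names 
(parse_tounicode, decode_hex_string) are made up for this example. It 
assumes 2-byte codes and single-code-point destinations, which holds for 
the sample above but not for every CMap.

```python
import re

# bfchar/bfrange sections copied from the CMap quoted above.
CMAP = """\
13 beginbfchar
<0008> <0025>
<0017> <0034>
<001B> <0038>
<002A> <0047>
<002C> <0049>
<002E> <004B>
<0032> <004F>
<0033> <0050>
<0039> <0056>
<005C> <0079>
<005D> <007A>
<008B> <00A9>
<00B3> <2014>
endbfchar
5 beginbfrange
<0010> <0015> <002D>
<0024> <0028> <0041>
<0035> <0037> <0052>
<0044> <0053> <0061>
<0055> <0059> <0072>
endbfrange
"""

HEX = r"<([0-9A-Fa-f]{4})>"


def parse_tounicode(cmap_text):
    """Return a dict mapping 2-byte character codes to Unicode characters."""
    table = {}
    # bfchar: one source code maps to one destination code point.
    for body in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, body):
            table[int(src, 16)] = chr(int(dst, 16))
    # bfrange: codes lo..hi map to consecutive code points starting at dst.
    for body in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, body):
            for offset in range(int(hi, 16) - int(lo, 16) + 1):
                table[int(lo, 16) + offset] = chr(int(dst, 16) + offset)
    return table


def decode_hex_string(hex_string, table):
    """Decode the inside of a <...> string, four hex digits per code."""
    codes = [int(hex_string[i:i + 4], 16) for i in range(0, len(hex_string), 4)]
    # Codes without a ToUnicode entry become U+FFFD.
    return "".join(table.get(code, "\ufffd") for code in codes)


table = parse_tounicode(CMAP)
print(decode_hex_string("0008002700320026002c00270008", table))
print(decode_hex_string(
    "00270052004e00580050004800510057005100580050005000480055", table))
```

With the table above, the two strings decode to "%DOCID%" and 
"Dokumentnummer". The point is that the mapping lives in a separate PDF 
object and differs per font subset, so a fixed sed(1) pattern cannot 
know it in advance.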

I have briefly tried pdfx with the a-1b option and with hyperxmp 
disabled. The PDF is generated properly, but the font configuration 
remains the same. So PDF/A doesn't give me any benefit.
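
For reference, the reverse direction can be sketched in Python as well: 
invert the quoted /ToUnicode entries, encode a replacement value as hex 
codes, and refuse values that use characters missing from the listing. 
The helper names and the sample ID "2018-42" are invented for this 
example, and it assumes that the hex string appears unsplit in the 
stream and that xdvipdfmx writes lowercase hex digits, as in the sample.

```python
# Character code -> Unicode, transcribed from the quoted bfchar/bfrange entries.
BFCHAR = {0x08: 0x25, 0x17: 0x34, 0x1B: 0x38, 0x2A: 0x47, 0x2C: 0x49,
          0x2E: 0x4B, 0x32: 0x4F, 0x33: 0x50, 0x39: 0x56, 0x5C: 0x79,
          0x5D: 0x7A, 0x8B: 0xA9, 0xB3: 0x2014}
BFRANGE = [(0x10, 0x15, 0x2D), (0x24, 0x28, 0x41), (0x35, 0x37, 0x52),
           (0x44, 0x53, 0x61), (0x55, 0x59, 0x72)]


def build_encoder():
    """Invert the ToUnicode mapping: Unicode character -> character code."""
    to_code = {chr(uni): code for code, uni in BFCHAR.items()}
    for lo, hi, uni in BFRANGE:
        for offset in range(hi - lo + 1):
            to_code[chr(uni + offset)] = lo + offset
    return to_code


def encode(text, to_code):
    """Return text as lowercase 4-digit hex codes, or raise if unmappable."""
    missing = sorted({ch for ch in text if ch not in to_code})
    if missing:
        raise ValueError("not in the ToUnicode listing: " + "".join(missing))
    return "".join("%04x" % to_code[ch] for ch in text)


def replace_placeholder(stream, placeholder, value, to_code):
    """Swap the encoded placeholder for the encoded value in a byte stream."""
    old = ("<" + encode(placeholder, to_code) + ">").encode("ascii")
    new = ("<" + encode(value, to_code) + ">").encode("ascii")
    return stream.replace(old, new)


to_code = build_encoder()
stream = b"9.9626 Tf 0 -11.955 Td[<0008002700320026002c00270008>]TJ ET"
print(replace_placeholder(stream, "%DOCID%", "2018-42", to_code))
```

An ID containing "3" would raise ValueError here, because the listing 
only covers the digits 0, 1, 2, 4 and 8. Note also that a replacement of 
different length changes the stream size, so /Length and the xref 
offsets would no longer match, and a TJ array split for glyph spacing 
would defeat the byte-level match.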

Michael

