[XeTeX] Anchor names
Ross Moore
ross.moore at mq.edu.au
Thu Nov 3 21:31:02 CET 2011
Hi Heiko,
On 04/11/2011, at 1:58 AM, Heiko Oberdiek wrote:
> Hello,
>
> to get more to the point, I start a new thread.
Yes, very good idea.
> As we have learned, the PDF specification uses byte strings
> for anchor names. And there is a wish to use "normal" characters
> in anchor names.
Within the (La)TeX source, yes!
Of course it needs to be encoded to be safe within the PDF.
> Let's make an example:
>
> xetex --ini --output-driver='xdvipdfmx -V4' test
>
> \special{pdf:dest (änchør) [@thispage /XYZ @xpos @ypos null]}%
> \special{%
> pdf:ann width 4bp height 2bp depth 2bp<<%
> /Type/Annot%
> /Subtype/Link%
> /Border[0 0 1]%
> /C[0 0 1]% blue border
> /A<<%
> /S/GoTo%
> /D(änchør)%
> >>%
> >>%
> }%
>
> The link is not working. Looking into the PDF file we can find
> the link annotation:
>
> 4 0 obj
> <<
> /Type/Annot
> /Subtype/Link
> /Border[0 0 1]
> /C[0 0 1]
> /A<<
> /S/GoTo
> /D<feff00e4006e0063006800f80072>
In my reading of the PDF Spec. I came to the conclusion
that this UTF-16BE based format is not supported for Name objects.
But maybe I'm wrong here.
> >>
> /Rect[68 48 72 52]
> >>
> endobj
>
> and the destination:
>
> 7 0 obj
> [3 0 R/XYZ 30 150 null]
> endobj
> 8 0 obj
> <<
> /Names[<c3a46e6368c3b872>7 0 R]
> >>
> endobj
> The positions of both the link annotation and the destination are perfect.
> The name for "änchør" is given both times as a hexadecimal string.
> That's ok, too. But the names are different:
>
> Destination: <c3a46e6368c3b872> ==> UTF-8
> Link annot.: <feff00e4006e0063006800f80072> ==> UTF-16BE with BOM
The spec says that differences in literal strings are allowed,
provided that they convert to the same thing in Unicode.
So there must be an internal representation that Adobe uses,
but is not visible to us, as builders of PDF documents.
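To make the comparison concrete, here is a minimal sketch (Python, standard library only) that decodes the two hex strings quoted above and checks whether they agree as Unicode text and as raw bytes. The variable names are made up for this illustration, and the script says nothing about how any PDF viewer actually compares names.

```python
# Hex strings copied from the PDF objects quoted above.
dest_hex = "c3a46e6368c3b872"              # key in the /Names array
link_hex = "feff00e4006e0063006800f80072"  # /D entry of the link annotation

dest_bytes = bytes.fromhex(dest_hex)
link_bytes = bytes.fromhex(link_hex)

# The destination key is plain UTF-8.
dest_text = dest_bytes.decode("utf-8")

# The link name starts with the byte order mark FE FF, followed by UTF-16BE.
has_bom = link_bytes[:2] == b"\xfe\xff"
link_text = link_bytes[2:].decode("utf-16-be")

same_text = dest_text == link_text    # equal once decoded to Unicode
same_bytes = dest_bytes == link_bytes  # but different as byte strings

print(dest_text, link_text, has_bom, same_text, same_bytes)
```

Both strings decode to the same text, yet they differ byte for byte, which would explain the broken link if the lookup is done on bytes.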
>
> Conclusion:
> * The encoding mess with 8-bit characters remains even with XeTeX.
Well, surely it is manifest only in the driver part: xdvipdfmx.
> Then I tried to be clever and attempted a workaround by using
> /D<c3a46e6368c3b872> for the link name in the source.
> But it got converted and the PDF file still contains:
> /D<feff00e4006e0063006800f80072>
>
> Only the other way worked:
>
> \special{pdf:dest <feff00e4006e0063006800f80072> ...}
> \special{pdf:ann ... /D(änchør) ...}
OK.
Glad you did this test.
It shows two things:
1. that such text strings may well be valid for Names,
and that the PDF spec. is unclear about this;
2. these UTF-16BE strings are *not* equivalent to other
ways of encoding Name objects, after all.
This is something that should be reported as a bug to Adobe.
Can you produce a set of 3 or more PDFs that show the different
behaviours?
Better still: a single PDF that illustrates the (non-)working
of hyperlinks according to the encodings of the Name objects
and Destinations.
Do it both with XeTeX and pdfTeX (with appropriate inputenc,
to handle the UTF8 input), to test whether there are any
differences.
I've not tested pdfTeX yet, because of the extra macro layer
required. Does hyperref handle the required conversions then?
>
> Result:
> * Even for nice short names the size is doubled and increased
> by two bytes.
> * Asymmetrical behaviour of \special commands.
> * No documentation.
> * Unfair: arbitrary byte strings can't be written.
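The size point in the list above can be checked with a small sketch (Python, standard library only). The helper name `utf16be_bom_len` and the sample name "anchor" are assumptions made for this illustration; only "änchør" comes from the test file.

```python
def utf16be_bom_len(name: str) -> int:
    """Byte length of a name stored as UTF-16BE with a leading BOM (FE FF)."""
    return len(b"\xfe\xff" + name.encode("utf-16-be"))

# A pure ASCII name: n bytes as written, 2*n + 2 bytes after conversion.
ascii_name = "anchor"
ascii_len = len(ascii_name.encode("ascii"))
ascii_converted = utf16be_bom_len(ascii_name)

# The name from the test file: UTF-8 length versus converted length.
test_name = "\u00e4nch\u00f8r"  # änchør
utf8_len = len(test_name.encode("utf-8"))
test_converted = utf16be_bom_len(test_name)

print(ascii_len, ascii_converted, utf8_len, test_converted)
```

For names inside the Basic Multilingual Plane, each character costs two bytes plus the two-byte BOM, so a short ASCII name is the worst case relative to its original size.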
>
> Yours sincerely
> Heiko Oberdiek
Thanks for looking at this in detail.
Cheers,
Ross
------------------------------------------------------------------------
Ross Moore ross.moore at mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------