[XeTeX] Anchor names

Ross Moore ross.moore at mq.edu.au
Thu Nov 3 21:31:02 CET 2011


Hi Heiko,

On 04/11/2011, at 1:58 AM, Heiko Oberdiek wrote:

> Hello,
> 
> to get more to the point, I start a new thread.

Yes, a very good idea.

> As we have learned, the PDF specification uses byte strings
> for anchor names. And there is a wish to use "normal" characters
> in anchor names.

Within the (La)TeX source, yes!
Of course it needs to be encoded to be safe within the PDF.

> Let's make an example:
> 
> xetex --ini --output-driver='xdvipdfmx -V4' test
> 
>      \special{pdf:dest (änchør) [@thispage /XYZ @xpos @ypos null]}%

>       \special{%
>         pdf:ann width 4bp height 2bp depth 2bp<<%
>           /Type/Annot%
>           /Subtype/Link%
>           /Border[0 0 1]%
>           /C[0 0 1]% blue border
>           /A<<%
>             /S/GoTo%
>             /D(änchør)%

> The link is not working. Looking into the PDF file we can find
> the link annotation:
> 
>  4 0 obj
>  <<
>  /Type/Annot
>  /Subtype/Link
>  /Border[0 0 1]
>  /C[0 0 1]
>  /A<<
>  /S/GoTo
>  /D<feff00e4006e0063006800f80072>

In my reading of the PDF Spec. I came to the conclusion
that this UTF-16BE based format is not supported for Name objects.

But maybe I'm wrong here.

>  >>
>  /Rect[68 48 72 52]
>  >>
>  endobj
> 
> and the destination:
> 
> 7 0 obj
> [3 0 R/XYZ 30 150 null]
> endobj
> 8 0 obj
> <<
> /Names[<c3a46e6368c3b872>7 0 R]
> >>
> endobj


> The positions of both the link annotation and the destination are perfect.
> The name for "änchør" is given both times as hexadecimal string.
> That's ok, too. But the names are different:
> 
> Destination: <c3a46e6368c3b872> ==> UTF-8
> Link annot.: <feff00e4006e0063006800f80072> ==> UTF-16BE with BOM

The spec says that differences between literal strings are allowed,
provided that they convert to the same thing in Unicode.
So there must be an internal representation that Adobe uses
but that is not visible to us as builders of PDF documents.
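
As a quick check of the two hexadecimal strings quoted above, here is a
minimal Python 3 sketch (Python is only used for illustration; the variable
names are made up). It decodes each string with the encoding it appears to
use and confirms that both yield the same Unicode text, although the byte
strings themselves differ:

```python
# Hex strings copied from the PDF objects quoted above.
dest_hex = "c3a46e6368c3b872"              # name in the /Names array
link_hex = "feff00e4006e0063006800f80072"  # /D entry of the link annotation

dest_bytes = bytes.fromhex(dest_hex)
link_bytes = bytes.fromhex(link_hex)

# Decode each with the encoding it appears to use.
dest_text = dest_bytes.decode("utf-8")
# The leading FE FF byte-order mark selects big-endian and is dropped.
link_text = link_bytes.decode("utf-16")

print(dest_text == link_text)    # True: same Unicode text
print(dest_bytes == link_bytes)  # False: different byte strings
```

So a viewer that compares names byte by byte would see two different
names, which matches the broken link reported above.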

> 
> Conclusion:
> * The encoding mess with 8-bit characters remains even with XeTeX.

Well, surely it is manifest only in the driver part:  xdvipdfmx

> Then I tried to be clever and attempted a workaround by using
> /D<c3a46e6368c3b872> for the link name in the source.
> But it got converted and the PDF file still contains:
> /D<feff00e4006e0063006800f80072>
> 
> Only the other way worked:
> 
>  \special{pdf:dest <feff00e4006e0063006800f80072> ...}
>  \special{pdf:ann ... /D(änchør) ...}

OK. 
Glad you did this test.
It shows two things:

  1.  that such text strings may well be valid for Names,
      and that the PDF spec. is unclear about this;

  2.  these UTF-16BE strings are *not* equivalent to other
      ways of encoding Name objects, after all.

This is something that should be reported as a bug to Adobe.

Can you produce a set of 3 or more PDFs that show the different
behaviours?

Better still: a single PDF that illustrates the (non-)working
of hyperlinks according to the encodings of the Name objects
and Destinations.

Do it both with XeTeX and pdfTeX (with appropriate inputenc, 
to handle the UTF8 input), to test whether there are any 
differences.  
I've not tested pdfTeX yet, because of the extra macro layer
required. Does  hyperref  handle the required conversions then? 
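
To help prepare such a test file, the name variants could be generated
mechanically. The following is only a sketch in Python 3: the helper names
pdf_hex_string and dest_special, and the label "utf-16be-bom", are
hypothetical; the \special syntax is copied from the example quoted earlier.
It prints one pdf:dest special per encoding of the same name:

```python
def pdf_hex_string(name: str, encoding: str) -> str:
    """Return `name` as a PDF hexadecimal string such as <c3a4...>.

    "utf-16be-bom" is a made-up label meaning UTF-16BE with a leading
    FE FF byte-order mark; any other value is passed to str.encode().
    """
    if encoding == "utf-16be-bom":
        data = b"\xfe\xff" + name.encode("utf-16-be")
    else:
        data = name.encode(encoding)
    return "<" + data.hex() + ">"


def dest_special(name_token: str) -> str:
    """Build a pdf:dest special in the form used in the quoted example."""
    return (r"\special{pdf:dest " + name_token
            + " [@thispage /XYZ @xpos @ypos null]}%")


name = "änchør"
for encoding in ("utf-8", "utf-16be-bom"):
    # One destination per encoding, for comparison in a single test document.
    print(dest_special(pdf_hex_string(name, encoding)))
```

The second line printed uses the UTF-16BE form, which is the only variant
reported above to give a working link. Whether the driver passes the other
hexadecimal string through unchanged is exactly what the test PDF would
need to show.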

> 
> Result:
> * Even for nice short names the size is doubled and increased
>  by two bytes.
> * Asymmetrical behaviour of \special commands.
> * No documentation.
> * Unfair, arbitrary byte strings can't be written.
> 
> Yours sincerely
>  Heiko Oberdiek
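
The first point in the list above (size doubled, plus two bytes) is easy to
verify for ASCII names. A small Python 3 sketch, where utf16_name_size is a
hypothetical helper and the sample names are arbitrary:

```python
def utf16_name_size(name: str) -> int:
    """Byte length of `name` as UTF-16BE with a leading FE FF BOM."""
    return len(b"\xfe\xff" + name.encode("utf-16-be"))


for name in ("anchor", "sec1", "änchør"):
    plain = len(name.encode("utf-8"))  # size as a UTF-8 byte string
    wide = utf16_name_size(name)       # size after the conversion
    print(name, plain, wide)
```

For a pure ASCII name of n bytes the converted name occupies 2n + 2 bytes;
for "änchør" the UTF-8 form already needs 8 bytes, so the growth is smaller.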


Thanks for looking at this in detail.


Cheers,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-419      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------





