[XeTeX] Anchor names

Sat Nov 5 02:39:38 CET 2011

On Sat, Nov 05, 2011 at 11:59:29AM +1100, Ross Moore wrote:

> >>> Conclusion:
> >>> * The encoding mess with 8-bit characters remains even with XeTeX.
> >> 
> >> Well, surely it is manifest only in the driver part:  xdvipdfmx
> > 
> > No, the problem lies in both parts. XeTeX can only write UTF-8,
> > which is death for binary data.
> 
> But the bytes need to be encoded anyway, as hexadecimal.
> So why cannot this be done before writing out the resulting string?

See my example file, it gets re-encoded.

> >>> Then I tried to be clever and tried a workaround by using
> >>> /D<c3a46e6368c3b872> for the link name in the source.
> >>> But it got converted and the PDF file still contains:
> >>> /D<feff00e4006e0063006800f80072>
> >>> 
> >>> Only the other way worked:
> >>> 
> >>> \special{pdf:dest <feff00e4006e0063006800f80072> ...}
> >>> \special{pdf:ann ... /D(änchør) ...}
> 
>  ... as this seems to be doing.
> I'd vote for *always* doing  pdf:dest  this way. 
> Then for consistency, do  pdf:ann  as if UTF-16BE  also.

It might be an accident that this way has worked. If the
bug is fixed, then it might only work the other way, or
not at all, or ...
  Also, instead of "http://.../test.pdf#Introduction" you
would have to write something like
  "http://.../test.pdf#%FE%FF%49%6E%74%72%6F%64%75%63%74%69%6F%6E"
Somehow I fail to see that as an improvement.
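
For illustration, the byte sequences discussed here can be reproduced with a short Python sketch. The function names are made up for this example and nothing in it reflects how XeTeX or xdvipdfmx work internally; it only shows the two encodings of the same anchor name and what a percent-encoded UTF-16BE fragment would look like. Note that a strictly encoded UTF-16BE fragment carries a zero byte before each ASCII letter, so it would be even longer than the shortened form written above.

```python
def utf8_hex(name: str) -> str:
    """Hex digits of the UTF-8 bytes of an anchor name."""
    return name.encode("utf-8").hex()


def utf16be_bom_hex(name: str) -> str:
    """Hex digits of the UTF-16BE bytes, preceded by the BOM FE FF."""
    return ("\ufeff" + name).encode("utf-16-be").hex()


def utf16be_bom_fragment(name: str) -> str:
    """Percent-encode every byte of the BOM + UTF-16BE form."""
    raw = ("\ufeff" + name).encode("utf-16-be")
    return "".join(f"%{byte:02X}" for byte in raw)


if __name__ == "__main__":
    anchor = "\u00e4nch\u00f8r"  # the test name "änchør"
    print(utf8_hex(anchor))         # c3a46e6368c3b872
    print(utf16be_bom_hex(anchor))  # feff00e4006e0063006800f80072
    print(utf16be_bom_fragment("Introduction"))
```

The first two outputs match the hex strings seen in the source and in the resulting PDF file, respectively.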

> >> OK. 
> >> Glad you did this test.
> >> It shows two things:
> >> 
> >>  1.  that such text strings may well be valid for Names,
> >>      and that the PDF spec. is unclear about this;
> > 
> > I can't follow. Both string representations are covered
> > by the PDF specification, a literal string can be
> > specified in parentheses with an escaping mechanism (backslash)
> > or given as hex string in angle brackets. Unclear is the
> > syntax of the argument for \special{pdf:dest ...}.
> 
> Agreed.
> Can we standardise on the way that *looks like*  UTF-16BE with BOM.

That's a higher level, and it's exactly the kind of artificial
restriction that started the thread.
  The lower syntax level is already unclear. The best solution would be
a syntax that allows byte strings and that is specified, implemented
and supported. That means someone has to
dig into the sources and do some work, write some documentation ...
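
To make "byte strings" concrete: the PDF specification offers two notations that can carry arbitrary bytes, the literal string in parentheses with backslash escapes and the hex string in angle brackets. The following Python sketch writes both forms from raw bytes. It is an illustration only; the function names are hypothetical, and whether \special{pdf:dest ...} passes either form through unchanged is exactly the open question.

```python
def pdf_hex_string(data: bytes) -> str:
    """Hex string notation: <...>, two hex digits per byte."""
    return "<" + data.hex() + ">"


def pdf_literal_string(data: bytes) -> bytes:
    """Literal string notation: (...), with backslash escaping.

    Backslash and parentheses are escaped with a backslash; bytes
    outside printable ASCII are written as three-digit octal escapes.
    """
    out = bytearray(b"(")
    for byte in data:
        if byte in b"\\()":
            out += b"\\" + bytes([byte])
        elif byte < 0x20 or byte > 0x7E:
            out += b"\\%03o" % byte
        else:
            out.append(byte)
    out += b")"
    return bytes(out)


if __name__ == "__main__":
    raw = b"\xfe\xff\x00\xe4\x00n"
    print(pdf_hex_string(raw))
    print(pdf_literal_string(raw).decode("ascii"))
```

Both results denote the same byte string; a consumer that follows the specification must not treat them differently.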

> >>  2.  these UTF16-BE strings are *not* equivalent to other
> >>      ways of encoding Name objects, after all.
> >> 
> >> This is something that should be reported as a bug to Adobe.
> > 
> > There is no problem with the PDF specification. A destination
> > name is a byte string. You can use UTF-16BE, invalid UTF-8,
> > a mixture of UTF-32BE with us-ascii, ... all are valid byte strings.
> > The problem is with xdvipdfmx that recodes the UTF-8 string
> > provided by XeTeX's specials in different ways.
> 
> Then convert the UTF-8 to the encoded HeX of the corresponding UTF16-BE,
> before passing it to  xdvipdfmx .
> 
> Surely that is feasible?

Except that the behaviour of 8-bit characters in destination strings
is unspecified and undocumented. It makes more sense to address
the problem upstream first.

> > pdfTeX is fine, because it doesn't reencode the strings.
> > Also \pdfescapestring, \pdfescapename, \pdfescapehex
> > are available for syntactically correct literal strings.
> 
> I've not used these primitives.
> Didn't you used to do such conversions within hyperref ?

Of course hyperref uses such conversions; they are required
by the PDF specification.
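
As a rough idea of what such an escaping step does for name objects, here is a Python sketch of the "#xx" notation from the PDF specification. It is an assumption-laden approximation, not the code of \pdfescapename or of hyperref: the function name is hypothetical, the exact set of escaped characters in pdfTeX may differ, and the null byte, which the specification does not allow in names, gets no special handling.

```python
# Delimiter characters of the PDF syntax, plus the escape character '#'.
_SPECIAL = b"()<>[]{}/%#"


def pdf_escape_name(data: bytes) -> str:
    """Write bytes as the body of a PDF name (without the leading '/').

    Bytes outside the range '!'..'~' and the characters in _SPECIAL
    are written as '#' followed by two hex digits.
    """
    parts = []
    for byte in data:
        if byte < 0x21 or byte > 0x7E or byte in _SPECIAL:
            parts.append(f"#{byte:02X}")
        else:
            parts.append(chr(byte))
    return "".join(parts)


if __name__ == "__main__":
    print(pdf_escape_name(b"a b#/"))                # a#20b#23#2F
    print(pdf_escape_name("\u00e4".encode("utf-8")))  # #C3#A4
```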

> Or with other utility packages in the 'oberdiek' bundle?

If a package lacks such necessary escaping, please file a bug
report.

Yours sincerely
  Heiko Oberdiek