[XeTeX] hyperref 6.79n breaks xetex's (xdvipdfmx) ability to set proper page dimensions (and overrides page dimensions on pdftex)

Heiko Oberdiek oberdiek at uni-freiburg.de
Mon Dec 14 18:15:46 CET 2009


Hello Jonathan,

On Mon, Dec 14, 2009 at 04:26:19PM +0000, Jonathan Kew wrote:

> On 14 Dec 2009, at 16:14, Vladimir Volovich wrote:
> 
> > "HO" == Heiko Oberdiek writes:
> > 
> >>> yes, thank you (and i don't even need to use "xetex" option at all -
> >>> since it appears to auto-detect xetex - just NOT using the "unicode"
> >>> option when using xetex appears to work). sorry for not finding it
> >>> out on my own.
> > 
> > HO> Not using unicode is definitely wrong, if you want to have unicode
> > HO> characters; hyperref then outputs the strings in PDFDocEncoding.
> > 
> > consider the file:
> > 
> > \documentclass{article}
> > %\usepackage{intcalc}
> > \usepackage[
> > %xetex,
> > %unicode,
> > pdftitle={test ^^^^0442^^^^0435^^^^0441^^^^0442}
> > ]{hyperref}
> > \usepackage{fontspec}
> > \setmainfont{Times New Roman}
> > \begin{document}
> > \tableofcontents
> > \section{test ^^^^0442^^^^0435^^^^0441^^^^0442}
> > This is a test.
> > \end{document}
> > 
> > if i run it with "xelatex -no-pdf test.tex", then i see inside the test.xdv:
> > 
> >  pdf:docinfo<</Title(test \xd1\x82\xd0\xb5\xd1\x81\xd1\x82)/Subject()/Creator(LaTeX with hyperref package)/Author()/Producer(XeTeX 0.9995)/Keywords()>>
> >  pdf:outline [-] 1<</Title(test \xd1\x82\xd0\xb5\xd1\x81\xd1\x82)/A<</S/GoTo/D(section.1)>>>>
> > 
> > i.e. the strings appear as utf-8 encoded strings
> 
> IIRC, utf-8 is the expected encoding form in xdv (assuming \special works similarly to other things like \write output)...

Only sometimes; it does not hold for strings embedded in raw PDF code, e.g. page labels.

> > (which is
> > non-standard), but nevertheless my pdf viewer (evince) is happily
> > showing pdf strings in UTF-8 encoding.
> 
> ...but what ends up in your PDF file? I'd expect xdvipdfmx to convert the
> utf-8 strings from the xdv file to utf16 for PDF bookmarks, because it
> knows this is the correct encoding form to use there.

Perhaps you can check for the byte order mark first.
Hyperref writes it as the 7-bit octal escape sequence "\376\377",
but accepting the two bytes as raw 8-bit bytes is also possible:

> >  pdf:docinfo<</Title(\\376\\377...

In this case the conversion should be suppressed and the
warning dropped.
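
A minimal sketch of that check, written in Python for brevity rather than
in the C of xdvipdfmx. The function names are hypothetical and not part of
xdvipdfmx or hyperref. It assumes the string body arrives as bytes, treats
a leading "\376\377" (octal escapes) or the raw bytes 0xFE 0xFF as a
UTF-16BE marker and passes such strings through untouched, and otherwise
tries a UTF-8 to UTF-16BE conversion. Escaping of parentheses and
backslashes in the result is left out.

```python
# Hypothetical helpers; not the actual xdvipdfmx implementation.
BOM_RAW = b"\xfe\xff"        # byte order mark as two 8-bit bytes
BOM_OCTAL = b"\\376\\377"    # the same mark written as 7-bit octal escapes


def has_utf16be_bom(body: bytes) -> bool:
    """Return True if the PDF string body already starts with a UTF-16BE BOM."""
    return body.startswith(BOM_RAW) or body.startswith(BOM_OCTAL)


def to_pdf_text_string(body: bytes) -> bytes:
    """Convert a string body from a special to UTF-16BE unless it is marked."""
    if has_utf16be_bom(body):
        # Already UTF-16BE: no conversion, no warning.
        return body
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are passed through,
        # since they might be PDFDocEncoding.
        return body
    return BOM_RAW + text.encode("utf-16-be")
```

With this, a title such as "test тест" given as UTF-8 gains a BOM and is
converted, while a string that hyperref already wrote with "\376\377" is
left alone.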

I am still thinking about the PDFDocEncoding case.
What is the specification of the docinfo/outline specials?
Is there a way to pass pure PDFDocEncoded strings, or does hyperref
have to convert them to UTF-8?

Best regards
  Heiko <oberdiek at uni-freiburg.de>

