[XeTeX] Anchor names
Ross Moore
ross.moore at mq.edu.au
Mon Nov 7 02:30:59 CET 2011
Hi Heiko, and Akira,
On 06/11/2011, at 3:55 AM, Heiko Oberdiek wrote:
> \special{%
> pdf:ann width 4bp height 2bp depth 2bp<<%
> /Type/Annot%
> /foo/ab#abc
> /Subtype/Link%
> /Border[0 0 1]%
> /C[0 0 1]% blue border
> /A<<%
> /S/GoToR%%
> /F(t.tex)%
> /D<66f6f8>%
> % Result: <66f6f8>, but ** WARNING ** Failed to convert input string to UTF16...
> % /D<c3a46e6368c3b872>%
> % Result: <feff00e4006e0063006800f80072>
> >>%
> >>%
> }%
I've verified that this is indeed what happens, with
This is XeTeX, Version 3.1415926-2.2-0.9997.4 (TeX Live 2010)
Now looking at the source code, at:
http://ftp.tug.org/svn/texlive/trunk/Build/source/texk/xdvipdfmx/src/spc_pdfm.c?diff_format=u&view=log&pathrev=13771
it is hard to see how those results can occur.
The warning message is only produced when the function
maybe_reencode_utf8(pdf_obj *instring)
returns a value less than 0 (e.g. -1)
viz. lines 571--578: function: modstrings
>>> }
>>> else {
>>> r = maybe_reencode_utf8(vp);
>>> }
>>> if (r < 0) /* error occured... */
>>> WARN("Failed to convert input string to UTF16...");
>>> }
>>> break;
or lines 1145--1150 (for pdf:dest but not actually used here)
>>> #ifdef ENABLE_TOUNICODE
>>> error = maybe_reencode_utf8(name);
>>> if (error < 0)
>>> WARN("Failed to convert input string to UTF16...");
>>> #endif
>>> array = parse_pdf_object(&args->curptr, args->endptr, NULL);
Now that function should find only ASCII bytes in '<66f6f8>'
and '<c3a46e6368c3b872>'.
In both cases the string should have remained silently unmodified.
viz. lines 474--481 function: maybe_reencode_utf8
>>> /* check if the input string is strictly ASCII */
>>> for (cp = inbuf; cp < inbuf + inlen; ++cp) {
>>> if (*cp > 127) {
>>> non_ascii = 1;
>>> }
>>> }
>>> if (non_ascii == 0)
>>> return 0; /* no need to reencode ASCII strings */
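To make the question concrete, here is a minimal sketch of the two cases. It is not the xdvipdfmx code: has_non_ascii, hex_value and decode_hex_string are hypothetical names, and the decoder assumes a simplified hex-string syntax (no whitespace, even number of digits). It only shows that the ASCII test passes on the literal text '<66f6f8>' but fails once the hex digits have been turned into byte values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same test as the quoted loop: 1 if any byte is above 127, else 0. */
static int has_non_ascii(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i)
        if (buf[i] > 127)
            return 1;
    return 0;
}

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode a string such as "<66f6f8>" into bytes.
   Simplified on purpose: no whitespace, even number of hex digits.
   Returns the number of bytes written, or -1 on malformed input. */
static int decode_hex_string(const char *src, unsigned char *out, size_t outsize)
{
    size_t len, n = 0, i;

    assert(src != NULL && out != NULL);
    len = strlen(src);
    if (len < 2 || src[0] != '<' || src[len - 1] != '>' || (len % 2) != 0)
        return -1;
    /* digit pairs sit at indices (1,2), (3,4), ..., (len-3,len-2) */
    for (i = 1; i < len - 2; i += 2) {
        int hi = hex_value((unsigned char)src[i]);
        int lo = hex_value((unsigned char)src[i + 1]);
        if (hi < 0 || lo < 0 || n >= outsize)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
    }
    return (int)n;
}
```

Applied to the 8 characters of "<66f6f8>", has_non_ascii returns 0, because '<', '>', digits and letters are all below 128. Applied to the 3 bytes that decode_hex_string produces (0x66 0xf6 0xf8), it returns 1. So the warning can only be reached if something has decoded the hex string before the check runs.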
What am I reading wrong? If anything.
Has there been an earlier de-coding of <....> hex-strings
into byte values, done either by XeTeX or xdvipdfmx ?
If so, then surely it is this which is unnecessary.
(Not XeTeX, since the string is correct in the .xdv file.)
e.g. function pst_string_parse_hex in pst_obj.c seems
to be doing this. But that is only supposed to be used with
coding from cmap_read.c and t1-load.c .
And these are only meant for interpreting the font data that goes
into content streams. So I'm at a loss in understanding this.
But 'modstrings' is applied recursively, and part of it
seems to be checking for a CMap (when appropriate?).
So maybe there is an unintended un-encoding that precedes
an encoding?
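As a sanity check on that guess, here is a minimal sketch of what a UTF-8 to UTF-16BE re-encoding would do if it were handed the already-decoded bytes. Again this is not the xdvipdfmx code: reencode_utf8_sketch is a hypothetical stand-in, its return convention is my own, and it does not reject overlong UTF-8 forms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the re-encoding step, NOT the real function.
   Returns 0 if the input is pure ASCII (output untouched),
   the number of output bytes if it was converted to UTF-16BE with a BOM,
   and -1 if the input is not valid UTF-8 or the buffer is too small. */
static int reencode_utf8_sketch(const unsigned char *in, size_t inlen,
                                unsigned char *out, size_t outsize)
{
    size_t i = 0, n = 0, k;
    int non_ascii = 0;

    assert(in != NULL && out != NULL);
    for (k = 0; k < inlen; ++k)
        if (in[k] > 127)
            non_ascii = 1;
    if (!non_ascii)
        return 0;                      /* ASCII: leave the string alone */

    if (outsize < 2)
        return -1;
    out[n++] = 0xfe;                   /* byte order mark U+FEFF */
    out[n++] = 0xff;

    while (i < inlen) {
        unsigned long cp;
        size_t extra;
        unsigned char c = in[i++];

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xe0) == 0xc0) { cp = c & 0x1f; extra = 1; }
        else if ((c & 0xf0) == 0xe0) { cp = c & 0x0f; extra = 2; }
        else if ((c & 0xf8) == 0xf0) { cp = c & 0x07; extra = 3; }
        else return -1;                /* invalid lead byte */

        if (extra > inlen - i)
            return -1;                 /* sequence runs past the end */
        for (k = 0; k < extra; ++k) {
            unsigned char cc = in[i++];
            if ((cc & 0xc0) != 0x80)
                return -1;             /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3f);
        }
        if (cp > 0x10ffffUL || (cp >= 0xd800UL && cp <= 0xdfffUL))
            return -1;                 /* outside Unicode scalar values */

        if (cp < 0x10000UL) {
            if (n + 2 > outsize) return -1;
            out[n++] = (unsigned char)(cp >> 8);
            out[n++] = (unsigned char)(cp & 0xff);
        } else {                       /* encode as a surrogate pair */
            unsigned long v = cp - 0x10000UL;
            unsigned long hs = 0xd800UL | (v >> 10);
            unsigned long ls = 0xdc00UL | (v & 0x3ffUL);
            if (n + 4 > outsize) return -1;
            out[n++] = (unsigned char)(hs >> 8);
            out[n++] = (unsigned char)(hs & 0xff);
            out[n++] = (unsigned char)(ls >> 8);
            out[n++] = (unsigned char)(ls & 0xff);
        }
    }
    return (int)n;
}
```

Fed the bytes 0x66 0xf6 0xf8, this returns -1 (0xf6 announces a four-byte sequence that is not there), which would be consistent with the warning plus an unmodified <66f6f8>. Fed the bytes c3 a4 6e 63 68 c3 b8 72, it writes the 14 bytes fe ff 00 e4 00 6e 00 63 00 68 00 f8 00 72, matching the <feff00e4006e0063006800f80072> that Heiko reports. Both observations fit the idea that the hex strings are decoded to bytes before the re-encoding step sees them.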
>
> It seems that *all* literal strings are affected by the
> unhappy reconversions. But the PDF specification leaves no choice,
> there are various places for byte strings.
> In the example, if a file name has byte string XY and the destination Z,
> then the file name is XY and the destination is Z and nothing else. Otherwise
> neither the file nor the destination will be found.
>
> Thus either (XeTeX/)xdvipdfmx finds a way for specifying arbitrary
> byte strings (at least for PDF strings(/streams)) -- it is a
> requirement of the PDF specification. Or we have to conclude
> that 8-bit is not supported and that means US-ASCII.
>
> Yours sincerely
> Heiko Oberdiek
Hope this helps --- or you can help me :-)
Cheers,
Ross
------------------------------------------------------------------------
Ross Moore ross.moore at mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------