[XeTeX] Unicode bookmarks in xdvipdfmx - prototype solution (long)
Timothy Eyre
tinytim1234 at hotmail.com
Wed Jun 14 09:00:50 CEST 2006
I've produced a modified version of xdvipdfmx that includes prototype
Unicode bookmark support.
It uses the usual dvipdfm special

  \special{pdf: out 1 << /Title (<UTF8 string>) /Dest [ @thispage /FitH @ypos ] >>}

to receive the bookmark specification.
To make the change kick in, you need to put the following at the top of
your document:

  \special{pdf:tounicode EUC-UCS2}

The EUC-UCS2 identifier is just a dummy; any CID mapping file on your system
will do. It is only needed to trick the rest of the xdvipdfmx code into
calling reencodestring().
The change consists of a modification to the function reencodestring() in
the xdvipdfmx source file spc_pdfm.c. It also uses the modules
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h
and
http://www.unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c
The change works by converting the UTF-8 string to big-endian UTF-16 and
prepending the byte-order mark 0xFEFF (the bytes FE FF). This is the Unicode
format for text strings given in the PDF spec.
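As a rough illustration of that output format, here is a minimal standalone sketch, independent of the xdvipdfmx sources. The function name utf16_to_pdf_text and its signature are made up for this example; it only shows the byte layout (FE FF, then each UTF-16 code unit high byte first) and is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialise UTF-16 code units as a PDF Unicode text
 * string, i.e. the byte-order mark FE FF followed by every code unit with
 * its high byte first (big-endian), whatever the host byte order is.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t utf16_to_pdf_text(const uint16_t *units, size_t n_units,
                         unsigned char *out, size_t out_size)
{
    size_t i, pos = 0;

    assert(units != NULL && out != NULL);

    /* Need 2 bytes for the BOM plus 2 bytes per code unit. */
    if (out_size < 2 || n_units > (out_size - 2) / 2)
        return 0;

    out[pos++] = 0xFE;
    out[pos++] = 0xFF;
    for (i = 0; i < n_units; i++) {
        out[pos++] = (unsigned char)(units[i] >> 8);   /* high byte */
        out[pos++] = (unsigned char)(units[i] & 0xFF); /* low byte  */
    }
    return pos;
}
```

Using shifts rather than casting the UTF16 array to bytes keeps the result the same on little-endian and big-endian hosts.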
I've only done a little testing, in which I got mixed Cyrillic and Kanji to
display just fine. A limitation of this change is that it assumes each
character converts to exactly two bytes of UTF-16 (a single 16-bit code
unit), so characters outside the Basic Multilingual Plane, which need a
surrogate pair, are not handled. I'm also concerned that it might convert
strings other than bookmark strings, although theoretically this shouldn't
matter.
I'm not planning to do any further work on this. If someone else wants to
take on the task of bringing this up to production quality then I'd be very
grateful. I think that:
- The change should only kick in if the user has specified something like
\special{pdf:tounicode XETEX-UTF8-UCS2} and behave as before otherwise; and
- The code should cope with UTF-16 output that is longer than a single pair
of bytes, or at least return an error.
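The two suggestions above could be sketched roughly as follows. This is a standalone illustration under stated assumptions, not code from xdvipdfmx: the names wants_utf8_bookmarks and encode_utf16 are hypothetical, and how the tounicode identifier actually reaches reencodestring() is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical gate: re-encode as UTF-8 only when the document asked for
 * the (proposed) XETEX-UTF8-UCS2 identifier; otherwise behave as before. */
int wants_utf8_bookmarks(const char *tounicode_name)
{
    return tounicode_name != NULL &&
           strcmp(tounicode_name, "XETEX-UTF8-UCS2") == 0;
}

/* Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written to `out` (1 or 2),
 * or 0 if `cp` is a surrogate or lies above U+10FFFF. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;                      /* not a valid scalar value */

    if (cp < 0x10000) {                /* BMP: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate  */
    return 2;
}
```

A caller that is not ready for surrogate pairs can treat a return value of 2 as an error, which covers the "at least return an error" case.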
Tim
static int
reencodestring (CMap *cmap, pdf_obj *instring)
{
#define WBUF_SIZE 4096
  unsigned char  wbuf[WBUF_SIZE];
  unsigned char *obufcur;
  unsigned char *inbufcur;
  long           inbufleft, obufleft, seqlen;
  UTF16         *utf16TargetStart;
  UTF8          *utf8SourceStart;
  UTF16          utf16_result[3];
  ConversionResult result;

  if (!cmap || !instring)
    return 0;

  inbufleft = pdf_string_length(instring);
  inbufcur  = pdf_string_value (instring);

  /* PDF Unicode text strings start with the byte-order mark FE FF. */
  wbuf[0]  = 0xfe;
  wbuf[1]  = 0xff;
  obufcur  = wbuf + 2;
  obufleft = WBUF_SIZE - 2;

  while (inbufleft > 0) {
    if (obufleft < 2)
      return -1;

    /* Length of the UTF-8 sequence starting here (table from ConvertUTF.c).
       Reject truncated input before reading past the end of the string. */
    seqlen = trailingBytesForUTF8[inbufcur[0]] + 1;
    if (seqlen > inbufleft)
      return -1;

    memset(utf16_result, 0, sizeof(utf16_result));
    utf16TargetStart = utf16_result;
    utf8SourceStart  = inbufcur;
    result = ConvertUTF8toUTF16((const UTF8 **)&utf8SourceStart,
                                inbufcur + seqlen,
                                &utf16TargetStart,
                                &(utf16_result[2]),
                                strictConversion);
    if (result != conversionOK)
      return -1;

    /* Emit the first UTF-16 code unit only, high byte first. Shifts are
       used instead of a char cast so the output is big-endian on any host.
       A second code unit (surrogate pair) is still ignored. */
    *obufcur++ = (unsigned char)(utf16_result[0] >> 8);
    *obufcur++ = (unsigned char)(utf16_result[0] & 0xff);
    obufleft  -= 2;

    inbufcur  += seqlen;
    inbufleft -= seqlen;
  }

  pdf_set_string(instring, wbuf, WBUF_SIZE - obufleft);
  return 0;
}