[luatex] PDF strings.

Paul Isambert zappathustra at free.fr
Sun Nov 28 19:22:48 CET 2010

On 28/11/2010 18:47, Heiko Oberdiek wrote:
> On Sun, Nov 28, 2010 at 04:51:55PM +0100, Paul Isambert wrote:
>> As you may know, PDF reads strings encoded in either its own scheme,
>> or in UTF-16.
>> My problem is I want accented characters in bookmarks, e.g:
>> \pdfoutline goto name {there}{Héhé}
>> I can't feed PDF-encoded strings directly (LuaTeX would complain),
>> but PDF also understands \nnn-encoded characters, where \nnn is a
>> number in base-8, so I actually do
>> \pdfoutline goto name {there}{\octal{Héhé}}
>> where \octal calls Lua to convert each character into \nnn, so that
>> LuaTeX actually reads (the backslash has catcode 12):
>> \pdfoutline goto name {there}{\110\351\150\351}
>> which works fine, except the numbers are Unicode code points, which
>> the PDF encoding doesn't follow exactly, so I also need a mapping
>> from Unicode to the PDF encoding before converting to octal.
> If hyperref is loaded, you can use \pdfstringdef.
> Otherwise package `stringenc' might help, PDFDocEncoding
> and UTF-16 are supported. But I think I haven't
> implemented big char support (chars with character code > 255
> for LuaTeX and XeTeX) yet (it's done in hyperref). If this is
> true, then the package might still help you at some conversion
> steps.
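
For illustration, the per-character conversion that \octal performs can be sketched as follows. This is a Python sketch, not the Lua code actually used; the name octal_escape is hypothetical, and it assumes every character has a code point below 256. It writes the Unicode code point unchanged, so the result is only right where the PDF encoding and Unicode agree, which is exactly the limitation described in the quoted message.

```python
def octal_escape(text: str) -> str:
    """Turn each character into a PDF-style \\nnn octal escape.

    Assumption: the code point is used directly as the byte value,
    with no Unicode-to-PDFDocEncoding mapping applied.
    """
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFF:
            # A three-digit octal escape can only hold one byte.
            raise ValueError(f"{ch!r} does not fit in a single byte")
        parts.append(f"\\{code_point:03o}")
    return "".join(parts)


print(octal_escape("Héhé"))  # \110\351\150\351
```

The output matches the string LuaTeX is said to read in the \pdfoutline example.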

I do not load hyperref; as for stringenc, I might have used it, except 
it doesn't work. Strings of more than one letter produce the non-UTF-8 
character error (even if they don't contain any such character), as in:


and if I try with just one letter, bookmarks show nothing:

\pdfoutline goto name {there}{\outlinetitle}

>> The other solution is UTF-16BE, which can't be fed directly to
>> LuaTeX (you can't use \nnn to represent bytes). I.e., you can't have:
>> \pdfoutline goto name {there}{\sixteen{Héhé}}
>> where \sixteen would return a string encoded in UTF-16BE, because
>> LuaTeX will complain about non-utf-8 characters. You can do that
>> directly in Lua, though, but there's no pdf.outline primitive:
>> pdf.outline(to_utf16("Héhé"))
> You can still use \nnn, because this notation is restricted
> to plain ASCII ('\', '0', ... '9'). Of course the \nnn are
> used for bytes only.

Meaning \nnn can denote bytes? [... TeXing ... TeXing ...] Oh yes! I 
thought \nnn only denoted characters. So the solution is really simple 
and works! Thanks Heiko.
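
A minimal sketch of that solution, for the record: encode the title as UTF-16BE, put the byte order mark (bytes 254 and 255) in front, which is how PDF recognises a UTF-16 text string, and write every byte as \nnn. This is Python for illustration only, not the Lua code behind \sixteen; the name utf16be_octal is hypothetical.

```python
def utf16be_octal(text: str) -> str:
    """Return text as UTF-16BE bytes, each written as a \\nnn octal escape.

    The result is plain ASCII, so it can be passed through TeX without
    triggering the non-UTF-8 character error.
    """
    # Byte order mark FE FF, followed by the big-endian UTF-16 bytes.
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "".join(f"\\{byte:03o}" for byte in data)


print(utf16be_octal("Héhé"))
# \376\377\000\110\000\351\000\150\000\351
```

The resulting string is what would go in the title argument of \pdfoutline, with the backslash at catcode 12 as in the \octal example.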

