[luatex] PDF strings.

Paul Isambert zappathustra at free.fr
Sun Nov 28 16:51:55 CET 2010


Hello all,

As you may know, PDF reads text strings encoded either in its own scheme 
(PDFDocEncoding) or in UTF-16.
My problem is that I want accented characters in bookmarks, e.g.:

\pdfoutline goto name {there}{Héhé}

I can't feed PDF-encoded strings directly (LuaTeX would complain), but 
PDF also understands \nnn-encoded characters, where \nnn is a number in 
base-8, so I actually do

\pdfoutline goto name {there}{\octal{Héhé}}

where \octal calls Lua to convert each character into \nnn, so that 
LuaTeX actually reads (the backslash has catcode 12):

\pdfoutline goto name {there}{\110\351\150\351}

which works fine, except that the numbers are Unicode code points, which 
the PDF encoding doesn't follow exactly, so I also need a mapping from 
Unicode to the PDF encoding before converting to octal.
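
A minimal sketch of what the Lua side of such an \octal macro might do, 
including a very partial Unicode check. The names codepoints and to_octal 
are invented for this example, the UTF-8 decoder does no validation, and 
the only code points accepted are printable ASCII and U+00A1 to U+00FF 
(minus U+00AD), where PDFDocEncoding and Unicode agree as far as I know. 
Anything else would need a real mapping table.

```lua
-- Decode a UTF-8 string into a list of Unicode code points (no validation).
local function codepoints(s)
   local cps, i = {}, 1
   while i <= #s do
      local b = s:byte(i)
      local n, cp
      if b < 0x80 then n, cp = 0, b            -- 1-byte sequence (ASCII)
      elseif b < 0xE0 then n, cp = 1, b - 0xC0 -- 2-byte sequence
      elseif b < 0xF0 then n, cp = 2, b - 0xE0 -- 3-byte sequence
      else n, cp = 3, b - 0xF0 end             -- 4-byte sequence
      for k = 1, n do
         cp = cp * 64 + (s:byte(i + k) - 0x80) -- append 6 payload bits
      end
      cps[#cps + 1] = cp
      i = i + n + 1
   end
   return cps
end

-- Turn each character into a \nnn octal escape (hypothetical helper).
-- Assumption: only ranges where PDFDocEncoding matches Unicode are allowed.
local function to_octal(s)
   local out = {}
   for _, cp in ipairs(codepoints(s)) do
      local ascii = cp >= 0x20 and cp <= 0x7E
      local latin = cp >= 0xA1 and cp <= 0xFF and cp ~= 0xAD
      if not (ascii or latin) then
         error(string.format("U+%04X needs an explicit PDFDocEncoding mapping", cp))
      end
      out[#out + 1] = string.format("\\%03o", cp)
   end
   return table.concat(out)
end
```

With this, to_octal("Héhé") gives \110\351\150\351, i.e. the string shown 
above. In LuaTeX the result would still have to be printed back to TeX 
with the backslash at catcode 12.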

Even though \nnn can represent numbers up to 511 (octal 777), only the 
first 256 are understood (beyond that, the encoding seems to repeat 
itself, i.e. \nnn is understood modulo 256), because each escape stands 
for a single byte and the PDF encoding covers little more than Latin-1. 
For my purpose (French), it's OK, but I guess some people must be 
somewhat annoyed...
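
To make the wrap-around concrete, here is the arithmetic in plain Lua. It 
is only an illustration of the modulo-256 behaviour described above, not 
something LuaTeX does itself:

```lua
-- \541 is octal for 353, which does not fit in one byte.
local value = tonumber("541", 8)   -- 353
local byte  = value % 256          -- 97: the high-order overflow is dropped
local char  = string.char(byte)    -- "a"
print(value, byte, char)
```

So, if the viewer behaves as described, \541 ends up as a plain "a" in 
the bookmark.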

The other solution is UTF-16BE, which can't be fed directly to LuaTeX 
(you can't use \nnn to represent bytes). I.e., you can't have:

\pdfoutline goto name {there}{\sixteen{Héhé}}

where \sixteen would return a string encoded in UTF-16BE, because LuaTeX 
will complain about non-UTF-8 characters. You could do that directly in 
Lua, but there's no pdf.outline primitive, so this is only wishful:

pdf.outline(to_utf16("Héhé"))

(See http://www.ntg.nl/pipermail/dev-luatex/2007-December/001175.html 
for a three-year-old discussion.)
Anyway that would solve a symptom rather than the problem.
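
Note that to_utf16 is not a LuaTeX built-in. A minimal sketch of such a 
helper, under the assumption that the input is valid UTF-8, could look 
like this (codepoints and to_pdf_hex are names invented for the example). 
The hex-string variant is an addition: it avoids having to escape 
parentheses and backslashes that may occur among the UTF-16 bytes inside 
a ( ... ) string.

```lua
-- Decode a UTF-8 string into a list of Unicode code points (no validation).
local function codepoints(s)
   local cps, i = {}, 1
   while i <= #s do
      local b = s:byte(i)
      local n, cp
      if b < 0x80 then n, cp = 0, b
      elseif b < 0xE0 then n, cp = 1, b - 0xC0
      elseif b < 0xF0 then n, cp = 2, b - 0xE0
      else n, cp = 3, b - 0xF0 end
      for k = 1, n do
         cp = cp * 64 + (s:byte(i + k) - 0x80)
      end
      cps[#cps + 1] = cp
      i = i + n + 1
   end
   return cps
end

-- Encode as UTF-16BE, prefixed with the byte order mark FE FF that tells
-- a PDF reader the text string is not in PDFDocEncoding.
local function to_utf16(s)
   local out = { "\254\255" }
   local function unit(u)   -- append one 16-bit code unit, high byte first
      out[#out + 1] = string.char(math.floor(u / 256), u % 256)
   end
   for _, cp in ipairs(codepoints(s)) do
      if cp < 0x10000 then
         unit(cp)
      else                  -- outside the BMP: emit a surrogate pair
         local v = cp - 0x10000
         unit(0xD800 + math.floor(v / 0x400))
         unit(0xDC00 + v % 0x400)
      end
   end
   return table.concat(out)
end

-- Write raw bytes as a PDF hexadecimal string, e.g. <FEFF0048>.
local function to_pdf_hex(bytes)
   local hex = {}
   for i = 1, #bytes do
      hex[i] = string.format("%02X", bytes:byte(i))
   end
   return "<" .. table.concat(hex) .. ">"
end
```

For "Héhé", to_pdf_hex(to_utf16("Héhé")) yields <FEFF004800E9006800E9>.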

A solution is

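-- Write the title as an indirect PDF string object and print its object
-- number back to TeX, where it is followed by "0 R" to form a reference.
function outline_title(str)
   tex.sprint(pdf.immediateobj("(" .. to_utf16(str) .. ")"))
end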

and then

\pdfoutline attr {/Title \directlua{outline_title("Héhé")} 0 R } goto 
name {there}{}


It works, but it produces a bookmark with two /Title entries; the second 
one (the one we want) is selected, but that's not good PDF. (Anyway, we 
can't use \pdfoutline as is.)


Ok, if you've read up to here and aren't too bored, congratulations, and 
then: any suggestions?
And to our dear developers: any plan to make LuaTeX write everything in 
UTF-16 when necessary?

Best,
Paul

