[luatex] ActualText attribute for hyphenated words
zappathustra at free.fr
Fri Feb 3 11:20:55 CET 2012
Patrick Gundlach <patrick at gundla.ch> wrote:
> Hello Till,
> (just for the record: this comes from a discussion on tex.sx: http://tex.stackexchange.com/q/43033/243 )
> > Is it possible/desirable to let the LuaTeX PDF generator automatically tag words which are hyphenated at the end of line with a matching /ActualText attribute (so that the sequence of glyphs "hyphen- ation", for example, is internally represented as the sequence of characters 'hyphenation')? That would make sense from a linguistic viewpoint because the display of a text in a PDF is strictly presentational and may differ from its lexical and grammatical structure. It would also ensure that you can search for and find words in a LuaTeX-generated PDF with almost any viewer.
> This might be achieved by using LuaTeX's ability to modify a node list after line breaking.
Building on this idea, see code below.
I suppose it will fail miserably in many cases, and it should be extended
to handle non-ASCII characters; also, although Acrobat, Evince and Xpdf
now all find hyphenated words (the latter two could not do that before),
they don't highlight them properly. Finally, this will work only in
viewers that implement /ActualText, and that may not be the
case with Till's previewer.
In the meantime, I've discovered something very nice: Acrobat for
Debian doesn't lock the document, so you can keep it open and compile
too. That was not possible under Windows!
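local HBOX = node.id"hlist"
local DISC = node.id"disc"
local GLYF = node.id"glyph"
local GLUE = node.id"glue"
local KERN = node.id"kern"

-- Walk from n in direction dir ("prev" or "next") over glyphs and font
-- kerns; return the collected text and the last node visited.
local function collect (n, dir)
  local text = ""
  local limit
  while n and
        (n.id == GLYF or n.id == KERN and n.subtype == 0) do
    if n.id == GLYF then
      -- ASCII only: string.char fails for character codes above 255.
      local c = string.char(n.char)
      text = dir == "prev" and (c .. text) or (text .. c)
    end
    limit = n
    n = n[dir]
  end
  return text, limit
end

-- Assumption: the loop runs in the post_linebreak_filter callback,
-- which receives the list of finished lines as head.
callback.register("post_linebreak_filter", function (head)
  for line in node.traverse_id(HBOX, head) do
    local last = node.slide(line.head)
    if last.id == GLUE and last.subtype == 9 then  -- \rightskip
      last = last.prev
    end
    if last and last.id == DISC then
      -- Find the next line, skipping whatever sits between the boxes.
      local nextline = line.next
      while nextline do
        if nextline.id == HBOX then
          break
        end
        nextline = nextline.next
      end
      if nextline then
        -- last.prev is presumably the hyphen glyph; start before it.
        local prevnode = last.prev.prev
        local prevtext, l1 = collect(prevnode, "prev")
        local n = nextline.head
        if n.id == GLUE and n.subtype == 8 then  -- \leftskip
          n = n.next
        end
        local nexttext, l2 = collect(n, "next")
        if l1 and l2 then
          -- node.new(8, 8): whatsit of subtype pdf_literal in the
          -- LuaTeX numbering used here; mode 2 is "direct".
          local lit1, lit2 = node.new(8, 8), node.new(8, 8)
          lit1.mode, lit2.mode = 2, 2
          lit1.data = "/Span << /ActualText (" .. prevtext .. nexttext .. ") >> BDC"
          lit2.data = "EMC"
          -- insert_before returns the new head when l1 is the first node.
          line.head = node.insert_before(line.head, l1, lit1)
          node.insert_after(nextline.head, l2, lit2)
        end
      end
    end
  end
  return head
end)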
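
On the non-ASCII limitation mentioned above: one possible way around it, sketched here as an assumption and not tested inside LuaTeX, is to write the /ActualText value as a PDF hex string in UTF-16BE with a leading byte order mark (FEFF), which PDF accepts for text strings. This also sidesteps escaping of parentheses and backslashes. The helper name actualtext_hex is hypothetical, and it takes a plain Lua table of code points (such as the n.char values of the collected glyphs) instead of a node list.

```lua
-- Hypothetical helper, not part of the original code: encode a list of
-- Unicode code points as a PDF hex string in UTF-16BE with a BOM.
local function actualtext_hex (codepoints)
  local parts = { "FEFF" }                 -- byte order mark
  for _, cp in ipairs(codepoints) do
    if cp < 0x10000 then
      parts[#parts + 1] = string.format("%04X", cp)
    else
      -- Code points beyond the BMP become a surrogate pair.
      local v  = cp - 0x10000
      local hi = 0xD800 + math.floor(v / 0x400)
      local lo = 0xDC00 + v % 0x400
      parts[#parts + 1] = string.format("%04X%04X", hi, lo)
    end
  end
  return "<" .. table.concat(parts) .. ">"
end
```

With such a helper, collect would gather n.char values into a table instead of concatenating string.char results, and the literal would become "/Span << /ActualText " .. actualtext_hex(chars) .. " >> BDC" (note: no parentheses around the hex string).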