[luatex] information about ligatures

Stephan Hennig mailing_list at arcor.de
Fri Jan 3 18:07:25 CET 2014


On 31.12.2013 09:37, Paul Isambert wrote:
> Stephan Hennig <mailing_list at arcor.de> wrote:
> 
> Ligatures are char nodes (id 37) with special subtype 2, and they have
> a “components” field which is a nodelist containing the ligature’s
> components.

I have already read about subtype 2 and the components field, but I have
never seen a glyph node of that subtype in pre_linebreak_filter.
Instead, I see glyph nodes of subtype 256 for the standard Unicode
ligatures, e.g., 0xfb02 (fl).  That is, bit 8 of the subtype is set
(256 = 2^8), and I can't find any documentation about that bit.  For
that reason, I had never checked the 'components' field, but it is
indeed there.  Thanks!
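
To make the bit arithmetic explicit, here is a minimal sketch in plain
Lua.  The helper name set_bits is made up for illustration and is not
part of LuaTeX.  It only lists which bits are set in a subtype value; it
says nothing about what LuaTeX means by them.

```lua
-- Hypothetical helper: return the zero-based indices of all bits that
-- are set in a non-negative integer, lowest bit first.
local function set_bits(value)
   local bits = {}
   local index = 0
   while value > 0 do
      -- The lowest bit is set if the value is odd.
      if value % 2 == 1 then bits[#bits + 1] = index end
      -- Shift right by one bit.
      value = math.floor(value / 2)
      index = index + 1
   end
   return bits
end

print(table.concat(set_bits(256), ','))  -- subtype seen in the filter
print(table.concat(set_bits(2), ','))    -- documented ligature subtype
```

The first call reports bit 8 only, the second bit 1 only, which is why I
refer to the two subtypes by those bit numbers.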

Attached is a tiny node list printer.  It hooks into
pre_linebreak_filter and prints the type and subtype of each node in a
list and some more information for glyph and disc nodes on the next
line.  Here's the beginning of the node list corresponding to the word
'flavour':

> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: fl 0XFB02   components: t left:  2 right:  3 lang:   0 font:  16
> [node]   glyph        subtype:   0 next: t prev: n
> [node]   +char: f 0X66     components: n left:  2 right:  3 lang:   0 font:  16
> [node]   glyph        subtype:   0 next: n prev: t
> [node]   +char: l 0X6C     components: n left:  2 right:  3 lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: a 0X61     components: n left:  2 right:  3 lang:   0 font:  16
> [node] kern         subtype:   1 next: t prev: t
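
Judging from this listing, a non-nil components field looks like a more
useful ligature test than the subtype.  A minimal sketch under that
assumption follows.  collect_ligatures is a hypothetical helper; it
walks the list through the next field rather than node.traverse so that
it can be tried on plain tables, and the glyph id is passed in (inside
LuaTeX that would be node.id('glyph')).

```lua
-- Hypothetical helper: collect the character codes of all glyphs that
-- carry a components list, descending into nested ligatures.
local function collect_ligatures(head, glyph_id, found)
   found = found or {}
   local n = head
   while n do
      if n.id == glyph_id and n.components then
         found[#found + 1] = n.char
         -- A component may itself be a ligature (e.g. ff inside ffi).
         collect_ligatures(n.components, glyph_id, found)
      end
      n = n.next
   end
   return found
end

-- Mock of the start of 'flavour' built from plain tables:
-- fl (0xFB02) with components f and l, followed by a.
local GLYPH = 1  -- stand-in id for the mock only
local l = { id = GLYPH, char = 0x6C }
local f = { id = GLYPH, char = 0x66, next = l }
local a = { id = GLYPH, char = 0x61 }
local fl = { id = GLYPH, char = 0xFB02, components = f, next = a }

local ligs = collect_ligatures(fl, GLYPH)
print(#ligs, string.format('%#X', ligs[1]))
```

Note that this sketch does not descend into the pre, post and replace
lists of disc nodes; a complete version would have to do that, too.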

In fact, all top-level glyph nodes seem to be of subtype 256 in
pre_linebreak_filter.  What does that mean?  (You can find the full node
list corresponding to TeX input 'flavour specific office trick' at the
end of this mail.  With a proper font, the ck ligature is also present
there.)

Can somebody please provide TeX input that results in a glyph node with
bit 1 of subtype set?


> Note that you should also consider discretionary nodes; and
> “pre_linebreak_filter” will not catch ligatures in boxes (use
> “hpack_filter” for that).

Yeah, I am aware of that.
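
For completeness, a minimal sketch of hooking one printer function into
both callbacks.  register_for is a hypothetical helper; the registrar is
passed in as an argument so that the snippet can be tried outside
LuaTeX, and in a real run it would be luatexbase.add_to_callback.  The
sketch assumes the same function can serve both callbacks because it
ignores any extra arguments and returns true.

```lua
-- Hypothetical helper: register one filter function under the same
-- description for every callback name in the given list.
local function register_for(callbacks, add_to_callback, filter, description)
   for _, name in ipairs(callbacks) do
      add_to_callback(name, filter, description)
   end
end

-- Intended use inside print_node.lua (needs LuaTeX and luatexbase):
-- register_for({'pre_linebreak_filter', 'hpack_filter'},
--              luatexbase.add_to_callback,
--              __cb_pre_linebreak_filter, 'print_node')
```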

Happy new year!
Stephan Hennig


> This is LuaTeX, Version beta-0.76.0-2013120414 (rev 4627)  (format=lualatex 2013.12.11)  3 JAN 2014 18:05
> [...]
> [node] whatsit      subtype:   6 next: t prev: t
> [node] hlist        subtype:   3 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: fl 0XFB02   components: t lang:   0 font:  16
> [node]   glyph        subtype:   0 next: t prev: n
> [node]   +char: f 0X66     components: n lang:   0 font:  16
> [node]   glyph        subtype:   0 next: n prev: t
> [node]   +char: l 0X6C     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: a 0X61     components: n lang:   0 font:  16
> [node] kern         subtype:   1 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: v 0X76     components: n lang:   0 font:  16
> [node] kern         subtype:   1 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: o 0X6F     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: u 0X75     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: r 0X72     components: n lang:   0 font:  16
> [node] glue         subtype:   0 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: s 0X73     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: p 0X70     components: n lang:   0 font:  16
> [node] kern         subtype:   1 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: e 0X65     components: n lang:   0 font:  16
> [node] disc         subtype:   3 next: t prev: t
> [node] +pre
> [node]   glyph        subtype:   0 next: n prev: t
> [node]   +char: - 0X2D     components: n lang:   0 font:  16
> [node] +post
> [node] +replace
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: c 0X63     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: i 0X69     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: fi 0XFB01   components: t lang:   0 font:  16
> [node]   glyph        subtype:   0 next: t prev: n
> [node]   +char: f 0X66     components: n lang:   0 font:  16
> [node]   glyph        subtype:   0 next: n prev: t
> [node]   +char: i 0X69     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: c 0X63     components: n lang:   0 font:  16
> [node] glue         subtype:   0 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: o 0X6F     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: ffi 0XFB03   components: t lang:   0 font:  16
> [node]   glyph        subtype:   0 next: t prev: n
> [node]   +char: ff 0XFB00   components: t lang:   0 font:  16
> [node]     glyph        subtype:   0 next: t prev: n
> [node]     +char: f 0X66     components: n lang:   0 font:  16
> [node]     disc         subtype:   3 next: t prev: t
> [node]     +pre
> [node]       glyph        subtype:   0 next: n prev: t
> [node]       +char: - 0X2D     components: n lang:   0 font:  16
> [node]     +post
> [node]     +replace
> [node]     glyph        subtype:   0 next: n prev: t
> [node]     +char: f 0X66     components: n lang:   0 font:  16
> [node]   glyph        subtype:   0 next: n prev: t
> [node]   +char: i 0X69     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: c 0X63     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: e 0X65     components: n lang:   0 font:  16
> [node] glue         subtype:   0 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: t 0X74     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: r 0X72     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: i 0X69     components: n lang:   0 font:  16
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: c 0X63     components: n lang:   0 font:  16
> [node] kern         subtype:   1 next: t prev: t
> [node] glyph        subtype: 256 next: t prev: t
> [node] +char: k 0X6B     components: n lang:   0 font:  16
> [node] penalty      subtype:   0 next: t prev: t
> [node] glue         subtype:  15 next: n prev: t
-------------- next part --------------
local unicode = require('unicode')

local Nid = node.id
local Ntraverse = node.traverse
local Ntype = node.type
local Sformat = string.format
local Srep = string.rep
local Uchar = unicode.utf8.char

local M = {}

local err, warn, info, log = luatexbase.errwarinf('print_node')

-- Table of functions printing detailed node information.
local print_node_details
-- A string one can grep for in the log file.
local grep_prefix = '[node] '

local function print_node_list(head, indent)
   local grep_indent = grep_prefix .. Srep(' ', indent)
   -- Traverse node list.
   for n in Ntraverse(head) do
      -- Print general node information.
      texio.write(Sformat('%s%-12s subtype: %3d next: %1s prev: %1s\n',
                          grep_indent, Ntype(n.id), n.subtype,
                          n.next and 't' or 'n', n.prev and 't' or 'n'))
      -- Print detailed node information.
      if print_node_details[n.id] then print_node_details[n.id](n, indent) end
   end
end

print_node_details = {

   [Nid('glyph')] = function(n, indent)
      local grep_indent = grep_prefix .. Srep(' ', indent)
      texio.write(Sformat('%s+char: %s %#-8X components: %1s lang: %3d font: %3d\n',
                          grep_indent, Uchar(n.char), n.char,
                          n.components and 't' or 'n', n.lang, n.font))
      -- Ligature components?
      if n.components then print_node_list(n.components, indent+2) end
   end,

   [Nid('disc')] = function(n, indent)
      local grep_indent = grep_prefix .. Srep(' ', indent)
      texio.write(Sformat('%s+pre\n', grep_indent))
      print_node_list(n.pre, indent+2)
      texio.write(Sformat('%s+post\n', grep_indent))
      print_node_list(n.post, indent+2)
      texio.write(Sformat('%s+replace\n', grep_indent))
      print_node_list(n.replace, indent+2)
   end,

}

local function __cb_pre_linebreak_filter(head, groupcode)
   print_node_list(head, 0)
   return true
end

local function register_filter()
   luatexbase.add_to_callback('pre_linebreak_filter', __cb_pre_linebreak_filter, 'print_node')
end
M.register_filter = register_filter

return M
-------------- next part --------------
\listfiles
\RequirePackage{luatexbase-mcb}
\documentclass{article}
\usepackage{fontspec}
%\setmainfont{Unifraktur Maguntia}
% available at http://www.google.com/fonts
\directlua{
  local pn = require('print_node')
  pn.register_filter()
}
\begin{document}
flavour
specific
office
trick
\end{document}

