[luatex] What are user-defined whatsit nodes?

Hans Hagen pragma at wxs.nl
Sun Nov 23 19:43:09 CET 2014


On 11/22/2014 11:25 PM, Stephan Hennig wrote:
> Am 21.11.2014 um 19:10 schrieb luigi scarso:
>> On Fri, Nov 21, 2014 at 6:31 PM, Stephan Hennig <sh-list at posteo.net> wrote:
>>
>>>
>>> To put it differently, is the fact that user-defined whatsits inhibit
>>> ligatures a bug or intentional?
>>>
>> Intentional.
>> Anything that is not a glyph or disc node will inhibit ligature building.
>
> OK, thanks for the clarification!
>
> These (side) effects of user-defined whatsits make me wonder which
> use cases were in mind when this node type was introduced.
> Attributes seem much more attractive for squeezing additional
> information into a node list, because they should be handled
> transparently (I think).
>
> But never mind, I'm currently looking into ways to make TeX smarter with
> regard to ligatures.  I recently found the selnolig package, which
> uses user-defined whatsit nodes to inhibit selected ligatures.
> Before asking further questions about ligatures, I just wanted to
> get some clarification about the whatsit approach.
>
> The selnolig whatsit approach works, but again it is not free of
> side effects.  It interferes with the hyphenation algorithm: the
> whatsit marks a word boundary, which throws off the minimum
> hyphenation length calculation.  Note how setting \righthyphenmin=5 in
> the attached example prevents the t-t hyphenation in the word
> 'butterflies'.  (Let's ignore the fact that the fl ligature is actually
> valid in this example.)  The problem is more serious in German with its
> many compound words.  Babel shortcuts like "|, which insert real glue
> if I recall correctly, suffer from the same problem.
>
> Any ideas how to prevent selected ligatures without causing side-effects?
>
> Best regards,
> Stephan Hennig
>
> % -*- coding: utf-8 -*-
> \directlua{
> % Declare constants.
>    local GLYPH = node.id('glyph')
>    local WHATSIT = node.id('whatsit')
>    local USER_DEFINED = node.subtype('user_defined')
>    local CHAR_f = string.byte('f')
>    local CHAR_l = string.byte('l')
>    local Ncopy = node.copy
>    local Nnew = node.new
>    local Ninsert_before = node.insert_before
>    local Ntraverse = node.traverse
> % Create user-defined whatsit.
>    local what = Nnew(WHATSIT, USER_DEFINED)
>    what.user_id = 20141117
>    what.type = 100
>    what.value = 0
> % Register callback.
>    callback.register('hyphenate',
>      function (head, tail)
> %     Iterate over node list.
>        for n in Ntraverse(head) do
>          if n.id == GLYPH and n.char == CHAR_l then
>            local p = n.prev
>            if p and p.id == GLYPH and p.char == CHAR_f then
>              Ninsert_before(head, n, Ncopy(what))
>            end
>          end
>        end
>        lang.hyphenate(head, tail)
>      end
>    )
> }
> \righthyphenmin=3
> \showhyphens{butterflies}
> \righthyphenmin=5
> \showhyphens{butterflies}
> \bye
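The hyphenation side effect described above comes down to the minimum-length check. A minimal sketch in Lua, assuming (as the message says) that the whatsit ends the word so that only 'butterf' is considered, and hard-coding the t-t break after the third letter instead of running the patterns; `allowed` is a hypothetical helper, not part of LuaTeX's `lang` library:

```lua
-- Simplified model of the \lefthyphenmin/\righthyphenmin check only;
-- hyphenation patterns are NOT modelled, the break position is given by hand.
local function allowed(word_length, pos, lefthyphenmin, righthyphenmin)
  -- pos = number of letters before the hyphen
  return pos >= lefthyphenmin and word_length - pos >= righthyphenmin
end

local LEFT = 2 -- plain TeX's default \lefthyphenmin
local TT = 3   -- the t-t break: three letters ("but") before the hyphen

for _, right in ipairs({ 3, 5 }) do
  -- full word "butterflies" vs. "butterf", the part before the whatsit
  print(right, allowed(#"butterflies", TT, LEFT, right),
               allowed(#"butterf", TT, LEFT, right))
end
-- prints (tab separated): 3 true true, then 5 true false
```

With \righthyphenmin=5 only four letters ('terf') follow the break in the truncated word, so the break is dropped, whereas the full word leaves eight.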

Assuming some explicit control (you mention attributes), you can simply
reconstruct the original characters from the (nested) ligature components,
like this:

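     local glyph_id = node.id("glyph")

     -- Replace every glyph that carries attribute 999 = 1 and has
     -- ligature components by those components. The loop resumes at
     -- the first component, so nested ligatures (an ffi built from an
     -- ff ligature plus i) are unwound as well.
     function nolig(head)
         local current = head
         while current do
             local n = current.next
             -- node.has_attribute is the documented attribute accessor
             if current.id == glyph_id and node.has_attribute(current, 999) == 1 then
                 local c = current.components
                 if c then
                     local t = node.slide(c) -- tail of the component list
                     local p = current.prev
                     if p then
                         p.next = c
                         c.prev = p
                     else
                         head = c
                     end
                     if n then
                         t.next = n
                         n.prev = t
                     end
                     -- detach the components before freeing the ligature
                     current.components = nil
                     node.free(current)
                     n = c -- revisit: c may itself be a ligature
                 end
             end
             current = n
         end
         return head
     end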

     -- hook this function into the hlist handler after font
     -- handling; this is of course macro package dependent as
     -- is the 999 attribute

\def\nolig#1{\begingroup\attribute999=1\relax#1\endgroup}

e\nolig{ff}e

e\nolig{ffi}cient
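To see the splicing logic outside LuaTeX, here is a minimal sketch in plain Lua on a toy node list. The nodes are ordinary tables with hypothetical `id`, `char`, `components`, `attr`, `prev` and `next` fields, not LuaTeX node userdata; `tail_of` stands in for `node.slide`, and `unwind_ligatures` is a hypothetical name. Only the unwinding loop mirrors the code above, including resuming at the first component so that a nested ffi ligature is fully unwound:

```lua
-- Toy model only: nodes are plain tables, not LuaTeX node userdata.
local MARK = 999 -- attribute number, mirroring \attribute999 in \nolig

local function glyph(char, components, attr)
  return { id = "glyph", char = char, components = components, attr = attr or {} }
end

-- Doubly link an array of nodes and return the head.
local function link(nodes)
  for i, n in ipairs(nodes) do
    n.prev = nodes[i - 1]
    n.next = nodes[i + 1]
  end
  return nodes[1]
end

-- Last node of a list (stand-in for node.slide).
local function tail_of(n)
  while n.next do n = n.next end
  return n
end

-- Concatenate the char labels of a list, front to back.
local function chars(head)
  local out, n = {}, head
  while n do
    out[#out + 1] = n.char
    n = n.next
  end
  return table.concat(out)
end

-- Replace each marked ligature by its components; resuming at the
-- first component unwinds nested ligatures as well.
local function unwind_ligatures(head)
  local current = head
  while current do
    local n = current.next
    local c = current.components
    if current.id == "glyph" and current.attr[MARK] == 1 and c then
      local t = tail_of(c)
      local p = current.prev
      if p then
        p.next = c
      else
        head = c
      end
      c.prev = p
      t.next = n
      if n then n.prev = t end
      current.components = nil -- detach, as done before node.free in LuaTeX
      n = c
    end
    current = n
  end
  return head
end

-- e + <ffi> + c, where <ffi> = <ff> + i and <ff> = f + f, all marked.
local mark = { [MARK] = 1 }
local ff = glyph("<ff>", link({ glyph("f", nil, mark), glyph("f", nil, mark) }), mark)
local ffi = glyph("<ffi>", link({ ff, glyph("i", nil, mark) }), mark)
local head = link({ glyph("e"), ffi, glyph("c") })
print(chars(head)) -- e<ffi>c
head = unwind_ligatures(head)
print(chars(head)) -- effic
```

Ligature nodes are labelled `<ff>` and `<ffi>` here only so that an unwound list can be told apart from an untouched one.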


Hans



-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

