[tex-hyphen] [lltx] [luatex] towards non-standard hyphenation support in LuaTeX

Stephan Hennig mailing_list at arcor.de
Mon Jan 27 15:33:57 CET 2014


On 27.01.2014 13:28, Taco Hoekwater wrote:
> On 01/26/2014 06:39 PM, Stephan Hennig wrote:
>>
>> In the long run, I hope that LuaTeX will get built-in means to apply
>> more than one type of patterns and custom manipulations to a node list,
>> so that this package becomes superfluous.
> 
> Having a built-in could defeat its own purpose, depending on what it
> actually has to do. You want to keep the special lua code anyway, so
> there could be a lot of going back and forth between C and lua, which
> is not the best in terms of efficiency (to put it mildly).

True.  One concern is/was also speed, but having read the node access
chapter in the latest LuaTeX manual

>> When implementing this direct approach the regular index by key 
>> variant was also optimized, so direct access only makes sense when 
>> we’re accessing nodes millions of times (which happens in some
>> font processing for instance).

I suspect the node list manipulations considered here are cheap compared
to some font handling. :-)


> OTOH, I can imagine a helper function that somehow produces the words
> & associated language of a node list, and that may be helpful, I do
> not know?

Yeah, a built-in iterator over words with a certain property would be most
welcome.  I have two kinds of properties in mind:

  * any word subject to regular hyphenation

    When applying non-standard hyphenation, I think it should apply only
    to words that are subject to regular hyphenation.  The same holds for
    weighted hyphenation.  (I have a vague idea about how the latter can
    be tackled, but that needs some more discussion.  I'll write another
    mail about that.)

  * any word

    Such an iterator is needed for arbitrary glyph substitutions, e.g.,
    non-greedy ligature building or round/long-s handling.

I can't say whether there is demand for other properties as well.
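As an illustration of a word-level glyph substitution, here is a deliberately naive round/long-s sketch in plain Lua (not LuaTeX code). It assumes, hypothetically, that the glyphs of one word have already been collected into an array of tables carrying a `char` code point, roughly as a word iterator could provide them. The rule is simplistic: round s only at the very end of the word, long s (U+017F) everywhere else. Proper handling also has to respect compound and syllable boundaries (e.g., the s in "Haustür" stays round), so this only shows the shape of the loop. `apply_long_s` and `word_from` are illustrative names.

```lua
-- Naive round/long-s substitution on the glyphs of a single word.
-- ASSUMPTION: `word` is a hypothetical array of glyph tables { char = <code point> },
-- not real LuaTeX nodes.

local ROUND_S = 0x73   -- "s"
local LONG_S  = 0x17F  -- LATIN SMALL LETTER LONG S

-- Simplistic rule: every lowercase s except the last glyph of the word becomes
-- a long s. Compound and syllable boundaries are ignored, so e.g. "Haustür"
-- would wrongly get a long s.
local function apply_long_s(word)
  for i = 1, #word - 1 do
    if word[i].char == ROUND_S then
      word[i].char = LONG_S
    end
  end
  return word
end

-- Illustrative helper: build a word from an ASCII string.
local function word_from(s)
  local word = {}
  for i = 1, #s do
    word[i] = { char = s:byte(i) }
  end
  return word
end

local w = apply_long_s(word_from("Glases"))
local codes = {}
for i, g in ipairs(w) do
  codes[i] = string.format("U+%04X", g.char)
end
print(table.concat(codes, " "))
-- prints: U+0047 U+006C U+0061 U+017F U+0065 U+0073
```

The inner s of "Glases" becomes a long s while the final one stays round.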

But there is one thing to keep in mind for such iterators:  During node
list traversal, one might also want to look at nodes other than "word"
nodes in order to control manipulations (e.g., to switch them off
locally).  That could be difficult if an iterator only returns glyph and
discretionary nodes.  It could therefore make sense to be able to iterate
over words as well as non-word nodes, like this:

  for n1, n2 in words(head) do
     -- Operate on list head, with n1 and n2 being references to
     -- the first and last node of the current word.
  end

  for n1, n2 in words(head, true) do
     -- As above, but n1 can also be any other type of node (an
     -- inter-word node), in which case n2 is nil.
  end

Just an idea.  One use case is to take action on certain (user_defined
or late_lua) whatsit nodes in a list.  I know that there are also
attributes for storing secondary information in node lists, which might
render this proposed second syntax unnecessary ...
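To make the proposed two-mode words() iterator more concrete, here is a minimal sketch in plain Lua. It is not LuaTeX code: nodes are modelled as hypothetical Lua tables with `id`, `char` and `next` fields (real LuaTeX nodes would be accessed through the `node` library), and a "word" is assumed to be a maximal run of glyph and disc nodes. The helpers `make_list` and `word_text` are illustrative only.

```lua
-- Sketch of the proposed words(head[, nonword]) iterator over a mock node list.
-- ASSUMPTION: nodes are plain tables { id = ..., char = ..., next = ... },
-- not real LuaTeX nodes; a "word" is a maximal run of glyph and disc nodes.

local function is_word_node(n)
  return n.id == "glyph" or n.id == "disc"
end

-- Yields (first, last) for every word. If `nonword` is true, it also yields
-- (n, nil) for every node outside a word, in list order.
local function words(head, nonword)
  local cur = head
  return function()
    while cur do
      if is_word_node(cur) then
        local first, last = cur, cur
        while last.next and is_word_node(last.next) do
          last = last.next
        end
        cur = last.next
        return first, last
      end
      local n = cur
      cur = cur.next
      if nonword then
        return n, nil
      end
    end
    return nil
  end
end

-- Illustrative helpers, not part of the proposal.
local function make_list(spec)
  local head, prev
  for _, s in ipairs(spec) do
    local n = { id = s[1], char = s[2] }
    if prev then prev.next = n else head = n end
    prev = n
  end
  return head
end

local function word_text(first, last)
  local chars, n = {}, first
  while true do
    if n.id == "glyph" then chars[#chars + 1] = n.char end
    if n == last then break end
    n = n.next
  end
  return table.concat(chars)
end

-- "on", an inter-word glue, a late_lua-like whatsit, and "to" with a disc inside.
local head = make_list({
  { "glyph", "o" }, { "glyph", "n" }, { "glue" }, { "whatsit" },
  { "glyph", "t" }, { "disc" }, { "glyph", "o" },
})

for n1, n2 in words(head) do
  print(word_text(n1, n2))            -- prints "on", then "to"
end

for n1, n2 in words(head, true) do
  if n2 then
    print("word:  " .. word_text(n1, n2))
  else
    print("other: " .. n1.id)         -- prints the glue and whatsit ids
  end
end
```

Returning `n, nil` for non-word nodes keeps one loop for both modes: the caller only checks whether `n2` is nil, and that branch is where a whatsit could switch the manipulation off locally.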

Best regards,
Stephan Hennig


