[luatex] towards non-standard hyphenation support in LuaTeX

Stephan Hennig mailing_list at arcor.de
Sun Jan 26 18:39:48 CET 2014


[ To tex-hyphen at tug.org
  CC luatex at tug.org
  CC lualatex-dev at tug.org
  Please keep tex-hyphen copied in your replies.]


Hi,

I'd like to invite you to check out
<URL:https://github.com/sh2d/padrinoma>.  The repository contains a
package that provides support for pattern-driven node list manipulations
in LuaTeX.

There are a handful of typographic features missing in TeX that involve
certain kinds of glyph replacement and in principle can be implemented
at the node list level in LuaTeX.  Examples are non-standard
hyphenation, smart ligature building or long/round s handling in
black-letter fonts.  There are two main questions that need to be
answered when doing low-level operations like this on node lists:

  a) Where to apply manipulations?

  b) What manipulations to apply?

The purpose of the padrinoma package is to give a higher-level answer to
the first question.  Given suitable patterns (that look like ordinary
Liang patterns), the package can scan a node list, apply the patterns to
the words found and return data structures that contain the results of
the pattern matching.
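
For readers unfamiliar with Liang patterns, here is a minimal sketch of
the matching step in plain Python.  It is not the package's code (that
lives in the Lua classes); the function names and the toy patterns are
made up for illustration, and minimum distances from the word
boundaries are ignored.

```python
import re

# Made-up toy patterns, not taken from any real pattern file.
TOY_PATTERNS = ["a1b", "1cd", "b2c"]


def parse_pattern(pattern):
    """Split a pattern such as 'b2c' into its letters and gap levels."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # level of the gap before letters[pos]
        else:
            pos += 1
    return letters, levels


def match_levels(word, patterns):
    """Return the maximum level for every gap of the dotted word '.word.'."""
    dotted = "." + word.lower() + "."
    result = [0] * (len(dotted) + 1)  # result[i]: gap before dotted[i]
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                result[start + offset] = max(result[start + offset], level)
            start = dotted.find(letters, start + 1)
    return result


def break_positions(word, patterns):
    """Return indices k such that a break is allowed before word[k]."""
    levels = match_levels(word, patterns)
    # The gap before word[k] is the gap before dotted[k + 1].
    # Odd levels allow a break, even levels forbid it.
    return [k for k in range(1, len(word)) if levels[k + 1] % 2 == 1]
```

With the toy patterns, break_positions("abcd", TOY_PATTERNS) gives [1]:
"a1b" and "1cd" each propose a break, but "b2c" overrides the one
between b and c, leaving only "a-bcd".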

Some basic, illustrative examples of how that can be used can be found
in directory examples/pdnm/.  Document hyph-mark-color.tex colours all
letter pairs surrounding valid hyphenation points in a word (without
falling back to LuaTeX's lang.hyphenate function).  Document
hyph-mark-explicit.tex is similar, but inserts a certain character at a
valid hyphenation position instead of the colouring.  Document
german-nstd-hyph.tex is a bit more ambitious and shows an attempt to
bring non-standard ck hyphenation to German users.  Example patterns are
provided.
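
To illustrate what non-standard ck hyphenation means: in traditional
German orthography, "Zucker" breaks as "Zuk-ker", so the break changes
the letters on both sides.  A minimal Python sketch, modelled on the
three texts of TeX's \discretionary (pre-break, post-break, no-break);
the names Discretionary, CK_RULE and render are hypothetical and do not
reflect how german-nstd-hyph.tex is actually implemented:

```python
from typing import NamedTuple


class Discretionary(NamedTuple):
    pre: str      # text ending the line if a break is taken
    post: str     # text starting the next line if a break is taken
    replace: str  # text used if no break is taken


# Hypothetical rule for traditional German ck hyphenation: ck -> k-k.
CK_RULE = Discretionary(pre="k-", post="k", replace="ck")


def render(word, pos, rule, broken):
    """Return the line(s) resulting from applying rule at word[pos:]."""
    if not word.startswith(rule.replace, pos):
        raise ValueError("rule does not apply at this position")
    head = word[:pos]
    tail = word[pos + len(rule.replace):]
    if broken:
        return (head + rule.pre, rule.post + tail)
    return (head + rule.replace + tail,)
```

Here render("Zucker", 2, CK_RULE, broken=True) gives ("Zuk-", "ker"),
while broken=False leaves the word as ("Zucker",).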

Quick start:  To compile the sample documents, the following files

  lua-classes/cls_pdnm*.lua
  lua-modules/pdnm*.lua

have to be placed in a local TEXMF tree.  In general, mktexlsr needs to
be run afterwards.  Then move to the example documents and compile them
using LuaLaTeX.

There's not much user-level documentation, currently.  That's because
the package doesn't contain user-level code. :-)  The API of Lua modules
and classes is documented in LuaDoc format.  Documentation of examples,
on the other hand, is terse or non-existent.  If you want to play with
the example documents, look for a line

>   nlm.register_manipulation('hyph-la.pat.txt', 'pdnm_hyph-mark-explicit')

or similar.  The first argument to the function call is the name of an
ordinary pattern file (plain UTF-8 text) and the second argument is the
name of a Lua module implementing a particular kind of node list
manipulation.

Directory lua-classes/ contains basic data structures needed for the
pattern matching.  Pure TeX hackers need not care about the code there.
Directory lua-modules/ contains modules that apply the data structures
from the former directory to LuaTeX's node lists.

What works: The example documents, see above.

What's missing: Much!

I'm announcing the package here because it is at a point where input
and code contributions from people more familiar with TeX and LaTeX
internals than I am are welcome.

Format:

  * A user-level TeX and LaTeX interface is missing.

  * What should a low-level LuaTeX interface look like?
    If the approach shown here proves useful, I think
    non-standard hyphenation and other applications should get
    first-class support by formats, similar to regular hyphenation.

  * What about Babel/Polyglossia/hyph-utf8 integration?

Plain TeX:

  * Currently, only a very basic notion of a "word" is implemented.
    Basically, a word is a series of glyph, discretionary or
    user-defined whatsit nodes.  Plain TeX has a much more sophisticated
    notion of a "word subject to hyphenation".
      => see file lua-modules/pdnm_nl_iterate_words.lua

  * The language of a "word" is currently completely ignored.

  * "Words" are currently not checked against the list of hyphenation
    exceptions.

  * There's no way to locally switch off a particular manipulation for
    a single word or phrase.  This is needed for words where certain
    glyph replacements might not be desirable, e.g., names for better
    recognition.

  * And much more ...
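
The basic notion of a "word" from the first item above can be sketched
as follows.  This is plain Python over toy (type, payload) tuples, not
LuaTeX node objects; the type names and the function iterate_words are
assumptions for illustration, and the real code is in
lua-modules/pdnm_nl_iterate_words.lua.

```python
# Toy node type names, chosen for illustration only.
WORD_NODE_TYPES = {"glyph", "disc", "user_whatsit"}


def iterate_words(nodes):
    """Yield maximal runs of glyph, discretionary or user-defined whatsit
    nodes; any other node type ends the current word."""
    word = []
    for node in nodes:
        node_type = node[0]
        if node_type in WORD_NODE_TYPES:
            word.append(node)
        elif word:
            yield word
            word = []
    if word:  # the list may end in the middle of a word
        yield word


# Toy node list: "ab", glue, "c" + discretionary + "d", penalty.
TOY_NODES = [
    ("glyph", "a"), ("glyph", "b"),
    ("glue", None),
    ("glyph", "c"), ("disc", None), ("glyph", "d"),
    ("penalty", None),
]
```

Running list(iterate_words(TOY_NODES)) yields two words, one of two
nodes and one of three.  Language, hyphenation exceptions and the other
conditions listed above are deliberately left out, just as in the
current state of the package.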

In the long run, I hope that LuaTeX gets built-in means to apply more
than one type of pattern and custom manipulations to a node list, so
that this package becomes superfluous.  (Well, if this approach works
out, Hans and Taco might as well become reluctant to add that
functionality to the core. :-)  Until then, please share your ideas and
-- more importantly -- your coding skills!

Happy TeXing!
Stephan Hennig

