[lltx] pattern loading in luatex
taco at elvenkind.com
Mon Mar 7 08:34:53 CET 2011
On 03/06/2011 02:24 PM, Stephan Hennig wrote:
> Let me put it this way: In LuaTeX, the top-level data structure to
> organize the patterns of a language is still a trie, with additional
> data attached (fall-back transitions) that make the trie behave like a
> finite state automaton. Would that be wrong?
It depends on how you see the word 'trie': there is nowhere in luatex
a data structure that looks like an actual trie. The algorithm of course
uses a prefix test as the first step, but it checks for a key in a
hash instead of walking a tree. I am not too good in explaining this,
but you can look at luatexdir/lang/hyphen.w for the actual details.
> BTW, am I correctly assuming that the reason for switching from a packed
> array data structure to a hash is that the former only works as long as
> the keys (characters) are from a finite range. Which is the case for a
> 7 or 8 bit character encoding. But with Unicode characters this is
> (practically) not the case?
Mostly. A secondary reason is that the original routines converted the
trie into a 'compressed trie', after which point it became immutable.
We did not like that, it is quite handy to be able to add new \patterns
(for other languages) on the fly, and changing the existing code would
have been quite a lot of work. A+B meant that the hash-based approach
from LibHNJ was much easier to incorporate.
More information about the lualatex-dev