[luatex] Allowing or switching to string indexes in Lua bytecode registers

Kalrish Bäakjen kalrish.baakjen at gmail.com
Sat Sep 5 18:42:52 CEST 2015

On Sat, Sep 5, 2015 at 2:08 PM, David Carlisle <d.p.carlisle at gmail.com> wrote:
> On 4 September 2015 at 23:43, Kalrish Bäakjen <kalrish.baakjen at gmail.com> wrote:
>> Currently, all array elements of the lua.bytecode table, which must be
>> functions, are dumped as Lua bytecode to the format file by iniTeX. This
>> functionality is useful if one is dumping his own format from a split
>> preamble (to decrease compilation time) and uses custom Lua code: to avoid
>> loading it every time, one can assign consecutive slots of lua.bytecode to
>> the functions returned by loadfile, and then, in the main document, execute
>> those functions to recreate the code. More information about this can be
>> found in section 4.8.1 ("LUA bytecode registers") of the LuaTeX Reference
>> Manual[1] and in this StackExchange question[2].
>> I think this functionality is also interesting for packages because it would
>> allow for a clean way of dumping them to a format, if the user so desires.
>> Currently, packages which use Lua code, such as fontspec or polyglossia,
>> can't be "preloaded" in a format file because the Lua code that they load is
>> not saved and, thus, cannot be restored in the actual run. The same person
>> who asked [2] already considered using \everyjob (in combination with the
>> \directlua call that loads the code) to solve this issue. While that
>> solution would indeed make it possible to dump those packages, it would be a
>> trick in the sense that Lua code would be loaded every run. Since users
>> (usually) dump their own format to save time, this doesn't seem to fit.
>> However, as far as I know, this functionality is restricted to user code and
>> can't be effectively leveraged by packages, because the indexes of
>> lua.bytecode are numbers and there is no reasonable relation between numbers
>> and package names. Not even hashes of the names could be used, as far as I
>> understand, because iniLuaTeX only dumps the array part of the table (that
>> is: the consecutive non-nil slots, starting from 1).
>> Therefore, I suggest that this mechanism (the Lua bytecode registers dumping
>> logic) is either extended to allow the use of string indexes, or entirely
>> switched to them (only strings allowed as indexes). That way, packages could
>> do (Lua code follows):
> The engine does not need to change if TeX allocation macros are available for
> bytecodes as they are for other register types, as then any name/number mapping
> is available in TeX code and so will be dumped into the format.
> In the ltluatex code that is planned to be part of the next LaTeX release and
> form the basis of a luatexbase update, there was already such an allocator for
> Lua functions, but the bytecode allocator was just in Lua. We have just
> adjusted it so that the allocation count for bytecodes is also stored in a TeX
> count register.
> I think this addresses your use case, you can see the code for the core
> ltluatex in the LaTeX SVN and for a luatexbase update based on this in GitHub.
> http://latex-project.org/svnroot/latex2e-public/trunk/base/
> and
> https://github.com/josephwright/ltluatexsupp
> Like the original luatexbase code, the ltluatex code is designed to run in
> plain TeX despite its LaTeX roots.
> David

Thank you very much!

I use TeX Live and, unfortunately, there doesn't seem to be a way to
use SVN LaTeX here. As I think ltluatex.dtx[1] is the relevant file, I
have tried to locate it in my tree, but haven't found it. I haven't
been able to use the Lua function (luatexbase.new_bytecode, as it
appears in ltluatex.dtx[1]) either; could you please tell me which
package I must load?

In the meantime, I've been scratching my head. My idea, for packages
that use Lua code, is as follows:
- Regardless of whether or not we are on iniLuaTeX, we must get a
loader function for our Lua code through loadfile:
  local loader_function = assert( loadfile('mycode.lua') )
- If we are on iniLuaTeX, we must:
 1. ask the bytecode allocator (\newluabytecode) for a number (the
index), which we store in \myindex and which, I believe, would be
dumped in the format:
  \newluabytecode\myindex
 2. save our Lua loader function, which we got by loadfile, in the
slot referenced by that index:
  \directlua{ lua.bytecode[\number\myindex] = loader_function }
 3. specify an "everyjob" via \everyjob that restores our Lua code by
calling the loader function stored at the bytecode slot referenced by
the TeX index:
  \everyjob\expandafter{\the\everyjob \directlua{ lua.bytecode[\number\myindex]() }}
- If we are on a regular run, we just have to run the loader function
to actually load our Lua code and be able to use it:
  \directlua{ loader_function() }
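
The branching described in the list above could be sketched in Lua roughly as follows. This is a minimal sketch under stated assumptions, not ltluatex's implementation: register_or_run is a hypothetical helper name, the slot index is assumed to have been obtained from the allocator already, and the fallbacks to plain tables exist only so that the logic can be tried outside LuaTeX. status.ini_version is the flag that LuaTeX sets to true during an iniTeX run.

```lua
-- In LuaTeX, lua.bytecode is the table dumped into the format and
-- status.ini_version is true during an iniTeX run. The fallbacks below
-- are assumptions that let the sketch run in a standalone Lua interpreter.
local bytecode = (lua and lua.bytecode) or {}
local in_ini = (status and status.ini_version) or false

-- Hypothetical helper. `index` is the slot number handed out by the
-- allocator; `chunk` is the function returned by loadfile. `ini` is
-- optional and defaults to the detected run type.
local function register_or_run(index, chunk, ini)
  if ini == nil then ini = in_ini end
  if ini then
    -- Dumping run: keep the compiled chunk so that it goes into the format.
    bytecode[index] = chunk
    return "stored"
  end
  -- Regular run: execute the chunk to define the package's Lua code.
  chunk()
  return "ran"
end

-- Usage, assuming mycode.lua can be found and slot 1 was allocated:
-- local chunk = assert(loadfile("mycode.lua"))
-- register_or_run(1, chunk)
```

Note that a local declared in one \directlua call is not visible in a later one, since each call is compiled as a separate chunk; the loader function would have to live in a global or in a table to be shared between the calls.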

This logic, however, doesn't seem to play well with Lua's require
function[2] (and, by extension, luatexbase-modutils' require_module,
which uses it under the hood), because require loads the Lua code
internally; packages doing \directlua{ require('mycode.lua') } have no
way of accessing the loader function, which is what needs to be stored
in bytecode registers. Perhaps a custom searcher (see [3]) could be
introduced. This searcher would be based on the default second
searcher. First, it would load the code from the resolved path:
  -- resolved_path is the path of the file containing the Lua code;
  -- this path would be obtained with package.searchpath
  local loader_function = assert( loadfile(resolved_path) )
This special searcher would have to know if it's on a dumping session
or a normal run.
If on a normal run, it would just execute that loader function to
actually load the code and be done with it:
  loader_function()
If on iniLuaTeX, it would get a bytecode register index and save the
loader function in the corresponding slot. I haven't yet figured out,
however, how this special searcher of mine would keep track of the
bytecode register index assigned to it by the allocation manager.
Perhaps a LuaTeX attribute? Like this:
  -- Get an index from the bytecode register allocator
  local register_index = luatexbase.new_bytecode(resolved_path)
  -- Store it in a LuaTeX attribute named after the resolved path of
  -- the Lua file we loaded with loadfile. These attributes are dumped,
  -- I believe.
  lua.attribute[resolved_path] = register_index
  -- Finally, spit out an "everyjob" to the TeX engine. This
  -- "everyjob" is supposed to execute the loader function later, during
  -- the normal run. We use the resolved path as the name of our attribute
  tex.sprint( [[\everyjob\expandafter{\the\everyjob \directlua{
    lua.bytecode[\number\csname ]] , resolved_path , [[\endcsname]() }}]] )
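
To make the searcher idea more concrete, here is a rough sketch of what such a searcher might look like. It assumes Lua 5.2 or later (package.searchers and package.searchpath). Everything named make_bytecode_searcher, slot_of and next_slot is hypothetical; the counter merely stands in for a real allocator such as luatexbase.new_bytecode. It also resolves files through package.path, whereas LuaTeX normally locates files through kpathsea, so the lookup step is a stand-in as well. The name-to-slot map lives in an ordinary Lua table that is not dumped, so the question of how to recover the index in the normal run remains open.

```lua
-- Plain-table fallbacks so that the sketch also runs outside LuaTeX.
local bytecode = (lua and lua.bytecode) or {}
local in_ini = (status and status.ini_version) or false

local slot_of = {}   -- hypothetical module-name -> slot map (not dumped)
local next_slot = 0  -- stand-in for a real bytecode allocator

-- Hypothetical factory: returns a searcher that behaves as on a dumping
-- run when `ini` is true and as on a normal run otherwise.
local function make_bytecode_searcher(ini)
  return function(name)
    local path, err = package.searchpath(name, package.path)
    if not path then
      return err  -- tells require why this searcher found nothing
    end
    local chunk = assert(loadfile(path))
    if ini then
      -- Dumping run: remember the compiled chunk in a bytecode slot.
      next_slot = next_slot + 1
      bytecode[next_slot] = chunk
      slot_of[name] = next_slot
    end
    -- require itself calls the returned loader, so it is not run here.
    return chunk, path
  end
end

-- Place the searcher right after the preload searcher.
table.insert(package.searchers, 2, make_bytecode_searcher(in_ini))
```

With this in place, a plain require('mycode') would go through the custom searcher, so packages would not need access to the loader function themselves.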

This may be getting too awful, but I think that being able to dump
Lua-using packages (along with their Lua code) is important for the
future, as Lua code can be noticeably slower to process than
"good old" TeX code.

1: http://latex-project.org/svnroot/latex2e-public/trunk/base/ltluatex.dtx
2: http://www.lua.org/manual/5.3/manual.html#pdf-require
3: http://www.lua.org/manual/5.3/manual.html#pdf-package.searchers
