[luatex] runtime performance of lua(la)tex incl. gprof

Hans Hagen j.hagen at xs4all.nl
Thu Jun 22 16:08:38 CEST 2023


On 6/22/2023 3:14 PM, Axel Kittenberger wrote:
> About the lualatex speed thing, the profile for that one is:
> 
>   30.84      3.47     3.47   203674     0.00     0.00  luaV_execute
>   10.13      4.61     1.14  3289616     0.00     0.00  longest_match
>    4.71      5.14     0.53 126838656     0.00     0.00  luaH_get
>    4.40      5.64     0.50 77898671     0.00     0.00  luaH_newkey
>    4.22      6.11     0.48 138089097     0.00     0.00  luaD_precall
>    3.78      6.54     0.43 138025730     0.00     0.00  luaD_poscall
>    2.62      6.83     0.30 147056637     0.00     0.00  lua_tointegerx
>    1.96      7.05     0.22 109358093     0.00     0.00  mainposition
>    1.69      7.24     0.19  8846388     0.00     0.00  copy_node
>    1.33      7.39     0.15  1300730     0.00     0.00  luaH_resize
>    1.24      7.53     0.14 35276305     0.00     0.00
>   lua_nodelib_direct_is_char
>    1.16      7.66     0.13 33656112     0.00     0.00  get_next
>    1.02      7.78     0.12 59245176     0.00     0.00  luaV_equalobj
>    0.98      7.89     0.11  1300730     0.00     0.00  auxsetnode
>    0.89      7.99     0.10 14097849     0.00     0.00  delete_attribute_ref
>    0.89      8.09     0.10     8526     0.00     0.00  inflate_fast
>    0.84      8.18     0.10 152030358     0.00     0.00  lua_pushinteger
>    0.84      8.28     0.10 33632692     0.00     0.00
>   lua_nodelib_direct_getnext
>    0.80      8.37     0.09 65148440     0.00     0.00  luaO_ceillog2
>    0.80      8.46     0.09    43053     0.00     0.00  fix_node_list
> 
> So it's almost all within the Lua VM (and thus this profile not much more
> useful), also I my CNF setup for the self compiled binary is a bit amiss it
> keeps rebuilding/reloading lmroman10-regular, but in contrast to the total
> runtime not so much.
> 
> To get a more useful info what is going I modified the input to use a Lua
> profiler* that uses the debug hooks.
> -----
> \directlua{
>    profile = dofile('profile.lua')
>    profile.start()
> }
> \documentclass{article}
> \input{plipsum}
> \begin{document}
> \newcount\ii
> \ii=2
> \loop
>    \lipsum{1-100}
>    \advance\ii-1
> \ifnum \ii>0
> \repeat
> \directlua{
>    profile.stop()
>    print(profile.report(20))
> }
> \end{document}
> ----
> * https://github.com/2dengine/profile.lua
> 
> Contrary to nowadays standard profilers it is not a sampling profiler but
> call tracking. Also the printout is not self times but total times.
> Since the lua profiler slows down the runtime considerably, I had to reduce
> the document size to get reasonable runtime.
> 
> I made two runs with double the size the second, to differentiate which is
> offset and which runtime document length dependent.
> 
> 
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
>   | #   | Function                      | Calls       | Time
>      | Code                             |
>   +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
>   | 1   | ?                             | 287         | 9.953764
>      | xmf-dist/tex/latex/base/ltluatex.lua:459 |
>   | 2   | func                          | 287         | 9.9488
>      | uatex/luaotfload/luaotfload-init.lua:430 |
>   | 3   | handler                       | 287         | 9.616892
>      | aotfload/fontloader-2022-10-03.lua:36043 |
>   | 4   | nodepass                      | 287         | 8.861947
>      | aotfload/fontloader-2022-10-03.lua:35858 |
>   | 5   | ?                             | 221         | 6.412068
>      | aotfload/fontloader-2022-10-03.lua:25567 |
>   | 6   | ?                             | 1           | 2.2355
>      | ex/luaotfload/luaotfload-loaders.lua:179 |
>   | 7   | ltx_read                      | 1           | 2.235285
>      | ex/luaotfload/luaotfload-loaders.lua:133 |
>   | 8   | resolve                       | 1           | 2.130507
>      | aotfload/fontloader-2022-10-03.lua:32156 |
>   | 9   | r                             | 1           | 2.130164
>      | /luaotfload/luaotfload-resolvers.lua:246 |
>   | 10  | ?                             | 1           | 2.130115
>      | x/luaotfload/luaotfload-resolvers.lua:71 |
>   | 11  | ?                             | 1           | 2.106421
>      | /luaotfload/luaotfload-database.lua:1292 |
>   | 12  | update_names                  | 1           | 2.106258
>      | /luaotfload/luaotfload-database.lua:3360 |
>   | 13  | handler                       | 221         | 1.068138
>      | tex/luaotfload/luaotfload-notdef.lua:189 |
>   | 14  | collect_font_filenames        | 1           | 1.034945
>      | /luaotfload/luaotfload-database.lua:3114 |
>   | 15  | collect_font_filenames_dir    | 1003        | 0.984343
>      | /luaotfload/luaotfload-database.lua:2290 |
>   | 16  | ?                             | 200         | 0.743775
>      | aotfload/fontloader-2022-10-03.lua:21016 |
>   | 17  | handler                       | 43650       | 0.70179799999995
>      | aotfload/fontloader-2022-10-03.lua:23197 |
>   | 18  | retrieve_namedata             | 1           | 0.667004
>      | /luaotfload/luaotfload-database.lua:2463 |
>   | 19  | collect_font_filenames_texmf  | 1           | 0.663371
>      | /luaotfload/luaotfload-database.lua:2351 |
>   | 20  | kernrun                       | 5612        | 0.63967499999993
>      | aotfload/fontloader-2022-10-03.lua:24976 |
>   +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
> 
>   +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
>   | #   | Function                      | Calls       | Time
>      | Code                             |
>   +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
>   | 1   | ?                             | 575         | 18.783564
>       | xmf-dist/tex/latex/base/ltluatex.lua:459 |
>   | 2   | func                          | 575         | 18.773059
>       | uatex/luaotfload/luaotfload-init.lua:430 |
>   | 3   | handler                       | 575         | 18.119982
>       | aotfload/fontloader-2022-10-03.lua:36043 |
>   | 4   | nodepass                      | 575         | 16.760892
>       | aotfload/fontloader-2022-10-03.lua:35858 |
>   | 5   | ?                             | 443         | 12.157763
>       | aotfload/fontloader-2022-10-03.lua:25567 |
>   | 6   | ?                             | 1           | 2.215777
>      | ex/luaotfload/luaotfload-loaders.lua:179 |
>   | 7   | ltx_read                      | 1           | 2.215572
>      | ex/luaotfload/luaotfload-loaders.lua:133 |
>   | 8   | resolve                       | 1           | 2.116588
>      | aotfload/fontloader-2022-10-03.lua:32156 |
>   | 9   | r                             | 1           | 2.116285
>      | /luaotfload/luaotfload-resolvers.lua:246 |
>   | 10  | ?                             | 1           | 2.116237
>      | x/luaotfload/luaotfload-resolvers.lua:71 |
>   | 11  | ?                             | 1           | 2.089522
>      | /luaotfload/luaotfload-database.lua:1292 |
>   | 12  | update_names                  | 1           | 2.089362
>      | /luaotfload/luaotfload-database.lua:3360 |
>   | 13  | handler                       | 443         | 1.969348
>      | tex/luaotfload/luaotfload-notdef.lua:189 |
>   | 14  | handler                       | 87300       | 1.365021
>      | aotfload/fontloader-2022-10-03.lua:23197 |
>   | 15  | ?                             | 400         | 1.335748
>      | aotfload/fontloader-2022-10-03.lua:21016 |
>   | 16  | kernrun                       | 11224       | 1.1705539999999
>       | aotfload/fontloader-2022-10-03.lua:24976 |
>   | 17  | testrun                       | 33672       | 1.0861739999999
>       | aotfload/fontloader-2022-10-03.lua:25100 |
>   | 18  | ?                             | 34115       | 1.0375240000002
>       | tex/luaotfload/luaotfload-notdef.lua:311 |
>   | 19  | collect_font_filenames        | 1           | 0.981696
>      | /luaotfload/luaotfload-database.lua:3114 |
>   | 20  | collect_font_filenames_dir    | 1003        | 0.931817
>      | /luaotfload/luaotfload-database.lua:2290 |
>   +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
> 
> 
> This time the profile actually is very lopsided where almost all runtime
> falls into fontloader-2022-10-03.lua. Which IMO should be good news for
> possibilities of optimization.
> 
> And here the majority into this function:
>   function otf.featuresprocessor
> but honestly I don't even understand what this function is supposed to do.
> Only maybe from the name, this is something that is not cacheable? The
> number of calls doubles with double the document size..
When I run 5000 times tufte.tex stored in a macro we get some 1110 pages 
in default context layout.

    \tufte\endgraf

This gives:

pdftex     :  4.3 seconds (8 bit fonts)
luatex     : 12.3 seconds
luametatex : 10.9 seconds
xetex      : 23.8 seconds

So, with the lua related engines we are some 2.5 slower than pdftex. 
Given that we have a 32 bit engine and use opentype fonts this is not 
that bad, also given that we have plenty more flexibility (also with 
fonts).

modern     : 10.9 seconds
dejavu     : 11.3 seconds (more feature processing)
ebgaramond : 12.5 seconds (inefficient font)
pagella    : 14.8 seconds (many kerns)
lucida     : 10.0 seconds (no kerns)

So, there is a dependency on fonts. However we're talking novels here: 
pure text, and 0.011 seconds per page is acceptable (less on a faster 
machine). (Running the context profiler on that sample takes 139.3 
seconds but that's irrelevant here I guess.)

When you process pure text you mostly measure font processing (par 
building can be neglected), hyphenation, backend pdf generation. So you 
you use little code which is probably is good for cache hits. Also, once 
a sample text is stored in a macro little work has to be done on the input.

More realistic examples will shift (at least for context) performance to 
luatex (and fwiw luametatex). I occasionaly run a test that goes from 
simple to more complex that make run luametatex faster than pdftex. So, 
don't stare too much at these numbers.

Over the last few years I managed to speed up quite some parts of the 
engine but measuring significant gain in some specific area by running 
millions of iterations becomes less impressive once the real number of 
expansions (etc) is in the tens of thousands or less. Say that 
luametatex is some 50% faster than luatex (of which we loose 10-15% on 
the more advanced and demanding backend). That is nice but inefficient 
macro code (could come from anywhere) quickly makes that gain go away 
because it touches those improved areas less than some core expansion 
stuff.

So, to come back to your 8 times slower luatex than pdftex ... if 
context on simple files is < 2.5 times slower and often less, then you 
need to look at the macros (or lua code if used) and not at the engine. 
There's little to gain there.

(You can try to run with luajiitex which has a faster virtual machine.)

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------



More information about the luatex mailing list.