[luatex] runtime performance of lua(la)tex incl. gprof
Hans Hagen
j.hagen at xs4all.nl
Thu Jun 22 16:08:38 CEST 2023
On 6/22/2023 3:14 PM, Axel Kittenberger wrote:
> About the lualatex speed thing, the profile for that one is:
>
> 30.84 3.47 3.47 203674 0.00 0.00 luaV_execute
> 10.13 4.61 1.14 3289616 0.00 0.00 longest_match
> 4.71 5.14 0.53 126838656 0.00 0.00 luaH_get
> 4.40 5.64 0.50 77898671 0.00 0.00 luaH_newkey
> 4.22 6.11 0.48 138089097 0.00 0.00 luaD_precall
> 3.78 6.54 0.43 138025730 0.00 0.00 luaD_poscall
> 2.62 6.83 0.30 147056637 0.00 0.00 lua_tointegerx
> 1.96 7.05 0.22 109358093 0.00 0.00 mainposition
> 1.69 7.24 0.19 8846388 0.00 0.00 copy_node
> 1.33 7.39 0.15 1300730 0.00 0.00 luaH_resize
> 1.24 7.53 0.14 35276305 0.00 0.00
> lua_nodelib_direct_is_char
> 1.16 7.66 0.13 33656112 0.00 0.00 get_next
> 1.02 7.78 0.12 59245176 0.00 0.00 luaV_equalobj
> 0.98 7.89 0.11 1300730 0.00 0.00 auxsetnode
> 0.89 7.99 0.10 14097849 0.00 0.00 delete_attribute_ref
> 0.89 8.09 0.10 8526 0.00 0.00 inflate_fast
> 0.84 8.18 0.10 152030358 0.00 0.00 lua_pushinteger
> 0.84 8.28 0.10 33632692 0.00 0.00
> lua_nodelib_direct_getnext
> 0.80 8.37 0.09 65148440 0.00 0.00 luaO_ceillog2
> 0.80 8.46 0.09 43053 0.00 0.00 fix_node_list
>
> So it's almost all within the Lua VM (and thus this profile not much more
> useful), also I my CNF setup for the self compiled binary is a bit amiss it
> keeps rebuilding/reloading lmroman10-regular, but in contrast to the total
> runtime not so much.
>
> To get a more useful info what is going I modified the input to use a Lua
> profiler* that uses the debug hooks.
> -----
> \directlua{
> profile = dofile('profile.lua')
> profile.start()
> }
> \documentclass{article}
> \input{plipsum}
> \begin{document}
> \newcount\ii
> \ii=2
> \loop
> \lipsum{1-100}
> \advance\ii-1
> \ifnum \ii>0
> \repeat
> \directlua{
> profile.stop()
> print(profile.report(20))
> }
> \end{document}
> ----
> * https://github.com/2dengine/profile.lua
>
> Contrary to nowadays standard profilers it is not a sampling profiler but
> call tracking. Also the printout is not self times but total times.
> Since the lua profiler slows down the runtime considerably, I had to reduce
> the document size to get reasonable runtime.
>
> I made two runs with double the size the second, to differentiate which is
> offset and which runtime document length dependent.
>
>
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
> | # | Function | Calls | Time
> | Code |
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
> | 1 | ? | 287 | 9.953764
> | xmf-dist/tex/latex/base/ltluatex.lua:459 |
> | 2 | func | 287 | 9.9488
> | uatex/luaotfload/luaotfload-init.lua:430 |
> | 3 | handler | 287 | 9.616892
> | aotfload/fontloader-2022-10-03.lua:36043 |
> | 4 | nodepass | 287 | 8.861947
> | aotfload/fontloader-2022-10-03.lua:35858 |
> | 5 | ? | 221 | 6.412068
> | aotfload/fontloader-2022-10-03.lua:25567 |
> | 6 | ? | 1 | 2.2355
> | ex/luaotfload/luaotfload-loaders.lua:179 |
> | 7 | ltx_read | 1 | 2.235285
> | ex/luaotfload/luaotfload-loaders.lua:133 |
> | 8 | resolve | 1 | 2.130507
> | aotfload/fontloader-2022-10-03.lua:32156 |
> | 9 | r | 1 | 2.130164
> | /luaotfload/luaotfload-resolvers.lua:246 |
> | 10 | ? | 1 | 2.130115
> | x/luaotfload/luaotfload-resolvers.lua:71 |
> | 11 | ? | 1 | 2.106421
> | /luaotfload/luaotfload-database.lua:1292 |
> | 12 | update_names | 1 | 2.106258
> | /luaotfload/luaotfload-database.lua:3360 |
> | 13 | handler | 221 | 1.068138
> | tex/luaotfload/luaotfload-notdef.lua:189 |
> | 14 | collect_font_filenames | 1 | 1.034945
> | /luaotfload/luaotfload-database.lua:3114 |
> | 15 | collect_font_filenames_dir | 1003 | 0.984343
> | /luaotfload/luaotfload-database.lua:2290 |
> | 16 | ? | 200 | 0.743775
> | aotfload/fontloader-2022-10-03.lua:21016 |
> | 17 | handler | 43650 | 0.70179799999995
> | aotfload/fontloader-2022-10-03.lua:23197 |
> | 18 | retrieve_namedata | 1 | 0.667004
> | /luaotfload/luaotfload-database.lua:2463 |
> | 19 | collect_font_filenames_texmf | 1 | 0.663371
> | /luaotfload/luaotfload-database.lua:2351 |
> | 20 | kernrun | 5612 | 0.63967499999993
> | aotfload/fontloader-2022-10-03.lua:24976 |
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
>
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
> | # | Function | Calls | Time
> | Code |
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
> | 1 | ? | 575 | 18.783564
> | xmf-dist/tex/latex/base/ltluatex.lua:459 |
> | 2 | func | 575 | 18.773059
> | uatex/luaotfload/luaotfload-init.lua:430 |
> | 3 | handler | 575 | 18.119982
> | aotfload/fontloader-2022-10-03.lua:36043 |
> | 4 | nodepass | 575 | 16.760892
> | aotfload/fontloader-2022-10-03.lua:35858 |
> | 5 | ? | 443 | 12.157763
> | aotfload/fontloader-2022-10-03.lua:25567 |
> | 6 | ? | 1 | 2.215777
> | ex/luaotfload/luaotfload-loaders.lua:179 |
> | 7 | ltx_read | 1 | 2.215572
> | ex/luaotfload/luaotfload-loaders.lua:133 |
> | 8 | resolve | 1 | 2.116588
> | aotfload/fontloader-2022-10-03.lua:32156 |
> | 9 | r | 1 | 2.116285
> | /luaotfload/luaotfload-resolvers.lua:246 |
> | 10 | ? | 1 | 2.116237
> | x/luaotfload/luaotfload-resolvers.lua:71 |
> | 11 | ? | 1 | 2.089522
> | /luaotfload/luaotfload-database.lua:1292 |
> | 12 | update_names | 1 | 2.089362
> | /luaotfload/luaotfload-database.lua:3360 |
> | 13 | handler | 443 | 1.969348
> | tex/luaotfload/luaotfload-notdef.lua:189 |
> | 14 | handler | 87300 | 1.365021
> | aotfload/fontloader-2022-10-03.lua:23197 |
> | 15 | ? | 400 | 1.335748
> | aotfload/fontloader-2022-10-03.lua:21016 |
> | 16 | kernrun | 11224 | 1.1705539999999
> | aotfload/fontloader-2022-10-03.lua:24976 |
> | 17 | testrun | 33672 | 1.0861739999999
> | aotfload/fontloader-2022-10-03.lua:25100 |
> | 18 | ? | 34115 | 1.0375240000002
> | tex/luaotfload/luaotfload-notdef.lua:311 |
> | 19 | collect_font_filenames | 1 | 0.981696
> | /luaotfload/luaotfload-database.lua:3114 |
> | 20 | collect_font_filenames_dir | 1003 | 0.931817
> | /luaotfload/luaotfload-database.lua:2290 |
> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+
>
>
> This time the profile actually is very lopsided where almost all runtime
> falls into fontloader-2022-10-03.lua. Which IMO should be good news for
> possibilities of optimization.
>
> And here the majority into this function:
> function otf.featuresprocessor
> but honestly I don't even understand what this function is supposed to do.
> Only maybe from the name, this is something that is not cacheable? The
> number of calls doubles with double the document size..
When I run 5000 times tufte.tex stored in a macro we get some 1110 pages
in default context layout.
\tufte\endgraf
This gives:
pdftex : 4.3 seconds (8 bit fonts)
luatex : 12.3 seconds
luametatex : 10.9 seconds
xetex : 23.8 seconds
So, with the lua related engines we are some 2.5 slower than pdftex.
Given that we have a 32 bit engine and use opentype fonts this is not
that bad, also given that we have plenty more flexibility (also with
fonts).
modern : 10.9 seconds
dejavu : 11.3 seconds (more feature processing)
ebgaramond : 12.5 seconds (inefficient font)
pagella : 14.8 seconds (many kerns)
lucida : 10.0 seconds (no kerns)
So, there is a dependency on fonts. However we're talking novels here:
pure text, and 0.011 seconds per page is acceptable (less on a faster
machine). (Running the context profiler on that sample takes 139.3
seconds but that's irrelevant here I guess.)
When you process pure text you mostly measure font processing (par
building can be neglected), hyphenation, backend pdf generation. So you
you use little code which is probably is good for cache hits. Also, once
a sample text is stored in a macro little work has to be done on the input.
More realistic examples will shift (at least for context) performance to
luatex (and fwiw luametatex). I occasionaly run a test that goes from
simple to more complex that make run luametatex faster than pdftex. So,
don't stare too much at these numbers.
Over the last few years I managed to speed up quite some parts of the
engine but measuring significant gain in some specific area by running
millions of iterations becomes less impressive once the real number of
expansions (etc) is in the tens of thousands or less. Say that
luametatex is some 50% faster than luatex (of which we loose 10-15% on
the more advanced and demanding backend). That is nice but inefficient
macro code (could come from anywhere) quickly makes that gain go away
because it touches those improved areas less than some core expansion
stuff.
So, to come back to your 8 times slower luatex than pdftex ... if
context on simple files is < 2.5 times slower and often less, then you
need to look at the macros (or lua code if used) and not at the engine.
There's little to gain there.
(You can try to run with luajiitex which has a faster virtual machine.)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
More information about the luatex
mailing list.