[luatex] runtime performance of lua(la)tex incl. gprof
Axel Kittenberger
axkibe at gmail.com
Thu Jun 22 15:14:40 CEST 2023
About the lualatex speed thing, the profile for that one is:
30.84 3.47 3.47 203674 0.00 0.00 luaV_execute
10.13 4.61 1.14 3289616 0.00 0.00 longest_match
4.71 5.14 0.53 126838656 0.00 0.00 luaH_get
4.40 5.64 0.50 77898671 0.00 0.00 luaH_newkey
4.22 6.11 0.48 138089097 0.00 0.00 luaD_precall
3.78 6.54 0.43 138025730 0.00 0.00 luaD_poscall
2.62 6.83 0.30 147056637 0.00 0.00 lua_tointegerx
1.96 7.05 0.22 109358093 0.00 0.00 mainposition
1.69 7.24 0.19 8846388 0.00 0.00 copy_node
1.33 7.39 0.15 1300730 0.00 0.00 luaH_resize
1.24 7.53 0.14 35276305 0.00 0.00
lua_nodelib_direct_is_char
1.16 7.66 0.13 33656112 0.00 0.00 get_next
1.02 7.78 0.12 59245176 0.00 0.00 luaV_equalobj
0.98 7.89 0.11 1300730 0.00 0.00 auxsetnode
0.89 7.99 0.10 14097849 0.00 0.00 delete_attribute_ref
0.89 8.09 0.10 8526 0.00 0.00 inflate_fast
0.84 8.18 0.10 152030358 0.00 0.00 lua_pushinteger
0.84 8.28 0.10 33632692 0.00 0.00
lua_nodelib_direct_getnext
0.80 8.37 0.09 65148440 0.00 0.00 luaO_ceillog2
0.80 8.46 0.09 43053 0.00 0.00 fix_node_list
So it's almost all within the Lua VM (and thus this profile not much more
useful), also I my CNF setup for the self compiled binary is a bit amiss it
keeps rebuilding/reloading lmroman10-regular, but in contrast to the total
runtime not so much.
To get a more useful info what is going I modified the input to use a Lua
profiler* that uses the debug hooks.
-----
\directlua{
profile = dofile('profile.lua')
profile.start()
}
\documentclass{article}
\input{plipsum}
\begin{document}
\newcount\ii
\ii=2
\loop
\lipsum{1-100}
\advance\ii-1
\ifnum \ii>0
\repeat
\directlua{
profile.stop()
print(profile.report(20))
}
\end{document}
----
* https://github.com/2dengine/profile.lua
Contrary to nowadays standard profilers it is not a sampling profiler but
call tracking. Also the printout is not self times but total times.
Since the lua profiler slows down the runtime considerably, I had to reduce
the document size to get reasonable runtime.
I made two runs with double the size the second, to differentiate which is
offset and which runtime document length dependent.
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
| # | Function | Calls | Time
| Code |
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
| 1 | ? | 287 | 9.953764
| xmf-dist/tex/latex/base/ltluatex.lua:459 |
| 2 | func | 287 | 9.9488
| uatex/luaotfload/luaotfload-init.lua:430 |
| 3 | handler | 287 | 9.616892
| aotfload/fontloader-2022-10-03.lua:36043 |
| 4 | nodepass | 287 | 8.861947
| aotfload/fontloader-2022-10-03.lua:35858 |
| 5 | ? | 221 | 6.412068
| aotfload/fontloader-2022-10-03.lua:25567 |
| 6 | ? | 1 | 2.2355
| ex/luaotfload/luaotfload-loaders.lua:179 |
| 7 | ltx_read | 1 | 2.235285
| ex/luaotfload/luaotfload-loaders.lua:133 |
| 8 | resolve | 1 | 2.130507
| aotfload/fontloader-2022-10-03.lua:32156 |
| 9 | r | 1 | 2.130164
| /luaotfload/luaotfload-resolvers.lua:246 |
| 10 | ? | 1 | 2.130115
| x/luaotfload/luaotfload-resolvers.lua:71 |
| 11 | ? | 1 | 2.106421
| /luaotfload/luaotfload-database.lua:1292 |
| 12 | update_names | 1 | 2.106258
| /luaotfload/luaotfload-database.lua:3360 |
| 13 | handler | 221 | 1.068138
| tex/luaotfload/luaotfload-notdef.lua:189 |
| 14 | collect_font_filenames | 1 | 1.034945
| /luaotfload/luaotfload-database.lua:3114 |
| 15 | collect_font_filenames_dir | 1003 | 0.984343
| /luaotfload/luaotfload-database.lua:2290 |
| 16 | ? | 200 | 0.743775
| aotfload/fontloader-2022-10-03.lua:21016 |
| 17 | handler | 43650 | 0.70179799999995
| aotfload/fontloader-2022-10-03.lua:23197 |
| 18 | retrieve_namedata | 1 | 0.667004
| /luaotfload/luaotfload-database.lua:2463 |
| 19 | collect_font_filenames_texmf | 1 | 0.663371
| /luaotfload/luaotfload-database.lua:2351 |
| 20 | kernrun | 5612 | 0.63967499999993
| aotfload/fontloader-2022-10-03.lua:24976 |
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
| # | Function | Calls | Time
| Code |
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
| 1 | ? | 575 | 18.783564
| xmf-dist/tex/latex/base/ltluatex.lua:459 |
| 2 | func | 575 | 18.773059
| uatex/luaotfload/luaotfload-init.lua:430 |
| 3 | handler | 575 | 18.119982
| aotfload/fontloader-2022-10-03.lua:36043 |
| 4 | nodepass | 575 | 16.760892
| aotfload/fontloader-2022-10-03.lua:35858 |
| 5 | ? | 443 | 12.157763
| aotfload/fontloader-2022-10-03.lua:25567 |
| 6 | ? | 1 | 2.215777
| ex/luaotfload/luaotfload-loaders.lua:179 |
| 7 | ltx_read | 1 | 2.215572
| ex/luaotfload/luaotfload-loaders.lua:133 |
| 8 | resolve | 1 | 2.116588
| aotfload/fontloader-2022-10-03.lua:32156 |
| 9 | r | 1 | 2.116285
| /luaotfload/luaotfload-resolvers.lua:246 |
| 10 | ? | 1 | 2.116237
| x/luaotfload/luaotfload-resolvers.lua:71 |
| 11 | ? | 1 | 2.089522
| /luaotfload/luaotfload-database.lua:1292 |
| 12 | update_names | 1 | 2.089362
| /luaotfload/luaotfload-database.lua:3360 |
| 13 | handler | 443 | 1.969348
| tex/luaotfload/luaotfload-notdef.lua:189 |
| 14 | handler | 87300 | 1.365021
| aotfload/fontloader-2022-10-03.lua:23197 |
| 15 | ? | 400 | 1.335748
| aotfload/fontloader-2022-10-03.lua:21016 |
| 16 | kernrun | 11224 | 1.1705539999999
| aotfload/fontloader-2022-10-03.lua:24976 |
| 17 | testrun | 33672 | 1.0861739999999
| aotfload/fontloader-2022-10-03.lua:25100 |
| 18 | ? | 34115 | 1.0375240000002
| tex/luaotfload/luaotfload-notdef.lua:311 |
| 19 | collect_font_filenames | 1 | 0.981696
| /luaotfload/luaotfload-database.lua:3114 |
| 20 | collect_font_filenames_dir | 1003 | 0.931817
| /luaotfload/luaotfload-database.lua:2290 |
+-----+-------------------------------+-------------+--------------------------+------------------------------------------+
This time the profile actually is very lopsided where almost all runtime
falls into fontloader-2022-10-03.lua. Which IMO should be good news for
possibilities of optimization.
And here the majority into this function:
function otf.featuresprocessor
but honestly I don't even understand what this function is supposed to do.
Only maybe from the name, this is something that is not cacheable? The
number of calls doubles with double the document size..
Kind regards, Axel
On Wed, Jun 21, 2023 at 1:41 PM Axel Kittenberger <axkibe at gmail.com> wrote:
>
>
> On Wed, Jun 21, 2023 at 12:04 PM Taco Hoekwater <taco at bittext.nl> wrote:
>
>> Using dummy data doesn’t help I believe, because you then need to check
>> the memory pointer against the dummy pointer instead of against NULL. That
>> should not be helpful.
>
>
> Albeit the idea was to be branch less in the function that is a
> bottleneck, you are right, I threw something like this together (buggy, it
> will crash on an unknown symbol) albeit it compiled the test that on the
> other hand did indeed not help at all, even very slightly slower (likely
> due to initialization time of the dummy variables)
>
> - Axel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://tug.org/pipermail/luatex/attachments/20230622/fc905284/attachment.htm>
More information about the luatex
mailing list.