<div dir="ltr">About the lualatex speed thing, the profile for that one is:<div><br></div><div> 30.84      3.47     3.47   203674     0.00     0.00  luaV_execute<br> 10.13      4.61     1.14  3289616     0.00     0.00  longest_match<br>  4.71      5.14     0.53 126838656     0.00     0.00  luaH_get<br>  4.40      5.64     0.50 77898671     0.00     0.00  luaH_newkey<br>  4.22      6.11     0.48 138089097     0.00     0.00  luaD_precall<br>  3.78      6.54     0.43 138025730     0.00     0.00  luaD_poscall<br>  2.62      6.83     0.30 147056637     0.00     0.00  lua_tointegerx<br>  1.96      7.05     0.22 109358093     0.00     0.00  mainposition<br>  1.69      7.24     0.19  8846388     0.00     0.00  copy_node<br>  1.33      7.39     0.15  1300730     0.00     0.00  luaH_resize<br>  1.24      7.53     0.14 35276305     0.00     0.00  lua_nodelib_direct_is_char<br>  1.16      7.66     0.13 33656112     0.00     0.00  get_next<br>  1.02      7.78     0.12 59245176     0.00     0.00  luaV_equalobj<br>  0.98      7.89     0.11  1300730     0.00     0.00  auxsetnode<br>  0.89      7.99     0.10 14097849     0.00     0.00  delete_attribute_ref<br>  0.89      8.09     0.10     8526     0.00     0.00  inflate_fast<br>  0.84      8.18     0.10 152030358     0.00     0.00  lua_pushinteger<br>  0.84      8.28     0.10 33632692     0.00     0.00  lua_nodelib_direct_getnext<br>  0.80      8.37     0.09 65148440     0.00     0.00  luaO_ceillog2<br>  0.80      8.46     0.09    43053     0.00     0.00  fix_node_list<br></div><div><br></div><div>So it's almost all within the Lua VM (and thus this profile not much more useful), also I my CNF setup for the self compiled binary is a bit amiss it keeps rebuilding/reloading lmroman10-regular, but in contrast to the total runtime not so much.</div><div><br></div><div>To get a more useful info what is going I modified the input to use a Lua profiler* that uses the debug hooks.</div><div>-----<br>\directlua{<br>  profile = dofile('profile.lua')<br>  profile.start()<br>}<br>\documentclass{article}<br>\input{plipsum}<br>\begin{document}<br>\newcount\ii<br>\ii=2<br>\loop<br>  \lipsum{1-100}<br>  \advance\ii-1<br>\ifnum \ii>0<br>\repeat<br>\directlua{<br>  profile.stop()<br>  print(profile.report(20))<br>}<br>\end{document}<br></div><div>----</div><div>* <a href="https://github.com/2dengine/profile.lua">https://github.com/2dengine/profile.lua</a></div><div><br></div><div>Contrary to nowadays standard profilers it is not a sampling profiler but call tracking. Also the printout is not self times but total times.</div><div>Since the lua profiler slows down the runtime considerably, I had to reduce the document size to get reasonable runtime.</div><div><br></div><div>I made two runs with double the size the second, to differentiate which is offset and which runtime document length dependent.</div><div><br></div><div>  +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> | #   | Function                      | Calls       | Time                     | Code                             | <br> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> | 1   | ?                             | 287         | 9.953764                 | xmf-dist/tex/latex/base/ltluatex.lua:459 | <br> | 2   | func                          | 287         | 9.9488                   | uatex/luaotfload/luaotfload-init.lua:430 | <br> | 3   | handler                       | 287         | 9.616892                 | aotfload/fontloader-2022-10-03.lua:36043 | <br> | 4   | nodepass                      | 287         | 8.861947                 | aotfload/fontloader-2022-10-03.lua:35858 | <br> | 5   | ?                             | 221         | 6.412068                 | aotfload/fontloader-2022-10-03.lua:25567 | <br> | 6   | ?                             | 1           | 2.2355                   | ex/luaotfload/luaotfload-loaders.lua:179 | <br> | 7   | ltx_read                      | 1           | 2.235285                 | ex/luaotfload/luaotfload-loaders.lua:133 | <br> | 8   | resolve                       | 1           | 2.130507                 | aotfload/fontloader-2022-10-03.lua:32156 | <br> | 9   | r                             | 1           | 2.130164                 | /luaotfload/luaotfload-resolvers.lua:246 | <br> | 10  | ?                             | 1           | 2.130115                 | x/luaotfload/luaotfload-resolvers.lua:71 | <br> | 11  | ?                             | 1           | 2.106421                 | /luaotfload/luaotfload-database.lua:1292 | <br> | 12  | update_names                  | 1           | 2.106258                 | /luaotfload/luaotfload-database.lua:3360 | <br> | 13  | handler                       | 221         | 1.068138                 | tex/luaotfload/luaotfload-notdef.lua:189 | <br> | 14  | collect_font_filenames        | 1           | 1.034945                 | /luaotfload/luaotfload-database.lua:3114 | <br> | 15  | collect_font_filenames_dir    | 1003        | 0.984343                 | /luaotfload/luaotfload-database.lua:2290 | <br> | 16  | ?                             | 200         | 0.743775                 | aotfload/fontloader-2022-10-03.lua:21016 | <br> | 17  | handler                       | 43650       | 0.70179799999995         | aotfload/fontloader-2022-10-03.lua:23197 | <br> | 18  | retrieve_namedata             | 1           | 0.667004                 | /luaotfload/luaotfload-database.lua:2463 | <br> | 19  | collect_font_filenames_texmf  | 1           | 0.663371                 | /luaotfload/luaotfload-database.lua:2351 | <br> | 20  | kernrun                       | 5612        | 0.63967499999993         | aotfload/fontloader-2022-10-03.lua:24976 | <br> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> <br> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> | #   | Function                      | Calls       | Time                     | Code                             | <br> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> | 1   | ?                             | 575         | 18.783564                | xmf-dist/tex/latex/base/ltluatex.lua:459 | <br> | 2   | func                          | 575         | 18.773059                | uatex/luaotfload/luaotfload-init.lua:430 | <br> | 3   | handler                       | 575         | 18.119982                | aotfload/fontloader-2022-10-03.lua:36043 | <br> | 4   | nodepass                      | 575         | 16.760892                | aotfload/fontloader-2022-10-03.lua:35858 | <br> | 5   | ?                             | 443         | 12.157763                | aotfload/fontloader-2022-10-03.lua:25567 | <br> | 6   | ?                             | 1           | 2.215777                 | ex/luaotfload/luaotfload-loaders.lua:179 | <br> | 7   | ltx_read                      | 1           | 2.215572                 | ex/luaotfload/luaotfload-loaders.lua:133 | <br> | 8   | resolve                       | 1           | 2.116588                 | aotfload/fontloader-2022-10-03.lua:32156 | <br> | 9   | r                             | 1           | 2.116285                 | /luaotfload/luaotfload-resolvers.lua:246 | <br> | 10  | ?                             | 1           | 2.116237                 | x/luaotfload/luaotfload-resolvers.lua:71 | <br> | 11  | ?                             | 1           | 2.089522                 | /luaotfload/luaotfload-database.lua:1292 | <br> | 12  | update_names                  | 1           | 2.089362                 | /luaotfload/luaotfload-database.lua:3360 | <br> | 13  | handler                       | 443         | 1.969348                 | tex/luaotfload/luaotfload-notdef.lua:189 | <br> | 14  | handler                       | 87300       | 1.365021                 | aotfload/fontloader-2022-10-03.lua:23197 | <br> | 15  | ?                             | 400         | 1.335748                 | aotfload/fontloader-2022-10-03.lua:21016 | <br> | 16  | kernrun                       | 11224       | 1.1705539999999          | aotfload/fontloader-2022-10-03.lua:24976 | <br> | 17  | testrun                       | 33672       | 1.0861739999999          | aotfload/fontloader-2022-10-03.lua:25100 | <br> | 18  | ?                             | 34115       | 1.0375240000002          | tex/luaotfload/luaotfload-notdef.lua:311 | <br> | 19  | collect_font_filenames        | 1           | 0.981696                 | /luaotfload/luaotfload-database.lua:3114 | <br> | 20  | collect_font_filenames_dir    | 1003        | 0.931817                 | /luaotfload/luaotfload-database.lua:2290 | <br> +-----+-------------------------------+-------------+--------------------------+------------------------------------------+ <br> </div><div>This time the profile actually is very lopsided where almost all runtime falls into fontloader-2022-10-03.lua. Which IMO should be good news for possibilities of optimization.<br></div><div><br></div><div>And here the majority into this function:</div><div> function otf.featuresprocessor<br></div><div>but honestly I don't even understand what this function is supposed to do. Only maybe from the name, this is something that is not cacheable? The number of calls doubles with double the document size..</div><div><br></div><div>Kind regards, Axel</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 21, 2023 at 1:41 PM Axel Kittenberger <<a href="mailto:axkibe@gmail.com">axkibe@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 21, 2023 at 12:04 PM Taco Hoekwater <<a href="mailto:taco@bittext.nl" target="_blank">taco@bittext.nl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Using dummy data doesn’t help I believe, because you then need to check the memory pointer against the dummy pointer instead of against NULL. That should not be helpful.</blockquote><div><br></div><div>Albeit the idea was to be branch less in the function that is a bottleneck, you are right, I threw something like this together (buggy, it will crash on an unknown symbol) albeit it compiled the test that on the other hand did indeed not help at all, even very slightly slower (likely due to initialization time of the dummy variables)</div><div><br></div><div>- Axel</div></div></div>
</blockquote></div>