[luatex] fio library byte order

Reinhard Kotucha reinhard.kotucha at web.de
Mon Jun 29 19:45:17 CEST 2020

On 2020-06-28 at 13:52:56 +0200, Hans Hagen wrote:

 > On 6/28/2020 3:26 AM, Reinhard Kotucha wrote:
 > >   > that adds passing parameters and checking them for each call
 > >   > ... you can then as well use lua's 'read' function and
 > >   > convert with string.byte/char which is then about equally
 > >   > fast
 > > 
 > > This is what I actually did.  It took 14 s to process a PNM file,
 > > way too much if I have to process hundreds or thousands of files.  I
 > > ported the script to C and could process the file within 270 ms.
 > > I can't imagine that obeying a variable in C can slow down
 > > everything so much.
 > how big a file ... also, i bet you do more than just reading, you
 > don't define what 'process' is (270 ms for 100K files is still not
 > fast I guess)

96 MiB per file.  Processing means to apply a lookup table and a 3×3
color matrix, quite inexpensive operations.  What takes most of the
time is to extract single bytes with string.sub() and to convert them
to integers.  Finally I have to convert everything back to uint16.

In C I convert to host byte order with ntohs(3) and access the color
triplets by pushing a pointer around.  In both cases I read the file
line by line (30024 bytes per line).
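
A minimal sketch of that inner loop, not the actual program.  It
assumes 16-bit RGB samples in network byte order, so that a 30024-byte
line holds 5004 pixels.  The names process_line and clamp_u16, the
65536-entry table and the use of doubles for the matrix are
assumptions made for illustration only.

```c
#include <arpa/inet.h>  /* ntohs(), htons() (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a finite value to the nearest integer in the uint16 range. */
static uint16_t clamp_u16(double v)
{
    if (v <= 0.0)     return 0;
    if (v >= 65535.0) return 65535;
    return (uint16_t)(v + 0.5);
}

/* Process one scanline in place.
 *   line    : npixels RGB triplets, 16 bits per sample, network byte order
 *   lut     : lookup table with 65536 entries, applied to every sample
 *   m       : 3x3 color matrix, applied to every triplet after the lookup
 */
static void process_line(uint16_t *line, size_t npixels,
                         const uint16_t *lut, const double m[3][3])
{
    assert(line != NULL && lut != NULL);

    /* Push a pointer along the line, one color triplet at a time. */
    for (uint16_t *p = line, *end = line + 3 * npixels; p < end; p += 3) {
        /* Convert to host byte order, then apply the lookup table. */
        double r = lut[ntohs(p[0])];
        double g = lut[ntohs(p[1])];
        double b = lut[ntohs(p[2])];

        /* Apply the matrix and store the result in network byte order. */
        p[0] = htons(clamp_u16(m[0][0] * r + m[0][1] * g + m[0][2] * b));
        p[1] = htons(clamp_u16(m[1][0] * r + m[1][1] * g + m[1][2] * b));
        p[2] = htons(clamp_u16(m[2][0] * r + m[2][1] * g + m[2][2] * b));
    }
}
```

A caller would fread() one line into a uint16_t buffer, call
process_line(buf, 5004, lut, m) and fwrite() the buffer back out.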

 > > I'm not very familiar with C programming.  You say that it's expensive
 > > to pass arguments to a function.  What I had in mind is that functions
 > > obey a global variable at runtime which denotes whether byte order
 > > conversion is necessary or not.
 > passing variables in c is no issue (also because compilers are
 > smart enough to deal with it)
 > a global variable would not work because one can read several files
 > at the same time interleaved with different properties
 > i'm talking of picking up some optional argument passed by lua
 > (passed on stack, checking needed, etc)
 > anyway, there's nothing wrong with writing and using a c program if
 > that is more suitable esp when you need to process that many files
 > ...  opening closing in lua is slower than in c, as is storing all
 > your read bytes in lua variables (and i'm not even talking about
 > the fact that a file metatable has to be looked up and type being
 > checked for every read) plus some garbage collection every now and
 > then
 > as you can compile c, you can also write a dedicated library and add
 > that to luatex (assuming you need to do this at runtime from luatex)
 > (you could consider using ffi)
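
To illustrate the point about interleaved files: in a dedicated C
library the byte order could be kept per open file instead of in a
global variable.  The sketch below is hypothetical and has nothing to
do with how the fio library is implemented; u16_reader and read_u16
are names invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Per-file state: each open file carries its own byte order, so files
 * with different properties can be read interleaved. */
typedef struct {
    FILE *fp;
    int   big_endian;   /* nonzero: most significant byte comes first */
} u16_reader;

/* Read one unsigned 16-bit integer from r into *out.
 * Returns 1 on success and 0 at end of file or on a short read. */
static int read_u16(u16_reader *r, uint16_t *out)
{
    unsigned char b[2];

    assert(r != NULL && r->fp != NULL && out != NULL);

    if (fread(b, 1, 2, r->fp) != 2)
        return 0;

    /* Assemble the value from single bytes; this works regardless of
     * the byte order of the host. */
    *out = r->big_endian ? (uint16_t)((b[0] << 8) | b[1])
                         : (uint16_t)((b[1] << 8) | b[0]);
    return 1;
}
```

The byte order is checked once per value inside C, which is cheap
compared with fetching and checking an optional argument from the Lua
stack on every call.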

Thanks for the info.  I wasn't aware that reading bytes into lua
variables is expensive too.  Maybe it's better indeed to stay with C.

 > I downloaded the 3.7 GB texlive iso and read integers from that one
 > -- 360 sec : one  byte integers + counting
 > -- 224 sec : two  byte integers + counting
 > -- 166 sec : four byte integers + counting (160 no counting)
 > But that's a lot of lua calls.

Reading 96 MiB as two-byte integers would then take about 6 seconds,
much more than I expected.

 > Then I downloaded the tug logo from the website
 > -- string : .55 sec for 1000 times (including opening / loading)
 > -- file   : .67 sec for 1000 times (including opening / loading)
 > So, that's milliseconds per file.
 > Finally I processed the 3414 files in the 268M context distribution and 
 > read 2 byte integers from those till end of file which took 15 seconds 
 > for the lot. So, no complaints from my end.

This means that file opening is quite fast.  The throughput for one
large file and for 3414 small files is about the same:

  3700 MB / 224 s = 16.5 MB/s
   268 MB /  15 s = 17.9 MB/s

 > I think it's not the file handling that is your bottleneck.

Yes, thanks for the info.


Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
