[luatex] fio library byte order

Sun Jun 28 13:52:56 CEST 2020

On 6/28/2020 3:26 AM, Reinhard Kotucha wrote:

>   > (btw, the format file used to normalize to big endian but that was
>   > dropped long ago already: formats are no longer portable, which in fact
>   > was already dropped before that)
> 
> I don't understand.  All format files in TeX Live work on all systems.
> They have a distinct byte order and are portable among all systems
> supported by TeX Live.  What do you mean if you say "formats are no
> longer portable"?

i can only speak for luatex but indeed, there we don't juggle the bytes
of the format file (actually, as most users are on a little endian system,
it ended up always juggling) ... one reason is that a format can contain
stored bytecode or other code that can be system dependent, so ...

>   > that adds passing parameters and checking them for each call
>   > ... you can then as well use lua's 'read' function and convert with
>   > string.byte/char which is then about equally fast
> 
> This is what I actually did.  It took 14 s to process a PNM file, way
> too much if I have to process hundreds or thousands files.  I ported
> the script to C and could process the file within 270 ms.  I can't
> imagine that obeying a variable in C can slow down everything so much.

how big is the file? also, i bet you do more than just reading; you don't
define what 'process' means (270 ms for 100K files is still not fast, I guess)

> I'm not very familiar with C programming.  You say that it's expensive
> to pass arguments to a function.  What I had in mind is that functions
> obey a global variable at runtime which denotes whether byte order
> conversion is necessary or not.

passing variables in c is no issue (also because compilers are smart 
enough to deal with it)

a global variable would not work because one can read several files at
the same time, interleaved, each with different properties
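
A minimal sketch of the point about interleaved files, assuming a
hypothetical `reader` struct and `read_u16` helper (neither is part of
luatex or the fio library): the byte order is kept with each open file
rather than in a global, so two files with different byte orders can be
read in turn without disturbing each other.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reader type: the byte order is stored per open file,
   not in a global, so interleaved reads cannot affect each other. */
typedef struct {
    FILE *f;
    int little_endian; /* nonzero: the file data is little endian */
} reader;

/* Read one unsigned 16-bit integer from the file.
   Returns 1 on success, 0 on end of file or read error. */
static int read_u16(reader *r, uint16_t *out)
{
    unsigned char b[2];
    assert(r != NULL && out != NULL);
    if (fread(b, 1, 2, r->f) != 2)
        return 0;
    /* Assemble the value from bytes, independent of the host byte order. */
    *out = r->little_endian
        ? (uint16_t)(b[0] | (b[1] << 8))
        : (uint16_t)((b[0] << 8) | b[1]);
    return 1;
}
```

With one `reader` per file, calls on a big endian file and a little
endian file can alternate freely; a single global flag would have to be
flipped before every call.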

i'm talking of picking up some optional argument passed by lua (passed 
on stack, checking needed, etc)

anyway, there's nothing wrong with writing and using a c program if that
is more suitable, especially when you need to process that many files ...
opening and closing files in lua is slower than in c, as is storing all
the bytes you read in lua variables (and i'm not even talking about the
fact that a file metatable has to be looked up and the type checked for
every read) plus some garbage collection every now and then
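
As a rough illustration of what a dedicated c routine avoids, here is a
sketch that decodes 2 byte integers from a buffer that was filled by a
single read, so there is no per-value call, lookup or type check. The
name `sum_u16_be` is hypothetical, big endian data is assumed, and the
sum only stands in for whatever the real processing would be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode big endian 16-bit values from a buffer
   and return how many were decoded. A trailing odd byte is ignored.
   The running sum is a stand-in for real per-value processing. */
static size_t sum_u16_be(const unsigned char *buf, size_t len, uint64_t *sum)
{
    size_t n = len / 2;
    uint64_t s = 0;
    assert(buf != NULL && sum != NULL);
    for (size_t i = 0; i < n; i++)
        s += (uint64_t)((buf[2 * i] << 8) | buf[2 * i + 1]);
    *sum = s;
    return n;
}
```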

as you can compile c, you can also write a dedicated library and add
that to luatex (assuming you need to do this at runtime from luatex)

(you could consider using ffi)

I downloaded the 3.7 GB texlive iso and read integers from that one

-- 360 sec : one  byte integers + counting
-- 224 sec : two  byte integers + counting
-- 166 sec : four byte integers + counting (160 no counting)

But that's a lot of lua calls.

Then I downloaded the tug logo from the website

-- string : .55 sec for 1000 times (including opening / loading)
-- file   : .67 sec for 1000 times (including opening / loading)

So, that's milliseconds per file.

Finally I processed the 3414 files in the 268M context distribution and 
read 2 byte integers from those till end of file which took 15 seconds 
for the lot. So, no complaints from my end.

I think it's not the file handling that is your bottleneck.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------