[twg-tds] storing scripts in the texmf tree
Olaf Weber
olaf at infovore.xs4all.nl
Fri Feb 13 21:43:41 CET 2004
Paul Vojta writes:
> On Fri, Feb 13, 2004 at 07:51:30PM +0100, Olaf Weber wrote:
>> > - a service (on some port)
>>
>> Not implemented (yet).
>>
>> > - shared memory for subprocesses
>>
>> As currently implemented, the datastructures are not suitable for
>> this. In test versions I have been playing with datastructures that
>> are (they have to be independent of the address at which the data is
>> loaded).
> Of course, the logical next step would be a file system (kpsefs).
> :-)
:-P
One possibility I'm exploring is using a 'texmf.zip' for the texmf
tree. Again we have the issue that, as with building the hash table
from the ls-R file, building the index into the zip will be a fairly
expensive operation, which could conceivably benefit from things like
daemonizing or sharing.
It is also interesting to see just how large the data structures have
become. Take the ls-R file in texmf-dist of TeX-live:
texmf-dist$ wc ls-R
53964 50188 643296 ls-R
(That's lines, words, and bytes)
To a first approximation, storing every string in the file takes
643296 bytes. Each "word" is an entry: 16 bytes/hash bucket
<hash,key,value,next>. Plus an array that indexes to the first bucket
in each chain (currently 15991, let's say 16384), 4 bytes/entry.
643296 + 16 * 50188 + 4 * 16384
= 643296 + 803008 + 65536
= 643296 + 868544
= 1511840
The hash table and chains require more room than the actual data, and
we're above a megabyte in total.
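The arithmetic above can be reproduced with a small C sketch. The
struct layout and all names here are illustrative assumptions, not the
actual kpathsea code: it supposes four 32-bit fields per bucket (with
key, value and next held as offsets or indexes rather than pointers)
and a 32-bit index per chain head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte bucket <hash,key,value,next>.  Using 32-bit
   offsets/indexes instead of pointers is an assumption made here so
   the size is 16 bytes regardless of the platform's pointer width. */
struct bucket {
    uint32_t hash;   /* full 32-bit hash of the key */
    uint32_t key;    /* offset of the key string in the string area */
    uint32_t value;  /* offset of the value string */
    uint32_t next;   /* index of the next bucket in the same chain */
};

static_assert(sizeof(struct bucket) == 16, "bucket should be 16 bytes");

/* First-approximation footprint: all string bytes, one bucket per
   entry, and one 4-byte chain-head index per table slot. */
size_t estimate_bytes(size_t string_bytes, size_t entries, size_t slots)
{
    return string_bytes
         + sizeof(struct bucket) * entries
         + sizeof(uint32_t) * slots;
}
```

Calling estimate_bytes(643296, 50188, 16384) gives the same total as
the sum above.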
(Note: I store the hash in the table because I'm using a different
hash function which should result in a better 32 bit key. With a
power-of-two table size, I can mask bits to get a bucket number, and
compare hashes before I have to compare strings when looking for the
right bucket in the chain. This may be overkill, as the average chain
length is a bit over 3 in this case.)
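A minimal sketch of that lookup scheme, for illustration only. The
hash function used here is 32-bit FNV-1a, chosen purely as a stand-in
since the message does not say which function is used; the struct and
function names are likewise hypothetical, and plain pointers are used
for brevity rather than address-independent offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct entry {
    uint32_t hash;        /* stored so chains can be filtered cheaply */
    const char *key;
    const char *value;
    struct entry *next;   /* next entry in the same chain */
};

struct table {
    struct entry **heads; /* nslots chain heads */
    uint32_t nslots;      /* must be a power of two */
};

/* 32-bit FNV-1a: a stand-in, not necessarily the hash meant above. */
uint32_t hash_string(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Link a caller-provided entry at the head of its chain. */
void table_insert(struct table *t, struct entry *e,
                  const char *key, const char *value)
{
    assert(t->nslots != 0 && (t->nslots & (t->nslots - 1)) == 0);
    e->hash = hash_string(key);
    e->key = key;
    e->value = value;
    uint32_t slot = e->hash & (t->nslots - 1); /* mask, no modulo */
    e->next = t->heads[slot];
    t->heads[slot] = e;
}

/* Walk the chain; compare the stored hashes first and only fall back
   to strcmp when they match. */
const char *table_lookup(const struct table *t, const char *key)
{
    assert(t->nslots != 0 && (t->nslots & (t->nslots - 1)) == 0);
    uint32_t h = hash_string(key);
    const struct entry *e = t->heads[h & (t->nslots - 1)];
    for (; e != NULL; e = e->next) {
        if (e->hash == h && strcmp(e->key, key) == 0)
            return e->value;
    }
    return NULL;
}
```

With chains averaging about three entries, the integer comparison
saves at most a few string comparisons per lookup, which is why it may
well be overkill.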
--
Olaf Weber
(This space left blank for technical reasons.)