Idea: Git as basis for future CTAN and TeX Live. (Discuss here or at tomorrow's TeX Hour)
jfine2358 at gmail.com
Wed Jun 23 18:57:29 CEST 2021
As well as being a version control system, Git is a distributed
peer-to-peer content-addressable store. It's also efficient in its use of
network bandwidth and mass storage. And it uses multiple cores when
possible, so it's also quick. And it is, of course, widely used.
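To make "content-addressable" concrete, here's a minimal Python sketch (Python is just my choice for illustration) of how Git names a file's contents. The object ID is the SHA-1 of a short header, "blob <size>" plus a NUL byte, followed by the bytes themselves. This assumes the default SHA-1 object format. BlobStore is a hypothetical toy, not part of Git. It only shows that storing identical content twice costs nothing extra.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object ID Git gives to a file's contents (default SHA-1 object format)."""
    header = b"blob %d\0" % len(data)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + data).hexdigest()

class BlobStore:
    """Hypothetical toy store: objects are keyed by the hash of their contents."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self._objects.setdefault(oid, data)  # identical content is stored only once
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def __len__(self):
        return len(self._objects)
```

As a check, `echo 'test content' | git hash-object --stdin` prints d670460b4b4aece5915caf5c68d12f560a9fe3e4, which is what git_blob_id(b"test content\n") returns.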
All this makes git a good foundation for rethinking CTAN and TeX Live. This
post explores this idea. We focus on git's use of PACKFILES to do
peer-to-peer file sharing.
When you clone a repository, the repository being cloned creates a single
git pack file, which is then sent to the newly created local repository.
The local repository builds the associated index file from the pack.
Finally, if required, the working files are created.
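Here's a minimal sketch of what sits at the start and end of a pack file, based on my reading of Git's pack-format documentation. The file begins with the 4-byte signature "PACK", a 4-byte version (2 or 3) and a 4-byte object count, all big-endian. Then come the objects, and then a 20-byte SHA-1 checksum of everything before it. The function name read_pack_header is my own. It doesn't decode the objects themselves, and it assumes a SHA-1 repository.

```python
import hashlib
import struct

def read_pack_header(pack: bytes):
    """Return (version, object_count) after checking signature and trailing checksum."""
    if len(pack) < 12 + 20:
        raise ValueError("too short to be a pack file")
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    # The last 20 bytes are the SHA-1 of all the preceding bytes.
    body, trailer = pack[:-20], pack[-20:]
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("trailing checksum mismatch")
    return version, count
```

Given the bytes of a .pack file under .git/objects/pack/, it should return that pack's version and object count.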
If you do a pull from a source, the same process takes place, except that
the two repositories first do some negotiation to determine what should be
sent. And then as before a pack file is sent. And a push is similar.
(Actually, in both cases, it might be several pack files.) Rsync, used by
CTAN, also does peer-to-peer negotiation.
Here's an example of a git clone:
$ git clone git@github.com:jgm/pandoc.git
Cloning into 'pandoc'...
$ ls -l pandoc/.git/objects/pack/
-r--r--r-- 1 jfine jfine 2.8M Jun 23 17:12 pack-53640....idx
-r--r--r-- 1 jfine jfine 50M Jun 23 17:12 pack-53640....pack
And now I've got every version of every file in the history of pandoc (up
to the commit I cloned). That's not bad for 50M. (The index can be computed
from the pack. It speeds disc access.)
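Part of why the index speeds things up is its fan-out table. In the version 2 .idx format, the object IDs are stored in sorted order. A 256-entry table records, for each possible first byte N, how many objects have a first byte at most N. A lookup then only binary-searches the small slice of IDs sharing the same first byte. This is a sketch of that one idea. It leaves out the offsets and CRCs the real .idx also holds, and the function names are hypothetical.

```python
from bisect import bisect_left

def build_fanout(object_ids):
    """Sort hex object IDs and build the 256-entry cumulative fan-out table."""
    ids = sorted(bytes.fromhex(h) for h in object_ids)
    fanout = [0] * 256
    for oid in ids:
        fanout[oid[0]] += 1          # count objects per first byte
    running = 0
    for i in range(256):
        running += fanout[i]
        fanout[i] = running          # now: objects with first byte <= i
    return ids, fanout

def find(ids, fanout, hex_id):
    """Position of hex_id in the sorted ID list, or None if absent."""
    target = bytes.fromhex(hex_id)
    first = target[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    i = bisect_left(ids, target, lo, hi)   # search only IDs sharing the first byte
    return i if i < hi and ids[i] == target else None
```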
For GitLab the size limit is 10GB per repository. For GitHub the size limit
is about 5GB. Norbert Preining's git-svn mirror of TeX Live is about 40GB.
Let me end with a question. It's related to hosting TeX Live on GitHub and GitLab.
First, consider all files in any version of TeX Live that are used by any
subscriber to this list as inputs to TeX or any of its associated programs.
(This definition is crafted to exclude documentation files, and files not
in TeX Live. It's the files in TeX Live that TeX, or whatever associated program, reads as input.)
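One way to approximate "inputs, not documentation" is by path. This sketch assumes TeX Live follows the TeX Directory Structure, with documentation under doc/ and sources under source/ directly inside each texmf tree (e.g. texmf-dist/doc/...). Anything else inside a texmf tree counts as a potential input, and files outside any texmf tree are left out. That rule, EXCLUDED_TOP_DIRS and is_runtime_input are all my own hypothetical choices, not an official TeX Live classification.

```python
from pathlib import PurePosixPath

# Assumption: TDS top-level directories that hold non-input files.
EXCLUDED_TOP_DIRS = {"doc", "source"}

def is_runtime_input(path: str) -> bool:
    """Hypothetical filter: True for files inside a texmf tree but outside doc/ and source/."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts):
        if part.startswith("texmf") and i + 1 < len(parts):
            return parts[i + 1] not in EXCLUDED_TOP_DIRS
    return False  # not inside a texmf tree: assumed not part of the set
```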
Now for the question. Put all these files in a git pack file. How big will
that pack file be? Perhaps powers of 2 is the way to ask this. In other
words, at most 250M? At most 500M? At most 1G? At most 2G? At most 4G? At
most 8G? At most 16G? At most 32G? At most 64G? [Stop here, because
Norbert's git-svn mirror provides a bound of 40G.]
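For anyone wanting to try answering this, here's a rough sketch of an estimate. Each distinct file content is counted once, since Git stores identical content once. That content is zlib-compressed on its own, and the result is placed in one of the doubling buckets above. Two caveats apply. Real packs also delta-compress similar files against each other, so they are usually smaller. And the few bytes of per-object header are ignored. So treat the result as a ballpark, likely on the high side. The names and the decimal units (1G = 1000M) are my assumptions.

```python
import hashlib
import zlib

def rough_pack_size(file_versions):
    """Ballpark pack size in bytes: dedupe identical contents, zlib-compress each once.

    Ignores delta compression between similar files and per-object headers.
    """
    seen = set()
    total = 0
    for data in file_versions:
        digest = hashlib.sha1(data).digest()  # equal contents <=> equal Git blob IDs
        if digest in seen:
            continue
        seen.add(digest)
        total += len(zlib.compress(data))
    return total

BUCKETS = ["250M", "500M", "1G", "2G", "4G", "8G", "16G", "32G", "64G"]

def size_bucket(size_bytes):
    """Smallest bucket holding size_bytes, doubling from 250M (decimal units)."""
    limit = 250 * 10**6
    for label in BUCKETS:
        if size_bytes <= limit:
            return label
        limit *= 2
    return "over 64G"
```

For example, size_bucket(40 * 10**9), the size of Norbert's mirror, falls in the "64G" bucket.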
If the answer is at most 5GB, then we can use both GitHub and GitLab to host these
files. And the TeX Collection / TeX Live could store this material as git
pack files. This would make the DVD a sneakernet
(https://en.wikipedia.org/wiki/Sneakernet) for some TeX-related git repositories.
Still here? Well done. I'll be discussing this, read-only file systems,
immutable OSes and related methods at tomorrow evening's TeX Hour.
When and where: Thursday 24 June, 6.30 to 7.30pm UK time. The current UK
time is at https://time.is/UK. The Zoom details are:
Meeting ID: 785 5125 5396
For the keen: READ-ONLY FILE SYSTEMS
For the very keen: IMMUTABLE OSes
Finally, video from last week's TeX Hour is available at