[tlbuild] Experience of building XeTeX, xdvipdfmx, bibtex8, PdfTeX, LuaTeX for WebAssembly

Vadim Kantorov vadimkantorov at gmail.com
Sat Jan 16 02:54:21 CET 2021


Hi!

I'm not a serious LaTeX user, nor am I knowledgeable about TeX Live.
However, I managed to build XeTeX, xdvipdfmx, bibtex8, PdfTeX and
LuaTeX for WebAssembly and bundle them all into a single binary file.
I thought that the problems I encountered along the way might be of
interest to the maintainers, even if some of my steps contain obvious
errors. An extremely long email follows.

I'm doing this as part of a quest to build a sort of
GitHub-synchronizable Overleaf clone that runs entirely in the
browser, without a backend server. I'm currently hosting a debug
version of it at https://busytex.github.io. The sources are at
https://github.com/busytex/busytex. I call it busytex because it
statically bundles all TeX programs into a single executable, akin to
busybox.
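The busybox idea is that one binary contains several programs and chooses which one to run from its invocation name. Below is a minimal Python model of that dispatch, not busytex's actual C code. The entry-point functions are hypothetical stubs, and supporting both the "symlink named xetex" and the "busytex xetex ..." invocation styles is an assumption borrowed from busybox.

```python
import os

# Hypothetical stand-ins for the bundled programs' entry points. In a real
# multi-call binary these would be the (renamed) C main functions.
def xetex_main(args):
    return " ".join(["xetex", *args])

def pdftex_main(args):
    return " ".join(["pdftex", *args])

def bibtex8_main(args):
    return " ".join(["bibtex8", *args])

APPLETS = {"xetex": xetex_main, "pdftex": pdftex_main, "bibtex8": bibtex8_main}

def dispatch(argv):
    """Select a program by argv[0] (symlink style) or argv[1] ("busytex xetex ..." style)."""
    name = os.path.basename(argv[0])
    if name in APPLETS:
        return APPLETS[name](argv[1:])
    if len(argv) > 1 and argv[1] in APPLETS:
        return APPLETS[argv[1]](argv[2:])
    raise ValueError(f"unknown program; available: {', '.join(sorted(APPLETS))}")
```

For example, dispatch(["busytex", "pdftex", "a.tex"]) runs the pdftex stub with ["a.tex"].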

There have been previous, separate attempts (now unsupported) at
WebAssembly builds of pdftex, xetex and bibtex. As far as I know, this
is the first successful attempt to build all these programs within a
single build process. Ideally, WebAssembly would become a supported
architecture target, which could in turn help TeX Live run portably on
the many platforms that support WebAssembly (such as Node.js, web
browsers and the many new WebAssembly runtimes that are appearing).

The whole build process uses Emscripten and is defined in a single
large Makefile: https://github.com/busytex/busytex/blob/main/Makefile

The problems fall into a few themes:
1) TeX Live dependencies have to be built manually; I can't rely fully
on the TeX Live build process
2) problematic autoconf checks not supported by Emscripten
3) I don't know how to set hostcc for some ICU and freetype programs
4) duplicate symbols when bundling object files from different
programs (pdftex, luatex, xetex) into a single executable
5) paths to the custom-built freetype and fontconfig have to be set
manually
6) wrong string offsets in xetex's generated program source code

If any of the maintainers are interested in better core support for
the WebAssembly architecture, I would be extremely happy to share more
details. I'd be equally glad if anyone wishes to review the Makefile
or the TDS construction process.

I'm using Emscripten's emconfigure, emmake, emcmake scripts to do the build.
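Each of these scripts wraps an ordinary build command and sets up the environment (for example CC pointing at emcc) so the native build system targets WebAssembly. The following is a small Python sketch that maps a configure, make or cmake command line to its wrapper. The helper name is hypothetical and not taken from the busytex Makefile.

```python
import os

# Which Emscripten wrapper script goes in front of which build tool.
WRAPPERS = {
    "configure": "emconfigure",
    "make": "emmake",
    "cmake": "emcmake",
}

def emscripten_command(cmd):
    """Prefix a configure/make/cmake command line with the matching Emscripten wrapper."""
    tool = os.path.basename(cmd[0])  # "./configure" -> "configure"
    wrapper = WRAPPERS.get(tool)
    if wrapper is None:
        raise ValueError(f"no Emscripten wrapper known for {tool!r}")
    return [wrapper, *cmd]
```

A caller would pass the result to subprocess.run(..., check=True) from inside the library's source directory.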

Now, more concretely:

1. The main steps are: a) configure the TeX Live tree, b) build all
other dependencies, c) build freetype, d) build icu, e) build libexpat,
f) build fontconfig, g) build the TeX programs.
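The a) to g) order is constrained by library dependencies; fontconfig, for instance, needs both freetype and expat. The Python sketch below checks the listed order against a dependency map using the standard-library graphlib module (Python 3.9+). The step names are shorthand, and the prerequisites beyond fontconfig's are illustrative assumptions, not read from the busytex Makefile.

```python
from graphlib import TopologicalSorter

# The a) .. g) order from the email, as shorthand step names.
STEPS = ["texlive-configure", "other-deps", "freetype", "icu", "expat",
         "fontconfig", "tex-programs"]

# Assumed prerequisites of each step (illustrative only).
DEPS = {
    "other-deps": {"texlive-configure"},
    "freetype": {"texlive-configure"},
    "icu": {"texlive-configure"},
    "expat": {"texlive-configure"},
    "fontconfig": {"freetype", "expat"},
    "tex-programs": {"other-deps", "freetype", "icu", "fontconfig"},
}

def is_valid_order(steps, deps):
    """Return True if every step appears after all of its prerequisites."""
    position = {step: i for i, step in enumerate(steps)}
    return all(position[req] < position[step]
               for step, reqs in deps.items() for req in reqs)

def some_valid_order(deps):
    """Compute one valid build order with the standard-library topological sorter."""
    return list(TopologicalSorter(deps).static_order())
```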

Because hacks are needed that require custom library reconfiguration
(mainly dependency-specific overrides of CFLAGS/CPPFLAGS), we can't
rely on running just the TeX Live build. If fewer hacks were required,
or if TeX Live allowed them to be done through the main configure
script, the process would become much simpler.
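To illustrate what a dependency-specific override means in practice, here is a Python sketch that builds a separate environment for each library's configure run, appending that library's extra flags to shared base flags. All flag values and the helper name are hypothetical placeholders; the real overrides live in the busytex Makefile.

```python
import os

# Flags shared by every dependency build (placeholder values).
BASE_FLAGS = {"CFLAGS": "-O2", "CPPFLAGS": ""}

# Hypothetical per-library extras; real values would come from the Makefile.
OVERRIDES = {
    "freetype": {"CFLAGS": "-DEXAMPLE_FREETYPE_FLAG"},
    "icu": {"CPPFLAGS": "-DEXAMPLE_ICU_FLAG"},
}

def env_for(library, base_env=None):
    """Copy base_env (default: os.environ) and set CFLAGS/CPPFLAGS to base plus library extras."""
    env = dict(os.environ if base_env is None else base_env)
    for var, value in BASE_FLAGS.items():
        extra = OVERRIDES.get(library, {}).get(var, "")
        env[var] = " ".join(part for part in (value, extra) if part)
    return env
```

The returned mapping would be passed as env= to the configure invocation of that one library, which is exactly the per-library step a single top-level configure run cannot express.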

The Emscripten compiler is very slow, so the TeX Live configure script,
which touches all directories, even those that don't need to be built,
takes a really long time. An explicit way to disable configuration of
selected directories (other than deleting them) would therefore be
welcome.

2. The upmendex autoconf checks somehow test for a working compiler
differently from all the other utilities. I could not work around this
(https://github.com/t-tk/upmendex-package/issues/1) and had to remove
upmendex from the tree. In addition, I had to patch out further checks
by presetting ac_cv_func_getwd=no, ax_cv_c_float_words_bigendian=no
and ac_cv_namespace_ok=yes in the configure site file. If these checks
are not really needed, it would be nice to remove them (while I
simultaneously raise them in the Emscripten repo).
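For context, autoconf loads site files named by the CONFIG_SITE environment variable, and checks that cache their result reuse a preset value instead of running. The Python sketch below renders such a file from the three presets above; the ${var=value} form keeps any value already set in the environment. The helper names and the output path are hypothetical.

```python
# Cache-variable presets listed in the email. For checks that cache their
# result, a preset value makes configure skip the actual test.
PRESETS = {
    "ac_cv_func_getwd": "no",
    "ax_cv_c_float_words_bigendian": "no",
    "ac_cv_namespace_ok": "yes",
}

def config_site_text(presets):
    """Render a config.site body; ${var=value} only assigns if var is not already set."""
    lines = [f"{name}=${{{name}={value}}}" for name, value in sorted(presets.items())]
    return "\n".join(lines) + "\n"

def write_config_site(path, presets=PRESETS):
    """Write the site file; configure picks it up when run with CONFIG_SITE=<path>."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(config_site_text(presets))
    return path
```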

3. Is there a way to provide paths to the programs icupkg, pkgdata,
apinames, ctangle, otangle, tangle, tangleboot, ctangleboot, tie,
fixwrites, makecpool, splitup and web2c so that they are not rebuilt
for WebAssembly? They have to run on the build machine during the
build. Currently, to avoid rebuilding them for WebAssembly, I had to
implement a compiler wrapper that copies the existing native binaries
instead of compiling them for WebAssembly.
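Such a wrapper can be sketched in Python as follows, though this is not the busytex implementation. It looks at the -o argument: if the output is one of the host tools listed above, it copies a prebuilt native binary from a directory of your choosing, and otherwise it forwards the arguments to emcc unchanged. The helper names and the native_dir layout are assumptions, and only the separate "-o name" spelling is handled.

```python
import os
import shutil
import subprocess

# Host tools named in the email; they run on the build machine, so a native
# binary must be used instead of a WebAssembly build.
HOST_TOOLS = {
    "icupkg", "pkgdata", "apinames", "ctangle", "otangle", "tangle",
    "tangleboot", "ctangleboot", "tie", "fixwrites", "makecpool",
    "splitup", "web2c",
}

def plan(argv, native_dir):
    """Decide whether to copy a native host tool or forward argv to emcc unchanged."""
    if "-o" in argv:
        i = argv.index("-o")
        if i + 1 < len(argv):
            output = argv[i + 1]
            name = os.path.basename(output)
            if name in HOST_TOOLS:
                return ("copy", os.path.join(native_dir, name), output)
    return ("compile", ["emcc", *argv])

def run(argv, native_dir):
    """Carry out the plan; returns a process-style exit code."""
    action = plan(argv, native_dir)
    if action[0] == "copy":
        _, src, dst = action
        shutil.copy(src, dst)  # also copies the permission bits, so it stays executable
        return 0
    return subprocess.call(action[1])
```

Installed as CC, the script would end with something like sys.exit(run(sys.argv[1:], native_dir)), where native_dir points at a native build of the same tree.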

4. xetex and luatex have same-named functions derived from the same
(or almost the same) codebases. If the functions are exactly the same,
it would be good to compile them into a shared object file. If they
are not, it would be good to allow enabling a name prefix. LLVM's
llvm-objcopy currently does not support --redefine-sym on WebAssembly
objects, so prefixes have to be added to functions via C defines,
which is extremely tedious and error-prone. If the bundle scenario is
not actively opposed, configure switches that allow building binaries
as libraries and prefixing their main function names would be very
helpful.
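One way to make the C-define approach less manual is to generate the defines. The Python sketch below finds symbols defined by more than one program and emits a header of #define renames that could be force-included with clang's -include flag. The per-program symbol lists are assumed to come from something like llvm-nm --defined-only, and the helper names are hypothetical. Textual renaming still hits every use of the name, including in headers, which is part of why this is error-prone.

```python
from collections import Counter

def duplicated_symbols(defined_by_program):
    """Return symbols defined by more than one program (these collide when linked together)."""
    counts = Counter(sym for syms in defined_by_program.values() for sym in set(syms))
    return sorted(sym for sym, n in counts.items() if n > 1)

def prefix_header(symbols, prefix):
    """Emit #define lines renaming each symbol to prefix + symbol."""
    guard = f"{prefix.upper()}PREFIX_H"
    lines = [f"#ifndef {guard}", f"#define {guard}"]
    lines += [f"#define {sym} {prefix}{sym}" for sym in symbols]
    lines.append("#endif")
    return "\n".join(lines) + "\n"
```

Each program would get its own header (for example with prefix "xetex_"), so that after renaming, a single dispatcher can call xetex_main, luatex_main and so on.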

5. For some reason, when I build icu, freetype and fontconfig
manually, I have to set the paths to their includes and libraries by
hand via various compiler flags. This may be a bug in the TeX Live
build scripts.
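As an illustration of the manual path setting described above, the Python sketch below derives the usual -I/-L flags and a PKG_CONFIG_PATH from the install prefixes of custom-built libraries. The helper name and prefixes are hypothetical. PKG_CONFIG_PATH matters because some libraries use non-obvious include directories; freetype, for example, installs its headers under include/freetype2, which its pkg-config file accounts for.

```python
import os

def dependency_flags(prefixes):
    """Build CPPFLAGS, LDFLAGS and PKG_CONFIG_PATH for libraries installed under custom prefixes."""
    cppflags = " ".join(f"-I{os.path.join(p, 'include')}" for p in prefixes)
    ldflags = " ".join(f"-L{os.path.join(p, 'lib')}" for p in prefixes)
    pkg_config_path = os.pathsep.join(os.path.join(p, "lib", "pkgconfig") for p in prefixes)
    return {"CPPFLAGS": cppflags, "LDFLAGS": ldflags, "PKG_CONFIG_PATH": pkg_config_path}
```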

6. This is the most mysterious of all. Memory offsets used in
web2c-generated programs come out very different when generated for
native versus WebAssembly. The native offsets are correct; the
WebAssembly offsets are incorrect. So I have to override the
WebAssembly-generated C files with the native ones. This may also be a
bug in the configuration (or in Emscripten) that derives various
platform constants.

Thanks!
-- 
Vadim Kantorov

