[luatex] Using LuaTeX to standardize source files of papers

Dirk Laurie dirk.laurie at gmail.com
Fri May 3 16:24:59 CEST 2019


On Fri, 3 May 2019 at 14:32, Ying Zhou <yingzhou474 at gmail.com> wrote:

> I’m a beginning data scientist who wants to be able to get software to process scholarly papers. While it is possible to extract text and structure from DVI, PDF and PS files using machine learning, the result can never be 100% correct, which is a fact about ML. This is why I’m thinking about using the TeX sources of the papers themselves. However, custom macros in TeX are notoriously hard to remove completely, so it is difficult to standardize the TeX files without introducing inaccuracies. Is this problem possible to solve using LuaTeX, since Lua gives authors more control? Or shall I completely forget about standardizing TeX files in any sense and focus on better methods to extract information from PDF files?

If you invoke LuaTeX as "texlua", it expects a Lua script as input.
You can therefore do in LuaTeX anything you could do in Lua, including
execution of programs, and it's more convenient than standard Lua,
since LuaTeX comes with plenty of "batteries" in the form of useful
preloaded modules. It could, for example, make twenty PDFs from
various inputs and combine selected pages of them into one document
using pdftk.
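
As a rough illustration of the pdftk idea, here is a minimal sketch of a
script that could be run as "texlua combine.lua". The helper names
(quote, pdftk_command) and the file names are made up for the example,
and the quoting assumes a POSIX shell. It only builds and prints the
command line; the os.execute call that would actually run pdftk is left
commented out.

```lua
-- combine.lua: hypothetical example, run as "texlua combine.lua".
-- Builds a pdftk command that concatenates selected pages of several PDFs.

-- Naive quoting for a POSIX shell; assumes s contains no single quotes.
local function quote(s)
  return "'" .. s .. "'"
end

-- parts: list of { file = "x.pdf", pages = "1-3" }; output: result file name.
local function pdftk_command(parts, output)
  assert(#parts <= 26, "this sketch only uses the handles A..Z")
  local inputs, pages = {}, {}
  for i, part in ipairs(parts) do
    local handle = string.char(string.byte("A") + i - 1) -- A, B, C, ...
    inputs[#inputs + 1] = handle .. "=" .. quote(part.file)
    pages[#pages + 1] = handle .. part.pages
  end
  return "pdftk " .. table.concat(inputs, " ")
      .. " cat " .. table.concat(pages, " ")
      .. " output " .. quote(output)
end

-- Placeholder inputs: pages 1-3 of one file, page 2 of another.
local cmd = pdftk_command({
  { file = "chapter1.pdf", pages = "1-3" },
  { file = "chapter2.pdf", pages = "2" },
}, "combined.pdf")

print(cmd)
-- To really run it (needs pdftk on the PATH), uncomment:
-- os.execute(cmd)
```

Keeping the command construction separate from os.execute makes it easy
to inspect what would be run before letting the script loose on real
files.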

An example is the TeX add-on package "musixtex", which I use
regularly. It produces a PDF file of sheet music, which may require
running up to ten stages (one of which is itself a Lua script run by
texlua) with the aid of a 600-line Lua script that automates the whole
process. At least two of those stages (more if you use LaTeX and make
an index) involve running TeX, which may, but need not, be LuaTeX.
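
To give a flavour of such a driver, here is a much-reduced sketch of a
multi-stage runner. It is not taken from the real musixtex script: the
function run_stages, the stage names and the command lines are all
illustrative assumptions. The runner is passed in as a parameter so that
the example can do a dry run instead of calling external programs.

```lua
-- Hypothetical multi-stage driver; not the actual musixtex script.
-- Runs each stage in order and stops at the first one that fails.
-- "run" defaults to os.execute, but any function taking a command
-- string can be substituted (used below for a dry run).
local function run_stages(stages, run)
  run = run or os.execute
  for _, stage in ipairs(stages) do
    local status = run(stage.cmd)
    -- os.execute returns 0 on success in Lua 5.1, true in Lua 5.2 and later.
    if not (status == true or status == 0) then
      return false, stage.name
    end
  end
  return true
end

-- Illustrative stages only: TeX, a spacing pass, then TeX again.
local stages = {
  { name = "first TeX pass",  cmd = "etex score.tex" },
  { name = "spacing pass",    cmd = "musixflx score.mx1" },
  { name = "second TeX pass", cmd = "etex score.tex" },
}

-- Dry run: record the commands instead of executing them.
local log = {}
local function dry_run(cmd)
  log[#log + 1] = cmd
  return true
end

local ok, failed = run_stages(stages, dry_run)
print(ok, #log)
```

Calling run_stages(stages) without the second argument would execute
the commands for real, and the second return value names the stage
that failed, if any.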

The world's your oyster.

-- Dirk


