[luatex] Using LuaTeX to standardize source files of papers

Ying Zhou yingzhou474 at gmail.com
Fri May 3 05:12:49 CEST 2019

Dear all,

Sorry if this question doesn’t belong here, but the TeX.SE community hasn’t given helpful answers beyond recommending de-macro and other scripts that often fail.

I’m a beginning data scientist who wants software to be able to process scholarly papers. While it is possible to extract text and structure from DVI, PDF and PS files using machine learning, the result can never be 100% correct, which is inherent to ML. This is why I’m thinking about using the TeX sources of the papers themselves. However, custom macros in TeX are notoriously hard to remove completely, so it is difficult to standardize TeX files without introducing inaccuracies. Is this problem possible to solve using LuaTeX, since Lua gives authors more control? Or should I forget about standardizing TeX files altogether and focus on better methods of extracting information from PDF files?


Ying Zhou

