[luatex] Using LuaTeX to standardize source files of papers

luigi scarso luigi.scarso at gmail.com
Fri May 3 14:46:32 CEST 2019

On Fri, May 3, 2019 at 2:32 PM Ying Zhou <yingzhou474 at gmail.com> wrote:

> Dear all,
> Sorry if this question doesn’t belong here, but the TeX.SE community hasn’t
> given helpful answers other than recommending de-macro and other scripts
> that often fail.
> I’m a beginning data scientist who wants to be able to get software to
> process scholarly papers. While it is possible to extract text and
> structure from DVI files, PDF files and PS files using machine learning, it
> can never be 100% correct, which is a fact about ML. This is why I’m
> thinking about using the tex sources of papers themselves. However custom
> macros in TeX are notoriously hard to completely remove so that the TeX
> files can be standardized without introducing inaccuracies. Is this problem
> possible to solve using LuaTeX since Lua gives authors more control? Or
> shall I completely forget about standardizing TeX files in any sense and
> focus on better methods to extract information from PDF files?
1) If by "standardizing TeX files" you mean an ISO standard, then yes, *in
principle* it is possible;

2) A more concrete goal is using tagged PDF. You can promote custom tags
(read: a de facto standard XML application) for your content.
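
To illustrate why explicit tags help with machine processing: once the
content carries tags, extracting text and structure is a deterministic tree
walk rather than a guess. Below is a minimal sketch in Python, assuming a
hypothetical tag set (paper, title, section, para, formula) and assuming the
tagged structure has already been exported as XML. It does not show how to
produce tagged PDF from LuaTeX; all names in it are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom tag set; these tag names are illustrative only and
# do not come from any PDF or TeX standard.
SAMPLE = """\
<paper>
  <title>An Example Paper</title>
  <section name="Introduction">
    <para>We study <formula>x^2 + y^2 = 1</formula> in detail.</para>
  </section>
  <section name="Results">
    <para>The main theorem holds.</para>
  </section>
</paper>
"""


def extract_structure(xml_text):
    """Return (title, sections, formulas) from a tagged paper.

    sections is a list of (section name, list of paragraph texts);
    formulas is a list of the raw formula strings in document order.
    """
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    sections = []
    for sec in root.iter("section"):
        # itertext() flattens nested tags, so inline formulas stay in the text.
        paras = ["".join(p.itertext()) for p in sec.iter("para")]
        sections.append((sec.get("name"), paras))
    formulas = [f.text for f in root.iter("formula")]
    return title, sections, formulas


if __name__ == "__main__":
    title, sections, formulas = extract_structure(SAMPLE)
    print(title)
    for name, paras in sections:
        print(name, paras)
    print(formulas)
```

The point of the sketch is only that nothing here is inferred from layout:
every piece of structure is read directly from a tag.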

It's a pity that the TeX community has no access to the PDF 2.0 ISO
standard.

