[pdftex] hacking on tex parsing

Karl Berry karl at freefriends.org
Tue May 5 03:21:26 CEST 2020

Hi Peter,

    to make some TTS software to help me proofread my papers, 

As far as I know, general text-to-speech for LaTeX remains an open
problem. Shortly before his untimely death, Eitan Gurari was working on
this as another output mode for tex4ht, but I don't believe it's seen
any development since. What he did was targeted at Emacspeak
(https://tug.org/TUGboat/tb28-3/tb90gurari.pdf, last section).  For
convenience, I'll attach the eslatex script he mentions in case you want
to try it. Although it's still in the sources, I took it out of the
binary directories in TeX Live years ago.

A search for latex document to speech turns up
which I've never looked at. FWIW.

    but I'm hoping that I can reuse some existing tex parsing code.

There are other standalone programs, such as KaTeX, LaTeX2HTML, and
MathJax, which can parse TeX (or just TeX math) to a greater or lesser
extent. Perhaps something in there would be useful.

    instance, if I can grab the document after newcommand or
    DeclareMathOperator has been processed that would be very helpful. Or if

You have to redefine the macros to do so. This is what tex4ht does -- it
runs TeX, but redefines virtually everything, often at a low level,
in order to be able to intervene and generate the various output formats.

    there's some tree-like data structure that gets created 

(pdf)tex itself (and tex4ht) don't build trees. They operate token by token.
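
To illustrate what "token by token" means, here is a minimal sketch in
Python. It is not code from pdftex or tex4ht; the names tokenize and
LETTERS are made up for this example, and it assumes plain ASCII input
with default category codes (no comments, no \catcode changes, no
macro expansion). It only shows that the result is a flat stream of
tokens rather than a tree.

```python
import string

# Assumption: only ASCII letters count as "letters", as with TeX's defaults.
LETTERS = set(string.ascii_letters)


def tokenize(src):
    """Yield a flat stream of (kind, text) tokens; no tree is built."""
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "\\":
            j = i + 1
            if j < n and src[j] in LETTERS:
                # Control word: a backslash followed by a run of letters.
                while j < n and src[j] in LETTERS:
                    j += 1
                yield ("cs", src[i + 1:j])
                # Spaces directly after a control word are skipped.
                while j < n and src[j] == " ":
                    j += 1
            else:
                # Control symbol: a backslash plus one non-letter character.
                j = min(j + 1, n)
                yield ("cs", src[i + 1:j])
            i = j
        else:
            # Everything else is passed along as a single character token.
            yield ("char", ch)
            i += 1


if __name__ == "__main__":
    for token in tokenize(r"\alpha + \, x"):
        print(token)
```

Anything that wants structure (groups, arguments, environments) has to
be reconstructed on top of such a stream, which is roughly why tools
built on TeX itself end up redefining macros instead.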

You might get more and better answers from texhax at tug.org (general
public mailing list), tex.stackexchange.com, etc. The above is just what
comes to my mind, certainly not definitive.

If you get anywhere, we'd like to publish something about it in TUGboat :).

All the best,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: eslatex
Type: application/octet-stream
Size: 1646 bytes
Desc: not available
URL: <https://tug.org/pipermail/pdftex/attachments/20200504/77f598ce/attachment.obj>
