[l2h] extracting bag-of-words from latex
Hakan Kuecuekyilmaz
hakan at mysql.com
Wed Feb 15 00:52:55 CET 2006
On Tue, 2006-02-14 at 13:45 -0700, Hamilton Link wrote:
> Hi, I'm about to try to process a large set of LaTeX files. What I
> would like is to strip the files of equations, formatting, comments,
> etc. to produce a text file of "just the words," so to speak. As far
> as I can tell the ways of potentially doing this would be:
>
> - compile the LaTeX to PS or PDF and then run a word extractor on that
> - run latex2rtf or latex2html and do word extraction from that
>
> Does anyone on the list know of a better way, or have any suggestions
> for latex2html configurations or settings that might ease the
> process?
>
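You could try untex [1]. It strips LaTeX commands from a file and
leaves the plain text, which may be close to what you need.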
[1] http://ftp.tu-clausthal.de/pub/mirror/ctan/support/untex
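
If a dedicated tool is not an option, the "just the words" step can be
roughly approximated with a few regular expressions. The following is
only a minimal sketch in Python, not how untex works internally: the
function name latex_to_words, the list of math environments, and the
list of commands whose arguments are dropped are all assumptions made
for illustration, and real documents (macros, verbatim blocks, nested
braces) will need more care.

```python
import re
from collections import Counter

# Hypothetical helper: a rough LaTeX-to-words filter, not a full parser.
MATH_ENVS = r"equation|align|eqnarray|displaymath|gather|multline"
KEY_CMDS = r"label|ref|cite|includegraphics|bibliography|bibliographystyle"


def latex_to_words(source):
    """Return a Counter of lower-cased words found in LaTeX source."""
    # Ignore the preamble if the file has a document environment.
    text = source.split(r"\begin{document}", 1)[-1]
    # Comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", text)
    # Display math environments such as equation or align.
    text = re.sub(
        r"\\begin\{(" + MATH_ENVS + r")\*?\}.*?\\end\{\1\*?\}",
        " ", text, flags=re.S)
    # $$...$$, $...$, \[...\] and \(...\); an escaped \$ is left alone.
    text = re.sub(
        r"(?<!\\)\$\$.*?(?<!\\)\$\$|(?<!\\)\$.*?(?<!\\)\$"
        r"|\\\[.*?\\\]|\\\(.*?\\\)",
        " ", text, flags=re.S)
    # Commands whose arguments are keys or file names, not prose.
    text = re.sub(
        r"\\(?:" + KEY_CMDS + r")\*?(?:\[[^\]]*\])?\{[^}]*\}", " ", text)
    # Remaining \begin{...} and \end{...} markers.
    text = re.sub(r"\\(?:begin|end)\{[^}]*\}", " ", text)
    # Other control sequences; the text inside their braces is kept.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?|\\.", " ", text)
    # What is left is treated as words (ASCII letters only here).
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return Counter(word.lower() for word in words)


sample = r"""
\begin{document}
\section{Intro} % not part of the text
The \emph{quick} fox and $x^2$ in the text.
\end{document}
"""
print(latex_to_words(sample).most_common(3))
```

Keeping the text inside braces while dropping the command name is a
deliberate trade-off: it preserves section titles and emphasised words,
at the cost of letting through arguments of commands not listed above.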
Regards, Hakan
--
Hakan Kuecuekyilmaz