[texworks] Count Characters Script

Reinhard Kotucha reinhard.kotucha at web.de
Wed Mar 14 01:35:20 CET 2012


On 2012-03-13 at 13:52:04 -0700, Charlie Sharpsteen wrote:

 > On Tue, Mar 13, 2012 at 12:29 PM, Philip TAYLOR <P.Taylor at rhul.ac.uk> wrote:
 > 
 > >
 > >
 > > Stefan Löffler wrote:
 > >
 > >  [1] <off-topic>Is TeX Turing-complete?</off-topic>
 > >>
 > >
 > > I have always [1] believed so.
 > > Philip Taylor
 > > --------
 > > [1] Where "always" is defined as "since 1986".
 > >
 > 
 > 
 > TeX is worse than Turing-complete---most boring old Turing-complete
 > programming languages can be completely parsed and deconstructed by
 > a "dumb" syntax highlighter implemented in a few lines of regular
 > expressions. TeX is __context sensitive__ and therefore also
 > requires a __Turing-complete parser__.
 > 
 > This is why it is so difficult to deconstruct an arbitrary bit of
 > TeX compared to most programming languages, like Python. Python can
 > be parsed with a simple rule-based program while TeX has to be
 > processed by a program that could also perform Lambda calculus[1],
 > serve as a BASIC interpreter[2], or function as a control program
 > for NASA's Mars rovers[3].
 > 
 > The parser for a language like Python cannot stray outside of the
 > rules defined by the language grammar. The parser for a TeX
 > document can typeset TeX, or be re-purposed to do any other computable
 > task if given the right input.
 > 
 > See the following TeX StackExchange question for proof and
 > demonstrations:
 > 
 > 
 > http://tex.stackexchange.com/questions/4201/is-there-a-bnf-grammar-of-the-tex-language
 > 
 > -Charlie
 > 
 >   [1]: http://ctan.org/pkg/lambda-lists
 >   [2]: http://ctan.org/pkg/basix
 >   [3]: http://sdh33b.blogspot.com/2008/07/icfp-contest-2008.html

On the other hand, you can't simply insert Python function calls into
a text file without worrying about some concept of escape sequences.
In XML/HTML angle brackets have a special meaning and you have to use
special sequences of characters in order to typeset angle brackets
literally.
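
To make the escaping point concrete, here is a minimal Python sketch
using html.escape and html.unescape from the standard library.  The
sample string is made up purely for illustration.

```python
from html import escape, unescape

# A made-up line of Python that happens to contain XML/HTML special characters.
raw = "if a < b and b > c: print('a & b')"

# Replace &, < and > by their entity references; quote=False leaves quotes alone.
escaped = escape(raw, quote=False)
print(escaped)

# The escaping is reversible, so the literal text can always be recovered.
assert unescape(escaped) == raw
```

The literal angle brackets only survive because they were rewritten as
entity references before being placed in the markup.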

An XML/HTML file like this can be parsed easily with regular
expressions, of course.  But there is a significant difference between
XML/HTML and TeX.  While XML/HTML has to be parsed by an external
program, TeX has its own interpreter built in.  This makes things more
difficult, but it also makes TeX extremely flexible and powerful.

If you are using LaTeX without mixing in plain TeX code, I suppose
that you can indeed write a parser based solely on regular expressions
and an EBNF specification.  \catcode is not an official LaTeX command,
so the category codes can be assumed to be fixed.  The EBNF spec has
to be extended whenever an external package is loaded, but the same is
true for XML.
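
As a rough illustration of such a regular-expression scanner, here is
a minimal Python sketch.  It assumes the default category codes never
change (no \catcode, no \makeatletter, no verbatim material), and the
token names and the tokenize function are my own invention, not part
of any existing tool.

```python
import re

# One alternative per token class; assumes default LaTeX category codes.
TOKEN = re.compile(
    r"""
      (?P<comment>%[^\n]*)            # comment up to the end of the line
    | (?P<command>\\(?:[A-Za-z]+|.))  # control word or control symbol
    | (?P<open>\{)                    # begin group
    | (?P<close>\})                   # end group
    | (?P<text>[^\\{}%]+)             # everything else
    """,
    re.VERBOSE | re.DOTALL,
)

def tokenize(source):
    """Return a list of (kind, text) pairs for a LaTeX-like string."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(source)]

for kind, text in tokenize(r"\section{Intro} a\%b % note"):
    print(kind, repr(text))
```

This only splits the input into tokens.  A grammar on top of it would
still have to know how many arguments each command takes, which is
where the per-package extensions come in.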

As far as Lambda calculus is concerned, IMO it's not bad that it's
supported somehow.  I have the impression that Knuth was inspired by
functional programming languages when he designed the scripting
interface, and I'm convinced that this was not a mistake.

Sure, it would be much easier to have a static document with static
markup tags and a program written in a language like Python, which
transforms the content to PostScript or PDF.  This approach would be
context insensitive, the grammar could be described in EBNF, and the
lexical scanner could be set up by regular expressions.

But all these things exist already and are freely available.  We are
all aware of them, so one question remains: Why are we still using
TeX/LaTeX instead of XML?  There must be a reason.

Stefan, coming back to your question:

 >>  [1] <off-topic>Is TeX Turing-complete?</off-topic>

Yes.  Practically every general-purpose programming language is
Turing-complete.  But it doesn't mean very much.  It means that
particular things are doable; it doesn't mean that you really want to
do them.  Since TeX is Turing-complete, it's possible to write macros
dealing with matrices of complex numbers and, of course, arbitrary
precision.  But I doubt that anybody wants to program this in TeX.
BTW, one of the smallest Turing-complete programming languages is
Brainfuck.  Even printing "hello, world!" is quite painful there.

In order to count characters or words, I think that LuaTeX is the best
choice nowadays.  The same could be done by external scripts too, but
nothing else is as portable as LuaTeX.  The program code could be
loaded by \input or \usepackage.  Characters and words could be
counted either when TeX reads its input files or when it ships out
pages, whichever approach is more appropriate, and the results could
be written to the log file.
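
For comparison, the external-script variant could look roughly like
the following Python sketch.  It is deliberately naive: the function
name count_text is made up, macros are not expanded, command arguments
such as file names are counted as text, and an escaped backslash
directly before a percent sign is misread.  These are exactly the
cases where counting inside LuaTeX would be more reliable.

```python
import re

def count_text(source):
    """Return (words, characters) for a LaTeX-like string, very roughly."""
    # Drop comments: an unescaped % up to the end of the line.
    no_comments = re.sub(r"(?<!\\)%.*", "", source)
    # Replace control words and control symbols by a space.
    no_commands = re.sub(r"\\(?:[A-Za-z]+\*?|.)", " ", no_comments)
    # Replace grouping and other special characters by a space.
    plain = re.sub(r"[{}\[\]$&~^_]", " ", no_commands)
    words = re.findall(r"\w+", plain)
    # Characters are counted without whitespace.
    characters = len(re.sub(r"\s+", "", plain))
    return len(words), characters

sample = r"\section{Intro} Hello, \emph{world}! % ignored"
print(count_text(sample))
```

Punctuation is included in the character count but does not form words
of its own.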

Regards,
  Reinhard

-- 
----------------------------------------------------------------------------
Reinhard Kotucha                                      Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                              mailto:reinhard.kotucha at web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------


