# [texworks] Count Characters Script

Charlie Sharpsteen chuck at sharpsteen.net
Wed Mar 14 02:13:04 CET 2012

On Tue, Mar 13, 2012 at 5:35 PM, Reinhard Kotucha
<reinhard.kotucha at web.de> wrote:

> On the other hand, you can't simply insert Python function calls into
> a text file without worrying about some concept of escape sequences.
> In XML/HTML angle brackets have a special meaning and you have to use
> special sequences of characters in order to typeset angle brackets
> literally.
>
> Such a file can be parsed easily with regular expressions, of course.
> But there is a significant difference between XML/HTML and TeX.  While
> XML/HTML has to be parsed by an external program, TeX is its own
> interpreter built-in.  This makes things more difficult, but it also
> makes it extremely flexible and powerful.
>
> If you are using LaTeX without mixing it up with plain TeX code, I
> suppose that you can write a parser which is solely based on regular
> expressions and an EBNF specification indeed.  \catcode is not an
> official LaTeX command.  Though the EBNF spec has to be extended
> whenever an external package is loaded, the same is true for XML.
>
> As far as Lambda calculus is concerned, IMO it's not bad that it's
> supported somehow.  I have the impression that Knuth was inspired by
> functional programming languages when he designed the scripting
> interface, and I'm convinced that this was not a mistake.
>
> Sure, it would be much easier to have a static document with static
> markup tags and a program written in a language like Python, which
> transforms the content to PostScript or PDF.  This approach would be
> context insensitive, the grammar could be described in EBNF, and the
> lexical scanner could be set up by regular expressions.
>
> But all these things exist already and are freely available.  We are
> all aware of them, so one question remains: Why are we still using
> TeX/LaTeX instead of XML?  There must be a reason.

"Why are we still using TeX?" is not the question I was trying to ask or
answer. I was simply demonstrating that faithfully deconstructing TeX is
several orders of magnitude more difficult than processing just about any
other programming or markup language. Hence it is very difficult to write
a lightweight processor that can do things like count the characters that
will be produced by an arbitrary string of TeX.
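
As a rough illustration of that difficulty, here is a minimal Python sketch
of the kind of regex-only counter one might be tempted to write. The name
naive_char_count and the three patterns are hypothetical, chosen only for
this example; they are not part of TeXworks or of any existing script. The
sketch strips comments, control sequences, and a few special characters,
then counts what is left. It does no macro expansion at all.

```python
import re

# Hypothetical, deliberately naive patterns (assumptions for illustration).
COMMENT = re.compile(r"(?<!\\)%.*")             # unescaped % to end of line
CONTROL_SEQ = re.compile(r"\\(?:[A-Za-z]+\s*|.)")  # \word or \<single char>
SPECIALS = re.compile(r"[{}$&#^_~]")            # grouping, math, alignment, etc.


def naive_char_count(tex: str) -> int:
    """Count non-whitespace characters left after regex-stripping TeX markup.

    No macros are expanded and no category codes are tracked, so the result
    is only meaningful for very simple input.
    """
    text = COMMENT.sub("", tex)
    text = CONTROL_SEQ.sub("", text)
    text = SPECIALS.sub("", text)
    return sum(1 for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Simple markup: the count matches the visible text "Hello world".
    print(naive_char_count(r"\textbf{Hello} world"))           # 10

    # A user-defined macro: TeX would typeset "HelloHello" (10 characters),
    # but the regex approach only sees the definition body once.
    print(naive_char_count(r"\def\greet{Hello}\greet\greet"))  # 5
```

The second call is the interesting one: the count is off because the answer
depends on expanding \greet, and expansion is exactly the part that a
lightweight, regex-based processor does not do.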

-Charlie