[texhax] Low-level TeX question: string substitution macro

Uwe Lück uwe.lueck at web.de
Thu May 31 23:41:58 CEST 2007

Sorry, I haven't read the whole thread.
(A local philosopher here in Munich has said that you must decide
between being a reading or a writing philosopher.)

I started working on a package or bundle on this matter
months ago; the result is attached.

I want to extend this to make up for my difficulties with Perl and AWK,
and also to have an easily portable substitute for them.
The basic idea is the philosophy of docstrip; the goal is to generalize
the latter considerably. I thought of making a CTAN subdirectory "parse"
to which many could contribute special matching macros.
What I want first are macros for extracting version information
from my own packages foo.sty without \usepackage{foo} etc.
In particular, they should look for \def\filedate and \ProvidesPackage and check
whether the date is the same as the latest date in the version history.
Using, alas, the platform's file listing (dir; ls), they should also check
whether the date of the most recent "physical" change of the file is
documented inside the file. I also have numerous versions of some of my own
packages on numerous computers, in numerous directories on each, and want
to see where the most recent versions are.

Unfortunately, I have been too busy with the German Wikipedia in recent months.



At 20:22 31.05.07, Toby Cubitt wrote:

>Thanks to some very helpful comments from Barbara Beeton (off-list) and
>to the on-list replies, I've now more or less got this working.
>Since catcodes are fixed when the characters are first read (apart from
>special commands like \string and \meaning), it seems there's no way to
>do what I want directly. So instead, I first write the unescaped text to
>a temporary file, then modify the appropriate catcodes and re-read this
>temporary file, writing it out again to the final destination file. The
>modified catcodes are in effect when the file is re-read, so the
>characters get expanded to their escaped form when they're re-written.
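The second pass described above can be sketched as follows. This is a minimal sketch, not Toby's actual code: it assumes a LaTeX package context (\makeatletter in effect), that the unescaped text is already in a temporary file, and that all \esc@... names are hypothetical. The metacharacters get their active definitions through the \lowercase trick (lowercasing an active ~ whose \lccode is set to another character yields that character, still active), so the definitions can be written under normal catcodes; only the lines fetched by \read are tokenized with the changed catcodes.

```latex
% Hypothetical helpers; assumes \makeatletter and LaTeX's \dospecials.
\edef\esc@bs{\expandafter\@gobble\string\\}% one backslash, catcode 12

% \esc@make<char>: locally define active <char> to expand to \<char>.
\def\esc@make#1{%
  \begingroup\lccode`\~=`#1\relax
  \lowercase{\endgroup\def~}{\esc@bs#1}}

\newread\esc@in
\newwrite\esc@out

% \esc@copy{<tmp file>}{<final file>}: re-read <tmp file> with the regexp
% metacharacters active and write each line, escaped, to <final file>.
\def\esc@copy#1#2{%
  \openin\esc@in=#1\relax
  \immediate\openout\esc@out=#2\relax
  \begingroup
    % active backslash -> two catcode-12 backslashes
    \begingroup\lccode`\~=`\\\relax
      \lowercase{\endgroup\def~}{\esc@bs\esc@bs}%
    \esc@make.\esc@make[\esc@make]\esc@make^\esc@make$%
    % these catcodes only affect text read from now on, i.e. by \read
    \let\do\@makeother\dospecials
    \catcode`\\=13 \catcode`\.=13 \catcode`\[=13 \catcode`\]=13
    \catcode`\^=13 \catcode`\$=13
    \endlinechar=-1 % no end-of-line space appended to each line
    \esc@loop
  \endgroup
  \closein\esc@in
  \immediate\closeout\esc@out}

% Read a line, write it (expanding the active characters), repeat.
% Testing \ifeof before the first \read avoids reading from the terminal
% if the temporary file does not exist.
\def\esc@loop{%
  \ifeof\esc@in
  \else
    \read\esc@in to\esc@line
    \ifeof\esc@in\else\immediate\write\esc@out{\esc@line}\fi
    \expandafter\esc@loop
  \fi}
```

Called as \esc@copy{tmp.txt}{script.sed} (hypothetical file names), a temporary-file line such as \foo[x] should come out as \\foo\[x\]. The character list follows Toby's code; sed's basic regular expressions also give * a special meaning, so a real script would probably add \esc@make* and \catcode`\*=13 as well.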
>The only thing holding me back from dispensing with the temporary file
>is that I can't figure out how to write a newline character to an
>external file. None of the following seem to work:
>{\lccode`|=13 \lowercase{\write\@stream{|}}}
>Is there some way to write out an explicit newline? Please don't just
>tell me I could do it by writing the file one line at a time. That's
>what I'm doing at the moment, but it requires the temporary file. I have
>to loop through the temporary file, reading a line from it and
>immediately re-writing it (with escapes expanded) to the final file. I
>could store the text to be written in a macro that gets added to each
>iteration, and write it all out to file at the very end. But then I need
>to insert the newlines manually into the macro so that they appear in
>the file when it's written out, hence my question.
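For the newline question, one standard answer is TeX's \newlinechar parameter: when \write prints a character whose code equals \newlinechar, it starts a new line in the output file instead. A minimal sketch; the stream \sedout and the file script.sed are hypothetical names. As far as I can tell, this also explains why the \lccode attempt above fails: it produces a character token with code 13, which \write prints in ^^M notation unless \newlinechar is 13 at the moment the write is actually performed (at shipout, for a \write without \immediate).

```latex
% \sedout and script.sed are hypothetical names.
\newwrite\sedout
\immediate\openout\sedout=script.sed
\begingroup
  \newlinechar=`\^^J % local; restored by \endgroup
  % character 10 in the text below now becomes a line break in the file
  \immediate\write\sedout{s/a/b/g^^Js/c/d/g}%
\endgroup
\immediate\closeout\sedout
```

This writes two sed commands on two lines with a single \write, so the whole script could be accumulated in one macro and written at the end, with ^^J wherever a line break is wanted.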
>In answer to Donald Arseneau's comments: I realise TeX's file
>input/output features aren't designed for dealing with anything other
>than files containing TeX source. But the file I'm writing *is* mostly
>TeX code. The sed script contains rules for replacing one sequence of
>LaTeX commands with another. The LaTeX commands to be replaced aren't
>known until the LaTeX source file is processed, so I *have* to write out
>at least some of the information from within TeX. Given that I have to
>write something from TeX, I might as well write the entire sed script
>from TeX if I can.
>Finally, in reply to Michael Doob: I now think that writing a Perl
>script instead of sed would only make things slightly simpler. I would
>still need to escape the "\" character inside Perl strings when writing
>the script file from TeX, and we're back to my original problem :) By
>the way, awk can also be made to escape special characters in a string
>prior to using it as a computed regexp, though not in quite so simple a
>way as Perl. But I seem to have it working with sed now, anyway.
>Thanks for everyone's help, and I hope someone can shed similar light on
>my final dilemma.
>Toby Cubitt wrote:
> > I'm trying to write an internal macro that does string substitution, in
> > order to escape certain characters in the string before writing it to a
> > file. (The package is supposed to be writing a sed script, so I need to
> > escape characters that have a special meaning in regular expressions.)
> >
> > If this was a user-level macro to be used in the LaTeX source itself, I
> > think I can see how it could be done, by changing the catcodes of the
> > characters to be escaped to 13 (active character), then defining these
> > active characters to expand to escaped versions of themselves. (I
> > suppose this would be somewhat akin to LaTeX's \verb command). The
> > trouble is, this macro is to be used in a LaTeX package, and I need
> > something like the following to work:
> >
> >
> > \begingroup%
> > \catcode`|=0
> > |catcode`.=13 |catcode`[=13 |catcode`]=13
> > |catcode`^=13 |catcode`$=13 %$
> > \catcode`\\=13
> > |gdef|@escapechars#1{%
> >    |begingroup
> >    |catcode`|=0
> >    |catcode`.=13 |catcode`[=13 |catcode`]=13
> >    |catcode`^=13 |catcode`$=13 %$
> >    |catcode`\=13
> >    |def\{|string\|string\}%
> >    |def^{|string\|string^}%
> >    |def${|string\|string$}%
> >    |def.{|string\|string.}%
> >    |def[{|string\|string[}%
> >    |def]{|string\|string]}%
> >    #1|endgroup%
> > }
> > |endgroup%
> > \def\@tmpa{\foobar}
> > \expandafter\@escapechars\expandafter{\@tmpa}%
> >
> >
> > It seems I need those \catcode changes outside the macro definition, as
> > well as inside, otherwise the |endgroup and |catcode changes inside the
> > macro aren't recognized properly, though I don't entirely understand the
> > reason behind this. In reality, the \@tmpa macro is of course defined by
> > a much more complicated process than a simple \def (otherwise the whole
> > exercise becomes trivial!), but the above serves to illustrate the scenario.
> >
> > This code is supposed to change the "\foobar" into "\\foobar", but
> > instead it fails with an "Undefined control sequence \foobar" error. If
> > I understand this correctly (unlikely!), the problem is that the "\" in
> > "\foobar" already has catcode 0 (escape character) before it's absorbed
> > by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes
> > already assigned, the catcode changes inside the \@escapechars macro
> > have no effect, and TeX tries to interpret "\foobar" as a command
> > sequence. Is this at all correct?
> >
> > Is there any way to do what I want? If my above analysis is correct,
> > what I guess I need is a command to change the catcodes of tokens, but
> > TeX's abilities in this respect seem to be limited. The \string and
> > \meaning commands can only change tokens to catcode 12 (other), and the
> > \lowercase command changes charcodes rather than catcodes. Maybe there's
> > a completely different way of achieving what I want?
> >
> > I've tried to reduce this question to its bare essentials, but if it's
> > not clear what I'm trying to do, I can go into more detail.
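On the question of a different way: if the e-TeX extensions are available (an assumption; Knuth's original TeX lacks them), \scantokens does in memory what the temporary file does on disk. It writes its argument to a pseudo-file and reads it back under the catcodes currently in force, so the catcodes of already-formed tokens can effectively be changed. A minimal sketch, assuming a LaTeX package context with \makeatletter; the \esc@... names are hypothetical. As with \write, a space is added after every control word, so \foobar comes back followed by a space.

```latex
% Requires e-TeX (\scantokens, \everyeof); \esc@... names are hypothetical.
\edef\esc@bs{\expandafter\@gobble\string\\}% one backslash, catcode 12
\def\esc@make#1{% locally: active <char> expands to \<char>
  \begingroup\lccode`\~=`#1\relax
  \lowercase{\endgroup\def~}{\esc@bs#1}}

% \esc@escape\cmd: replace the contents of macro \cmd by an escaped copy.
\def\esc@escape#1{%
  \begingroup
    \begingroup\lccode`\~=`\\\relax % active backslash -> \\
      \lowercase{\endgroup\def~}{\esc@bs\esc@bs}%
    \esc@make.\esc@make[\esc@make]\esc@make^\esc@make$%
    \let\do\@makeother\dospecials
    \catcode`\\=13 \catcode`\.=13 \catcode`\[=13 \catcode`\]=13
    \catcode`\^=13 \catcode`\$=13
    \endlinechar=-1 \everyeof{\noexpand}% avoid "File ended" in \xdef
    \xdef\esc@result{\scantokens\expandafter{#1}}%
  \endgroup
  \let#1\esc@result}

% Usage, with Toby's example:
\def\@tmpa{\foobar}
\esc@escape\@tmpa % \@tmpa now holds the characters \\foobar plus a space
```

Because \dospecials turns braces into other characters before the re-read, unbalanced braces in the escaped text cause no trouble. The \everyeof{\noexpand} setting is the usual idiom for using \scantokens inside \edef or \xdef without a "File ended while scanning" error.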
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PARSE.zip
Type: application/zip
Size: 6348 bytes
Desc: not available
Url : http://tug.org/pipermail/texhax/attachments/20070531/37ce8a5b/attachment.zip 
