[texhax] Low-level TeX question: string substitution macro
Toby Cubitt
tsc25 at cantab.net
Thu May 31 20:22:27 CEST 2007
Thanks to some very helpful comments from Barbara Beeton (off-list) and
to the on-list replies, I've now more or less got this working.
Since catcodes are fixed when the characters are first read (apart from
special commands like \string and \meaning), it seems there's no way to
do what I want directly. So instead, I first write the unescaped text to
a temporary file, then modify the appropriate catcodes and re-read this
temporary file, writing it out again to the final destination file. The
modified catcodes are in effect when the file is re-read, so the
characters get expanded to their escaped form when they're re-written.
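For the archive, here is a minimal sketch of that two-pass idea. It is
not the actual package code: the stream names \@tmpout, \@tmpin and
\@finalout, the helpers \@makeescaped and \@copyescaped, the scratch
macro \@line and the file names \jobname.tmp and \jobname.sed are all
made up for illustration. It assumes LaTeX package code (so @ is a
letter and ~ is active), the default \escapechar, e-TeX's \unless, and
text that contains no %, # or unbalanced braces.

% Hypothetical streams, for illustration only.
\newwrite\@tmpout \newread\@tmpin \newwrite\@finalout
% Make the character named by the control symbol #1 (e.g. \. or \\)
% active, and define it to expand to its escaped form: \string\.
% yields the two characters "\." under the default \escapechar.
\def\@makeescaped#1{%
  \begingroup\lccode`\~=`#1\relax
  \lowercase{\endgroup\def~}{\string#1}%
  \catcode`#1=\active}
% Re-read the temporary file with the special characters active and
% copy it line by line; \write expands the active characters.
\def\@copyescaped{%
  \begingroup
    \openin\@tmpin=\jobname.tmp
    \@makeescaped\.\@makeescaped\[\@makeescaped\]%
    \@makeescaped\^\@makeescaped\$\@makeescaped\\%
    \endlinechar=-1 % no trailing space on each line read
    \loop
      \read\@tmpin to \@line
    \unless\ifeof\@tmpin
      \immediate\write\@finalout{\@line}%
    \repeat
    \closein\@tmpin
  \endgroup}
% Pass 1: write the unescaped text without expanding it.
\def\@tmpa{\foo.bar[x]}
\immediate\openout\@tmpout=\jobname.tmp
\immediate\write\@tmpout{\expandafter\strip@prefix\meaning\@tmpa}
\immediate\closeout\@tmpout
% Pass 2: copy it to the final file with the escapes applied.
\immediate\openout\@finalout=\jobname.sed
\@copyescaped
\immediate\closeout\@finalout

Because \meaning puts a space after a control word, the temporary file
here holds "\foo .bar[x]", and the line copied to \jobname.sed should
come out as "\\foo \.bar\[x\]". All the catcode changes sit inside a
macro body, so they only affect the characters that \read pulls in from
the file, not the code doing the copying.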
The only thing holding me back from dispensing with the temporary file
is that I can't figure out how to write a newline character to an
external file. None of the following seem to work:
\write\@stream{^^M}
\write\@stream{\\}
{\lccode`|=13 \lowercase{\write\@stream{|}}}
Is there some way to write out an explicit newline? Please don't just
tell me I could do it by writing the file one line at a time. That's
what I'm doing at the moment, but it requires the temporary file. I have
to loop through the temporary file, reading a line from it and
immediately re-writing it (with escapes expanded) to the final file. I
could instead accumulate the text in a macro, appending to it on each
iteration, and write it all out to file at the very end. But then I need
to insert the newlines manually into the macro so that they appear in
the file when it's written out, hence my question.
In answer to Donald Arseneau's comments: I realise TeX's file
input/output features aren't designed for dealing with anything other
than files containing TeX source. But the file I'm writing *is* mostly
TeX code. The sed script contains rules for replacing one sequence of
LaTeX commands with another. The LaTeX commands to be replaced aren't
known until the LaTeX source file is processed, so I *have* to write out
at least some of the information from within TeX. Given that I have to
write something from TeX, I might as well write the entire sed script
from TeX if I can.
Finally, in reply to Michael Doob: I now think that writing a Perl
script instead of sed would only make things slightly simpler. I would
still need to escape the "\" character inside Perl strings when writing
the script file from TeX, and we're back to my original problem :) By
the way, awk can also be made to escape special characters in a string
prior to using it as a computed regexp, though not in quite so simple a
way as Perl. But I seem to have it working with sed now, anyway.
Thanks for everyone's help, and I hope someone can shed similar light on
my final dilemma.
Toby
Toby Cubitt wrote:
> I'm trying to write an internal macro that does string substitution, in
> order to escape certain characters in the string before writing it to a
> file. (The package is supposed to be writing a sed script, so I need to
> escape characters that have a special meaning in regular expressions.)
>
> If this was a user-level macro to be used in the LaTeX source itself, I
> think I can see how it could be done, by changing the catcodes of the
> characters to be escaped to 13 (active character), then defining these
> active characters to expand to escaped versions of themselves. (I
> suppose this would be somewhat akin to LaTeX's \verb command). The
> trouble is, this macro is to be used in a LaTeX package, and I need
> something like the following to work:
>
>
> \begingroup%
> \catcode`|=0
> |catcode`.=13 |catcode`[=13 |catcode`]=13
> |catcode`^=13 |catcode`$=13 %$
> \catcode`\\=13
> |gdef|@escapechars#1{%
> |begingroup
> |catcode`|=0
> |catcode`.=13 |catcode`[=13 |catcode`]=13
> |catcode`^=13 |catcode`$=13 %$
> |catcode`\=13
> |def\{|string\|string\}%
> |def^{|string\|string^}%
> |def${|string\|string$}%
> |def.{|string\|string.}%
> |def[{|string\|string[}%
> |def]{|string\|string]}%
> #1|endgroup%
> }
> |endgroup%
> \def\@tmpa{\foobar}
> \expandafter\@escapechars\expandafter{\@tmpa}%
>
>
> It seems I need those \catcode changes outside the macro definition, as
> well as inside, otherwise the |endgroup and |catcode changes inside the
> macro aren't recognized properly, though I don't entirely understand the
> reason behind this. In reality, the \@tmpa macro is of course defined by
> a much more complicated process than a simple \def (otherwise the whole
> exercise becomes trivial!), but the above serves to illustrate the scenario.
>
> This code is supposed to change the "\foobar" into "\\foobar", but
> instead it fails with an "Undefined control sequence \foobar" error. If
> I understand this correctly (unlikely!), the problem is that the "\" in
> "\foobar" already has catcode 0 (escape character) before it's absorbed
> by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes
> already assigned, the catcode changes inside the \@escapechars macro
> have no effect, and TeX tries to interpret "\foobar" as a control
> sequence. Is this at all correct?
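>
> To illustrate what I think is happening, here's a smaller sketch
> (\@tmpb is just a made-up scratch macro, and this assumes package
> code where @ is a letter):
>
> \def\@tmpb{.}% this "." is tokenized now, with catcode 12
> \begingroup
> \catcode`\.=13 % affects only characters read after this point
> \def.{\@backslashchar\string.}% here "." is read as active
> \message{\@tmpb}% the stored token is still catcode 12
> \message{.}% a freshly read "." is active
> \endgroup
>
> If I've understood things correctly, the first \message shows "."
> and the second shows "\.", because the "." stored in \@tmpb kept the
> catcode it had when it was first read.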
>
> Is there any way to do what I want? If my above analysis is correct,
> what I guess I need is a command to change the catcodes of tokens, but
> TeX's abilities in this respect seem to be limited. The \string and
> \meaning commands can only change tokens to catcode 12 (other), and the
> \lowercase command changes charcodes rather than catcodes. Maybe there's
> a completely different way of achieving what I want?
>
> I've tried to reduce this question to its bare essentials, but if it's
> not clear what I'm trying to do, I can go into more detail.
>
> Thanks very much,
>
> Toby