[texhax] Low-level TeX question: string substitution macro

Toby Cubitt tsc25 at cantab.net
Tue May 29 15:38:39 CEST 2007


I'm trying to write an internal macro that does string substitution, in
order to escape certain characters in the string before writing it to a
file. (The package is supposed to be writing a sed script, so I need to
escape characters that have a special meaning in regular expressions.)

If this were a user-level macro to be used in the LaTeX source itself, I
think I can see how it could be done, by changing the catcodes of the
characters to be escaped to 13 (active character), then defining these
active characters to expand to escaped versions of themselves. (I
suppose this would be somewhat akin to LaTeX's \verb command). The
trouble is, this macro is to be used in a LaTeX package, and I need
something like the following to work:


\begingroup%
\catcode`|=0
|catcode`.=13 |catcode`[=13 |catcode`]=13
|catcode`^=13 |catcode`$=13 %$
\catcode`\\=13
|gdef|@escapechars#1{%
   |begingroup
   |catcode`|=0
   |catcode`.=13 |catcode`[=13 |catcode`]=13
   |catcode`^=13 |catcode`$=13 %$
   |catcode`\=13
   |def\{|string\|string\}%
   |def^{|string\|string^}%
   |def${|string\|string$}%
   |def.{|string\|string.}%
   |def[{|string\|string[}%
   |def]{|string\|string]}%
   #1|endgroup%
}
|endgroup%
\def\@tmpa{\foobar}
\expandafter\@escapechars\expandafter{\@tmpa}%


It seems I need those \catcode changes outside the macro definition, as
well as inside, otherwise the |endgroup and |catcode changes inside the
macro aren't recognized properly, though I don't entirely understand the
reason behind this. In reality, the \@tmpa macro is of course defined by 
a much more complicated process than a simple \def (otherwise the whole 
exercise becomes trivial!), but the above serves to illustrate the scenario.
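
A minimal sketch of why the outer catcode changes seem to matter,
assuming the cause is that TeX fixes catcodes when it reads a definition
body rather than when the macro runs. The names \broken and \working are
hypothetical and exist only for this illustration:

```latex
\documentclass{article}
\begin{document}
% "|" still has catcode 12 while this body is read, so "|endgroup" is
% stored as character tokens, not as the control sequence \endgroup.
\def\broken{\begingroup\catcode`\|=0 |endgroup}

% Here "|" is already an escape character when the body is read, so
% "|endgroup" is stored as the single token \endgroup.
\begingroup
\catcode`\|=0
|gdef|working{|begingroup|catcode`\|=0 |endgroup}
|endgroup

% Compare the two stored bodies in the log (neither macro is executed).
\typeout{\meaning\broken}
\typeout{\meaning\working}
\end{document}
```

If that assumption holds, the \catcode assignments inside the macro body
come too late to affect how the body itself was tokenized.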

This code is supposed to change the "\foobar" into "\\foobar", but 
instead it fails with an "Undefined control sequence \foobar" error. If 
I understand this correctly (unlikely!), the problem is that the "\" in 
"\foobar" already has catcode 0 (escape character) before it's absorbed 
by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes 
already assigned, the catcode changes inside the \@escapechars macro 
have no effect, and TeX tries to interpret "\foobar" as a control
sequence. Is this at all correct?
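
The suspected behaviour can be checked in isolation with a small test,
sketched here with a hypothetical macro \demo and the "~" character
instead of "\" (an assumption made only to keep the example short):

```latex
\documentclass{article}
\begin{document}
% Hypothetical test macro: changes the catcode of ~ and then uses #1.
\def\demo#1{\begingroup\catcode`\~=12 #1\endgroup}

% The ~ in the argument was tokenized as active (catcode 13) when the
% argument was read, so the catcode change inside \demo comes too late
% and ~ still acts as a tie.
\demo{a~b}

% Changing the catcode before the argument is read does take effect:
% here ~ reaches \demo as an ordinary catcode-12 character.
\begingroup\catcode`\~=12 \demo{a~b}\endgroup
\end{document}
```

The same reasoning would apply to the "\" in "\foobar": by the time
\@escapechars sees it, \foobar is already a single control sequence
token.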

Is there any way to do what I want? If my above analysis is correct,
what I guess I need is a command to change the catcodes of tokens, but
TeX's abilities in this respect seem to be limited. The \string and
\meaning commands can only change tokens to catcode 12 (other), and the
\lowercase command changes charcodes rather than catcodes. Maybe there's 
a completely different way of achieving what I want?
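
One possible direction, offered only as a rough sketch and not as a
known solution: first turn the token list into catcode-12 characters,
then double each backslash with a macro whose argument is delimited by
a catcode-12 backslash. This assumes e-TeX's \detokenize is available;
the names \@escapebs and \@escbs@i are hypothetical, and only "\" is
handled here.

```latex
\documentclass{article}
\makeatletter
\begingroup
\catcode`\|=0 \catcode`\\=12
% From here to |endgroup, "|" is the escape character and "\" is an
% ordinary catcode-12 character.
% Append a sentinel backslash, then start the loop.
|gdef|@escapebs#1{|@escbs@i#1\|@nil}%
% #1 = text before the next backslash, #2 = everything after it.
|gdef|@escbs@i#1\#2|@nil{%
  #1%
  |if|relax|detokenize{#2}|relax
    % #2 is empty: the backslash just matched was the sentinel, so stop.
  |else
    \\|@escbs@i#2|@nil
  |fi}%
|endgroup

\def\@tmpa{\foobar}
% \detokenize turns \foobar into catcode-12 characters and appends a
% space after the control word; \@escapebs then doubles each backslash.
\edef\@tmpb{\expandafter\@escapebs\expandafter
  {\detokenize\expandafter{\@tmpa}}}
% Should show \\foobar (followed by a space) on the terminal.
\typeout{\@tmpb}
\makeatother
\begin{document}
\end{document}
```

The other special characters (. [ ] ^ $) would presumably each need a
similar pass, and the space that \detokenize adds after control words
might need removing before the string is written to the sed script.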

I've tried to reduce this question to its bare essentials, but if it's
not clear what I'm trying to do, I can go into more detail.

Thanks very much,

Toby


