# [texhax] Low-level TeX question: string substitution macro

Donald Arseneau asnd at triumf.ca
Wed May 30 00:37:28 CEST 2007

Toby Cubitt <tsc25 at cantab.net> writes:

> I'm trying to write an internal macro that does string substitution, in
> order to escape certain characters in the string before writing it to a
> file. (The package is supposed to be writing a sed script, so I need to
> escape characters that have a special meaning in regular expressions.)

You do not have strings or characters in "internal" macros; you
have tokens.  Some input characters give no tokens at all (a comment,
or the spaces after a control word, for example).

In general, you can't write all characters using TeX, because it converts
unprintable characters to ^^ notation.  TeX only writes files for itself
to reload; it is not meant for general-purpose writing.

> If this was a user-level macro to be used in the LaTeX source itself, I
> think can see how it could be done, by changing the catcodes of the
> characters to be escaped to 13 (active character)

This is the *only* way to capture all input characters.
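
As a rough sketch of that approach, assuming a LaTeX document and
handling only "*" (the names \sedfile, \activatestar, \sedline and
\sedlinewrite are made up for illustration, not taken from your package):

```latex
\documentclass{article}
\newwrite\sedfile                        % hypothetical output stream
\immediate\openout\sedfile=\jobname.sed

% Define the active meaning of * while * is active, so that the stored
% definition really contains an active-character token.
\begingroup
  \catcode`\*=\active
  \gdef\activatestar{%
    \catcode`\*=\active
    \def*{\string\*}}% active * expands to the two characters \*
\endgroup

% \sedline changes the catcode first; only then does \sedlinewrite
% read its argument, so the * in the argument is tokenized as active.
\newcommand\sedline{\begingroup\activatestar\sedlinewrite}
\newcommand\sedlinewrite[1]{\immediate\write\sedfile{#1}\endgroup}

\begin{document}
\sedline{s/a*b/c/}% writes  s/a\*b/c/  to \jobname.sed
\immediate\closeout\sedfile
Done.
\end{document}
```

It works only because the argument is read from the source file after
the catcode change.  Text that has already been tokenized (the contents
of a macro, say) is not affected.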

> It seems I need those \catcode changes outside the macro definition, as
> well as inside, otherwise the |endgroup and |catcode changes inside the
> macro aren't recognized properly, though I don't entirely understand the
> reason behind this.

Catcodes control how characters are read from input and turned into
tokens; once the tokens exist, later catcode changes no longer affect
them.  Note that in "\foobar" there is only one token, and there is
certainly no backslash character:

\escapechar=44     % 44 is the character code of the comma
\string\foobar     % gives ",foobar"

> In reality, the \@tmpa macro is of course defined by a
> much more complicated process than a simple \def

Try:
\write{\expandafter\strip@prefix\meaning\@tmpa}

That gives a sanitised string representation, the same one TeX uses in
error messages, but it is not a duplicate of the input characters.
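
To see the difference, here is a small test, assuming LaTeX (for
\strip@prefix and \typeout) and some made-up contents for \@tmpa:

```latex
\documentclass{article}
\makeatletter
% Hypothetical contents; \foo need not be defined, as \meaning
% does not expand it.
\def\@tmpa{a*b\foo.c}
% \meaning gives "macro:->a*b\foo .c"; \strip@prefix removes
% everything up to and including the ">".
\typeout{\expandafter\strip@prefix\meaning\@tmpa}
\makeatother
\begin{document}
\end{document}
```

The terminal shows "a*b\foo .c", with a space after \foo that was never
in the input.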

You should really be using an external string processing language.

--
Donald Arseneau                          asnd at triumf.ca