[texhax] Low-level TeX question: string substitution macro

Toby Cubitt tsc25 at cantab.net
Tue Apr 21 16:57:34 CEST 2009

Hi Uwe,

Yes, I made cleveref write out a sed script. As for the sed/awk/perl
debate, I personally prefer to use the simplest tool for the job: sed if
possible, awk if sed isn't up to it, and perl if the awk solution would be
too complicated. (I've never yet gotten as far as perl :)

I completely see why one might like to make cleveref write out a TeX
script instead (the script would then need to use one of the string
substitution packages you mention). The reasoning, of course, is that if
someone is using cleveref, then they're guaranteed to have TeX installed
and working, so by far the most portable and robust solution is to
generate a TeX script to carry out the desired string substitutions.

However, implementing this would be far more work than I'm prepared to put
into a feature that is simply a work-around that allows me (and maybe
others) to submit papers to journals that accept LaTeX but don't support
cleveref (i.e. all of them :-)

If someone wanted to contribute a patch that (optionally) generated a TeX
(or awk or perl) script, instead of a sed script, I'd be happy to include
it in cleveref...


PS: I read about the Mars rover project a while back. It definitely falls
into the "foolish quest" category :)

Uwe Lück wrote:
> [Last thread entry by Toby Cubitt, 2007-05-31, copied far below]
> Hi Toby,
> so you have based cleveref.sty on sed? Traitor !-)
> There are already datatool, stringstrings, ted, and xstring for string
> substitution using TeX. Moreover, Scott Pakin's perltex provides
> ("exported") text-processing macros that work independently of the Perl
> installation with which they were created (right?).
> I have now provided a basic setup for *expandable* chains of string
> substitutions in /macros/latex/contrib/nicetext. It processes (TeX) files
> with essentially catcodes 11 and 12 only, so you can write, e.g.,
>     \immediate\write\result_file{\Transform{\InputLine}}
> So far, I have just used it for replacing `...' by $\dots$, `etc. ' by
> `etc.\ ', and so on.
> By the way, on his blog (My Code Blog), at
>     http://sdh33b.blogspot.com/2008/07/icfp-contest-2008.html
> Steve Hicks reports on controlling a Mars rover using TeX. (The
> discussion then considers Metafont as well.)
> TeX forever!
>     Uwe.
> ____________________________________________________________
> This was in thread `enviroment/ifthen':
> ____________________________________________________________
> At 15:32 11.02.09, Toby Cubitt wrote:
>> Uwe Lück wrote:
>> > You might reason whether your actual task is worth such efforts. If the
>> > task arises all the time with new projects and with a large number of
>> > different strings, this may be the case. I have made such a thing, yet I
>> > can't release it as it is. This works on entire files, not environments.
>> If you do want to embark on this foolish quest :-), perhaps the xstring
>> package might help?
>> Even if the task arises frequently, a simple sed script (or similar)
>> will be far quicker to write, easier to maintain, more robust...and
>> generally better in almost every way. In my experience, string
>> substitution is just not a task that LaTeX is well suited to. It's not
>> difficult to integrate running your source through a sed script into
>> your LaTeX build procedure (you could even write a quick Makefile).
>> HTH,
>> Toby
> ____________________________________________________________
> Last thread entry by Toby Cubitt, 2007-05-31:
> ____________________________________________________________
> Thanks to some very helpful comments from Barbara Beeton (off-list) and
> to the on-list replies, I've now more or less got this working.
> Since catcodes are fixed when the characters are first read (apart from
> special commands like \string and \meaning), it seems there's no way to
> do what I want directly. So instead, I first write the unescaped text to
> a temporary file, then modify the appropriate catcodes and re-read this
> temporary file, writing it out again to the final destination file. The
> modified catcodes are in effect when the file is re-read, so the
> characters get expanded to their escaped form when they're re-written.
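
A minimal sketch of the temporary-file round trip described above, assuming
LaTeX and a single line of text. The names \esc@out, \esc@in, \esc@line and
\esc@reread, the sample text in \@tmpa, and the file names \jobname.tmp and
\jobname.esc are made up for illustration; the real package code may well
differ.

```latex
\documentclass{article}
\makeatletter
\newwrite\esc@out   % hypothetical stream names
\newread\esc@in

% Define the *active* versions of \ [ ] to expand to their escaped forms.
% The \lowercase trick builds each active-character token without changing
% the catcodes under which this file itself is read.
\begingroup\lccode`\~=`\\ \lowercase{\endgroup\def~}{\@backslashchar\@backslashchar}
\begingroup\lccode`\~=`\[ \lowercase{\endgroup\def~}{\@backslashchar[}
\begingroup\lccode`\~=`\] \lowercase{\endgroup\def~}{\@backslashchar]}

\def\@tmpa{\foobar[1]}   % stand-in for the text to be escaped

% Pass 1: write the unescaped tokens to a temporary file.
% \the\toks@ is not expanded further inside \write.
\toks@\expandafter{\@tmpa}
\immediate\openout\esc@out=\jobname.tmp
\immediate\write\esc@out{\the\toks@}
\immediate\closeout\esc@out

% Pass 2: re-read the line with \ [ ] active, then write it out again so
% that the active characters expand to their escaped forms.
\def\esc@reread{%
  \begingroup
    \openin\esc@in=\jobname.tmp
    \catcode`\\=13 \catcode`\[=13 \catcode`\]=13
    \endlinechar=-1
    \read\esc@in to\esc@line
    \closein\esc@in
    \immediate\openout\esc@out=\jobname.esc
    \immediate\write\esc@out{\esc@line}%
    \immediate\closeout\esc@out
  \endgroup}
\esc@reread
\makeatother
\begin{document}
See \texttt{\jobname.esc}.
\end{document}
```

Because \write puts a space after a control word, the temporary file holds
"\foobar [1]", and the final file should hold "\\foobar \[1\]". The catcode
changes sit inside a macro body so that they only affect what \read
tokenizes, not the rest of the source.
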
> The only thing holding me back from dispensing with the temporary file
> is that I can't figure out how to write a newline character to an
> external file. None of the following seem to work:
> \write\@stream{^^M}
> \write\@stream{\\}
> {\lccode`|=13 \lowercase{\write\@stream{|}}}
> Is there some way to write out an explicit newline? Please don't just
> tell me I could do it by writing the file one line at a time. That's
> what I'm doing at the moment, but it requires the temporary file. I have
> to loop through the temporary file, reading a line from it and
> immediately re-writing it (with escapes expanded) to the final file. I
> could store the text to be written in a macro that gets added to each
> iteration, and write it all out to file at the very end. But then I need
> to insert the newlines manually into the macro so that they appear in
> the file when it's written out, hence my question.
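
One way to get an explicit newline, sketched here under the assumption that
\newlinechar is available to do the job: any character whose code equals
\newlinechar starts a new line when it is written. The stream name \@stream
is taken from the attempts above; the file name \jobname.txt is made up.

```latex
\documentclass{article}
\makeatletter
\newwrite\@stream
\immediate\openout\@stream=\jobname.txt
\begingroup
  % Set explicitly, so the sketch does not rely on the format's default.
  \newlinechar=`\^^J
  % Each ^^J in the argument becomes a line break in the output file.
  \immediate\write\@stream{first line^^Jsecond line}
\endgroup
\immediate\closeout\@stream
\makeatother
\begin{document}
See \texttt{\jobname.txt}.
\end{document}
```

With this, a macro that accumulates the text could carry ^^J wherever a line
break is wanted and be written out in a single \write at the end.
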
> In answer to Donald Arseneau's comments: I realise TeX's file
> input/output features aren't designed for dealing with anything other
> than files containing TeX source. But the file I'm writing *is* mostly
> TeX code. The sed script contains rules for replacing one sequence of
> LaTeX commands with another. The LaTeX commands to be replaced aren't
> known until the LaTeX source file is processed, so I *have* to write out
> at least some of the information from within TeX. Given that I have to
> write something from TeX, I might as well write the entire sed script
> from TeX if I can.
> Finally, in reply to Michael Doob: I now think that writing a Perl
> script instead of sed would only make things slightly simpler. I would
> still need to escape the "\" character inside Perl strings when writing
> the script file from TeX, and we're back to my original problem :) By
> the way, awk can also be made to escape special characters in a string
> prior to using it as a computed regexp, though not in quite so simple a
> way as Perl. But I seem to have it working with sed now, anyway.
> Thanks for everyone's help, and I hope someone can shed similar light on
> my final dilemma.
> Toby
> Toby Cubitt wrote:
>> I'm trying to write an internal macro that does string substitution, in
>> order to escape certain characters in the string before writing it to a
>> file. (The package is supposed to be writing a sed script, so I need to
>> escape characters that have a special meaning in regular expressions.)
>> If this was a user-level macro to be used in the LaTeX source itself, I
>> think I can see how it could be done, by changing the catcodes of the
>> characters to be escaped to 13 (active character), then defining these
>> active characters to expand to escaped versions of themselves. (I
>> suppose this would be somewhat akin to LaTeX's \verb command). The
>> trouble is, this macro is to be used in a LaTeX package, and I need
>> something like the following to work:
>> \begingroup%
>> \catcode`|=0
>> |catcode`.=13 |catcode`[=13 |catcode`]=13
>> |catcode`^=13 |catcode`$=13 %$
>> \catcode`\\=13
>> |gdef|@escapechars#1{%
>>    |begingroup
>>    |catcode`|=0
>>    |catcode`.=13 |catcode`[=13 |catcode`]=13
>>    |catcode`^=13 |catcode`$=13 %$
>>    |catcode`\=13
>>    |def\{|string\|string\}%
>>    |def^{|string\|string^}%
>>    |def${|string\|string$}%
>>    |def.{|string\|string.}%
>>    |def[{|string\|string[}%
>>    |def]{|string\|string]}%
>>    #1|endgroup%
>> }
>> |endgroup%
>> \def\@tmpa{\foobar}
>> \expandafter\@escapechars\expandafter{\@tmpa}%
>> It seems I need those \catcode changes outside the macro definition, as
>> well as inside, otherwise the |endgroup and |catcode changes inside the
>> macro aren't recognized properly, though I don't entirely understand the
>> reason behind this. In reality, the \@tmpa macro is of course defined by
>> a much more complicated process than a simple \def (otherwise the whole
>> exercise becomes trivial!), but the above serves to illustrate the
>> scenario.
>> This code is supposed to change the "\foobar" into "\\foobar", but
>> instead it fails with an "Undefined control sequence \foobar" error. If
>> I understand this correctly (unlikely!), the problem is that the "\" in
>> "\foobar" already has catcode 0 (escape character) before it's absorbed
>> by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes
>> already assigned, the catcode changes inside the \@escapechars macro
>> have no effect, and TeX tries to interpret "\foobar" as a control
>> sequence. Is this at all correct?
>> Is there any way to do what I want? If my above analysis is correct,
>> what I guess I need is a command to change the catcodes of tokens, but
>> TeX's abilities in this respect seem to be limited. The \string and
>> \meaning commands can only change tokens to catcode 12 (other), and the
>> \lowercase command changes charcodes rather than catcodes. Maybe there's
>> a completely different way of achieving what I want?
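
A possible "completely different way", sketched under the assumption that
e-TeX's \detokenize is available: turn the stored tokens into plain
characters first, then double each backslash with a delimited-argument
macro. The helper names \@escapebs and \@dblbs are hypothetical, and only
the backslash is handled; the other special characters would need similar
passes.

```latex
\documentclass{article}
\makeatletter
% \@escapebs and \@dblbs are hypothetical helper names.
\begingroup
  \catcode`\|=0   % | acts as the escape character inside this group
  \catcode`\\=12  % so \ can be typed as an ordinary (catcode 12) character
  |gdef|@escapebs#1{|@dblbs#1\|@nil}%
  |gdef|@dblbs#1\#2|@nil{%
    #1|ifx|relax#2|relax|else\\|@dblbs#2|@nil|fi}%
|endgroup

\def\@tmpa{\foobar}
% Convert the stored tokens to characters without expanding them.
\edef\@tmpb{\detokenize\expandafter{\@tmpa}}
% Double every backslash; everything here is expandable, so \edef suffices.
\edef\@tmpc{\expandafter\@escapebs\expandafter{\@tmpb}}
\typeout{[\@tmpc]}
\makeatother
\begin{document}
Check the log file.
\end{document}
```

The \typeout line should show "[\\foobar ]"; the space before the closing
bracket is the one \detokenize adds after a control word. \@escapebs appends
a sentinel backslash so that \@dblbs always finds its delimiter, and the
recursion stops when nothing follows the backslash it has just consumed.
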
>> I've tried to reduce this question to its bare essentials, but if it's
>> not clear what I'm trying to do, I can go into more detail.
>> Thanks very much,
>> Toby
