[luatex] [OT] The consumption of an input string.

Dirk Laurie dirk.laurie at gmail.com
Mon Jun 17 19:26:58 CEST 2013

2013/6/17 Paul Isambert <zappathustra at free.fr>:

> This is not really a LuaTeX question, but I ask it here anyway since a
> lot of knowledgeable people read this list.
> I’ve been surprised to discover that
>     print(string.gsub('abc', '.*', '(%0)'))
> returns
>     (abc)()
> (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> expect
>     (abc)
> since the string is completely consumed after the first match and
> there’s no reason to try matching any further. I thought it was a Lua
> quirk but then in Ruby
>     puts 'abc'.gsub(/.*/, '(\0)')
> returns the same thing. On the other hand, “(abc)” is returned as
> expected (by me) with
>     echo substitute('abc', '.*', '(\0)', 'g')
> in Vim script and
>     import re
>     print re.sub(re.compile('(.*)'), '(\\1)', 'abc')
> in Python and
>     echo "abc" | sed 's/.*/(\0)/g'
> with sed (I’m not familiar with Python and sed, so the last two codes
> are only tentative).

In my opinion this is a case of an early implementation of regular
expressions (possibly Perl's) becoming a de facto standard. Nobody
realized at the time that there is an ambiguity (may the pattern
match the empty string at the very spot where the previous match
ended?), and it is too late to change now.

Perl has since spelt it out, casting in concrete the behaviour you
(and I) consider counter-intuitive, but many other languages just
leave the issue vague.
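
The two behaviours can be made concrete with a small sketch. It is
written in Python rather than Lua, and it is only an illustration, not
how any of the implementations mentioned above is actually coded. The
function name gsub_all and the flag allow_empty_after_match are made up
for this example. The flag decides whether an empty match is accepted
at the position where the previous match ended.

```python
import re

def gsub_all(pattern, repl, s, allow_empty_after_match):
    """Replace every match of pattern in s with repl(matched_text).

    Hypothetical helper for illustration only. The flag
    allow_empty_after_match chooses between the two policies:
    True  -> an empty match right after the previous match counts,
    False -> such an empty match is skipped.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    prev_end = -1  # end of the previous match; -1 means no match yet
    while pos <= len(s):
        m = rx.match(s, pos)  # try to match starting exactly at pos
        if m and (m.end() > pos or allow_empty_after_match or pos != prev_end):
            out.append(repl(m.group(0)))
            prev_end = m.end()
            if m.end() > pos:
                # non-empty match: resume scanning right after it
                pos = m.end()
                continue
        # no match here, or an empty one: copy one character and move on
        if pos < len(s):
            out.append(s[pos])
        pos += 1
    return "".join(out)

def wrap(text):
    return "(" + text + ")"

print(gsub_all(".*", wrap, "abc", True))   # (abc)()
print(gsub_all(".*", wrap, "abc", False))  # (abc)
```

With the flag set, the empty string left over at the end of 'abc' is
matched once more, which gives the result Paul saw from Lua and Ruby.
With the flag cleared, it is skipped because the previous match ended
at that very position, which gives the result he expected.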

LuaTeX does it that way because Lua does it that way. There was a
discussion on this very topic on the Lua users list about a month
ago; people weighed in with arguments on both sides, and nothing
will change.
