[luatex] [OT] The consumption of an input string.

Paul Isambert zappathustra at free.fr
Mon Jun 17 23:42:42 CEST 2013


Dirk Laurie <dirk.laurie at gmail.com> a écrit:
> 2013/6/17 Paul Isambert <zappathustra at free.fr>:
> 
> > This is not really a LuaTeX question, but I ask it here anyway since a
> > lot of knowledgeable people read this list.
> >
> > I’ve been surprised to discover that
> >
> >     print(string.gsub('abc', '.*', '(%0)'))
> >
> > returns (together with the substitution count, 2)
> >
> >     (abc)()
> >
> > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > expect
> >
> >     (abc)
> >
> > since the string is completely consumed after the first match and
> > there’s no reason to try matching any further. I thought it was a Lua
> > quirk but then in Ruby
> >
> >     puts 'abc'.gsub(/.*/, '(\0)')
> >
> > returns the same thing. On the other hand, “(abc)” is returned as
> > expected (by me) with
> >
> >     echo substitute('abc', '.*', '(\0)', 'g')
> >
> > in Vim script and
> >
> >     import re
> >     print re.sub(re.compile('(.*)'), '(\\1)', 'abc')
> >
> > in Python and
> >
> >     echo "abc" | sed 's/.*/(\0)/g'
> >
> > with sed (I’m not familiar with Python or sed, so the last two snippets
> > are only tentative).
> 
> In my opinion this is a case of an early implementation of regular
> expressions (possibly of Perl) becoming a de facto standard. Nobody
> realized at the time that there is an ambiguity, and it is too late
> to change now.
> 
> Perl has since spelt it out, casting in concrete the behaviour you
> (and I) consider counter-intuitive, but many other languages just
> leave the issue vague.
> 
> LuaTeX does it that way because Lua does it that way. There was a
> discussion on this very topic on the Lua users list about a month
> ago, people weighed in with arguments on both sides, and nothing
> will change.

Thank you, Dirk, for the explanation. I find the whole thing terribly
counter-intuitive. The following:

    local c = 0
    for match in string.gmatch("a,b,c", "[^,]*") do
      c = c+1
      print(c, match)
    end

results in 6 matches: “a”, “”, “b”, “”, “c” and “”!
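To see where the extra matches come from, here is a minimal sketch of the scanning loop a global match typically runs. At each position it tries a match. If the match is empty, it steps over one character so the loop terminates. Note that the end of the string is itself a position where a match is tried. The sketch is in Python only because it is easy to run, and it does not use any regex library. The helpers `star`, `scan`, `gsub` and `gmatch` are hypothetical, written for illustration. The default mode is meant to mimic what Lua did at the time of this thread. The `skip_empty_after_match` flag models the rule Paul and Dirk find more intuitive: reject an empty match at the spot where the previous match ended. That flag is an assumption about how the engines that return “(abc)” behave, not a description of their source.

```python
"""Hypothetical model of a global-match scan (not a real regex engine)."""


def star(pred):
    """Greedy matcher for a pattern of the form 'X*': from pos, consume
    every char satisfying pred. It always succeeds, possibly empty."""
    def match_at(subject, pos):
        end = pos
        while end < len(subject) and pred(subject[end]):
            end += 1
        return end
    return match_at


ANY = star(lambda ch: True)             # stands in for '.*'
NOT_COMMA = star(lambda ch: ch != ",")  # stands in for '[^,]*'


def scan(subject, match_at, skip_empty_after_match=False):
    """Yield (start, end) spans the way a global match loop walks subject.

    pos == len(subject) is still a position where a match is attempted,
    which is where the trailing empty match comes from.
    """
    pos, last_end = 0, None
    while pos <= len(subject):
        end = match_at(subject, pos)
        empty_after_match = end == pos and pos == last_end
        if end is None or (skip_empty_after_match and empty_after_match):
            pos += 1  # nothing acceptable here: try the next position
            continue
        yield pos, end
        last_end = end
        # An empty match consumes nothing, so step over one character
        # to avoid matching at the same position forever.
        pos = end if end > pos else end + 1


def gsub(subject, match_at, template, skip_empty_after_match=False):
    """Replace every match with template.format(matched_text)."""
    out, copied = [], 0
    for start, end in scan(subject, match_at, skip_empty_after_match):
        out.append(subject[copied:start])  # unmatched text between matches
        out.append(template.format(subject[start:end]))
        copied = end
    out.append(subject[copied:])  # trailing unmatched text
    return "".join(out)


def gmatch(subject, match_at, skip_empty_after_match=False):
    """List every matched substring, in order."""
    spans = scan(subject, match_at, skip_empty_after_match)
    return [subject[s:e] for s, e in spans]


if __name__ == "__main__":
    print(gsub("abc", ANY, "({})"))                               # (abc)()
    print(gsub("abc", ANY, "({})", skip_empty_after_match=True))  # (abc)
    print(gmatch("a,b,c", NOT_COMMA))  # ['a', '', 'b', '', 'c', '']
    print(gmatch("a,b,c", NOT_COMMA, skip_empty_after_match=True))
    # ['a', 'b', 'c']
```

Under the first rule, the last match is the empty string found at the very end of the subject. That is exactly the “()” in the gsub output, and the three empty strings in the gmatch list arise the same way, once after each non-empty field. The second rule drops them because each one starts where the previous match ended.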

For those interested in the discussion mentioned (and actually
launched) by Dirk, here it is:
http://lua-users.org/lists/lua-l/2013-04/msg00812.html

Best,
Paul
