[luatex] [OT] The consumption of an input string.

Paul Isambert zappathustra at free.fr
Mon Jun 17 18:29:41 CEST 2013


luigi scarso <luigi.scarso at gmail.com> wrote:
> On Mon, Jun 17, 2013 at 4:54 PM, Paul Isambert <zappathustra at free.fr> wrote:
> 
> > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > On Mon, Jun 17, 2013 at 2:43 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > >
> > > > Hello all,
> > > >
> > > > This is not really a LuaTeX question, but I ask it here anyway since a
> > > > lot of knowledgeable people read this list.
> > > >
> > > > I’ve been surprised to discover that
> > > >
> > > >     print(string.gsub('abc', '.*', '(%0)'))
> > > >
> > > > returns
> > > >
> > > >     (abc)()
> > > >
> > > > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > > > expect
> > > >
> > > >     (abc)
> > > >
> > > >
> > >
> > > maybe this can help
> > >
> > > > print(string.gsub("abc","%s*","(%0)"))
> > > ()a()b()c()    4
> > >
> > > > print(string.gsub("abc","%S*","(%0)"))
> > > (abc)()    2
> > >
> > > """
> > > A pattern item can be
> > >
> > > a single character class followed by '*', which matches 0 or more
> > > repetitions of characters in the class. These repetition items will always
> > > match the longest possible sequence;
> > > """
> >
> > Thank you Luigi, but “*” has the same definition in other languages,
> > including those where there is no match on a final empty string.
> >
> > As for your first example, all languages behave the same as far as I
> > can tell, as expected.
> >
> > Best,
> > Paul
> >
> >
> $ perl -e '$x="abc"; @w=($x=~ /(.*)/g);  print "tot. matches:", scalar(@w), "  matches:($w[0])($w[1])\n"'
> tot. matches:2  matches:(abc)()
> 
> $ perl -e '$x="abc"; @w=($x=~ /(.*)/);  print "tot. matches:", scalar(@w), "  matches:($w[0])($w[1])\n"'
> tot. matches:1  matches:(abc)()
> 
> In Perl,
> "the modifier //g stands for global matching and allows the matching
> operator to match within a string as many times as possible",
> and I think this corresponds to
> "These repetition items will always match the longest possible sequence;"
> in the description of patterns.

Thanks again, Luigi... but that still doesn’t explain away the
problem. Actually, I don’t think “g” corresponds to matching the
longest possible sequence; it simply means matching as many times as
possible instead of only once. In any case, a similar “g” was included
in my Vim and sed snippets; as for Python, “re.sub()” replaces every
occurrence by default, like Lua’s “string.gsub()”. As far as I can
tell, all my code snippets were equivalent, meaning “replace X with Y
as many times as possible”; so the question really is: why do some
languages seem to consider that there is a “one more time” (the empty
string) once the input string has (apparently) been completely
consumed?
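
To make the question concrete, here is a minimal Python sketch of how a
"replace as many times as possible" loop could produce either result. It
is only an illustration and is not taken from any engine's source: the
names naive_gsub, paren and allow_empty_after_match are hypothetical.
The assumption is that the scan also tries the position just past the
last character, and that engines differ on whether an empty match is
accepted right where the previous match ended.

```python
import re


def naive_gsub(subject, pattern, wrap, allow_empty_after_match=True):
    """Hypothetical global-replace loop; not any library's real code.

    `wrap` is applied to each matched substring. Returns the new string
    and the number of matches, like Lua's string.gsub.
    """
    regex = re.compile(pattern)
    out, count, pos, last_end = [], 0, 0, None
    # Note "<=": the position just past the last character is also tried.
    while pos <= len(subject):
        m = regex.match(subject, pos)
        if m is not None and not allow_empty_after_match:
            # Assumed rule: reject an empty match that starts exactly
            # where the previous match ended.
            if m.end() == m.start() and m.start() == last_end:
                m = None
        if m is not None:
            count += 1
            out.append(wrap(m.group(0)))
            last_end = m.end()
        if m is not None and m.end() > pos:
            pos = m.end()              # non-empty match: jump past it
        elif pos < len(subject):
            out.append(subject[pos])   # no match or empty match: copy one char
            pos += 1
        else:
            break                      # end of input, nothing left to copy
    return "".join(out), count


def paren(s):
    return "(" + s + ")"


print(naive_gsub("abc", ".*", paren))          # ('(abc)()', 2)
print(naive_gsub("abc", ".*", paren, False))   # ('(abc)', 1)
print(naive_gsub("abc", r"\s*", paren))        # ('()a()b()c()', 4)
print(naive_gsub("abc", r"\s*", paren, False)) # ('()a()b()c()', 4)
```

Under these assumptions, the extra “()” appears only when the whole
input was consumed by a non-empty match and the loop then accepts an
empty match at the same spot; the “%s*” case is unaffected either way,
which would fit the observation that all languages agree on it.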

Best,
Paul




