[luatex] [OT] The consumption of an input string.

luigi scarso luigi.scarso at gmail.com
Mon Jun 17 19:05:20 CEST 2013


On Mon, Jun 17, 2013 at 6:29 PM, Paul Isambert <zappathustra at free.fr> wrote:

> luigi scarso <luigi.scarso at gmail.com> a écrit:
> > On Mon, Jun 17, 2013 at 4:54 PM, Paul Isambert <zappathustra at free.fr> wrote:
> >
> > > luigi scarso <luigi.scarso at gmail.com> a écrit:
> > > > On Mon, Jun 17, 2013 at 2:43 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > This is not really a LuaTeX question, but I ask it here anyway since
> > > > > a lot of knowledgeable people read this list.
> > > > >
> > > > > I’ve been surprised to discover that
> > > > >
> > > > >     print(string.gsub('abc', '.*', '(%0)'))
> > > > >
> > > > > returns
> > > > >
> > > > >     (abc)()
> > > > >
> > > > > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > > > > expect
> > > > >
> > > > >     (abc)
> > > > >
> > > > >
> > > >
> > > > maybe this can help
> > > >
> > > > > print(string.gsub("abc","%s*","(%0)"))
> > > > ()a()b()c()    4
> > > >
> > > > > print(string.gsub("abc","%S*","(%0)"))
> > > > (abc)()    2
> > > >
> > > > """
> > > > A pattern item can be
> > > >
> > > > a single character class followed by '*', which matches 0 or more
> > > > repetitions of characters in the class. These repetition items will
> > > > always match the longest possible sequence;
> > > > """
> > >
> > > Thank you Luigi, but “*” has the same definition in other languages,
> > > including those where there is no match on a final empty string.
> > >
> > > As for your first example, all languages behave the same as far as I
> > > can tell, as expected.
> > >
> > > Best,
> > > Paul
> > >
> > >
> > $ perl -e '$x="abc"; @w=($x=~ /(.*)/g);  print "tot. matches:", scalar(@w), "  matches:($w[0])($w[1])\n"'
> > tot. matches:2  matches:(abc)()
> >
> > $ perl -e '$x="abc"; @w=($x=~ /(.*)/);  print "tot. matches:", scalar(@w), "  matches:($w[0])($w[1])\n"'
> > tot. matches:1  matches:(abc)()
> >
> > In Perl,
> > "the modifier //g stands for global matching and allows the matching
> > operator to match within a string as many times as possible"
> > and I think it corresponds to
> > "These repetition items will always match the longest possible sequence;"
> > in the Lua pattern documentation.
> >
> Thanks again, Luigi... but again, that doesn’t explain away the
> problem. Actually, I don’t think “g” corresponds to matching the
> longest possible sequence (simply matching as many times as possible
> instead of only once),
>
OK, put better:
in Lua, string.gsub always matches as many times as possible, and .* is greedy.
Together they behave like g (match as many times as possible) and .* (greedy
by default) in Perl.
So we get the same result.
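
The interaction described above (repeat the match as often as possible, with an
item that is allowed to match nothing) can be sketched in Lua itself. This is
only an illustration built on string.find; the helper name scan is made up
here, it is not part of Lua, and it is not the actual implementation of
string.gsub, whose handling of the trailing empty match may differ between Lua
versions.

```lua
-- Hypothetical helper (not part of Lua): list the matches that a repeated,
-- left-to-right scan finds, the way a global substitution walks the subject.
local function scan(s, pat)
  local matches, pos = {}, 1
  -- pos == #s + 1 is the position just past the last character; an item
  -- that accepts zero characters can still match there.
  while pos <= #s + 1 do
    local b, e = string.find(s, pat, pos)
    if not b then break end
    matches[#matches + 1] = string.sub(s, b, e)
    if e >= b then
      pos = e + 1   -- non-empty match: resume right after it
    else
      pos = b + 1   -- empty match: step over one character to make progress
    end
  end
  return matches
end

-- Print the number of matches each pattern yields on "abc".
for _, pat in ipairs({ ".*", "%s*", ".-", ".+" }) do
  print(pat, #scan("abc", pat))
end
```

On the subject "abc" this sketch reports 2 matches for .* (the whole string,
then the empty string at the end), 4 for %s* and .- (one empty match at each
position), and 1 for .+, which cannot match the empty string.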

The non-greedy version of * is -:
print(string.gsub("abc",".-","(%0)"))
()a()b()c()    4

perl -e '$x="abc"; @w=($x=~ /(.{0}?)/g);  print "tot. matches:", scalar(@w),"  matches:($w[0])($w[1])($w[2])($w[3])\n"'
g: match as many times as possible
{0}?: non-greedy (the ? is redundant)
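
If a single match on the whole string is what is wanted, two small variations
of the original call avoid the empty match altogether. This is a sketch using
only standard pattern items, not a statement about what the original poster
should use:

```lua
-- '+' requires at least one character, so an empty match is impossible.
local r1, n1 = string.gsub("abc", ".+", "(%0)")

-- A leading '^' anchors gsub to the start of the subject,
-- so the pattern is tried there only and matches at most once.
local r2, n2 = string.gsub("abc", "^.*", "(%0)")

print(r1, n1)
print(r2, n2)
```

Both calls give (abc) with a count of 1.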

-- 
luigi