[luatex] [OT] The consumption of an input string.
Paul Isambert
zappathustra at free.fr
Mon Jun 17 19:24:06 CEST 2013
luigi scarso <luigi.scarso at gmail.com> wrote:
> On Mon, Jun 17, 2013 at 6:29 PM, Paul Isambert <zappathustra at free.fr> wrote:
>
> > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > On Mon, Jun 17, 2013 at 4:54 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > >
> > > > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > > > On Mon, Jun 17, 2013 at 2:43 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > This is not really a LuaTeX question, but I ask it here anyway since a
> > > > > > lot of knowledgeable people read this list.
> > > > > >
> > > > > > I’ve been surprised to discover that
> > > > > >
> > > > > > print(string.gsub('abc', '.*', '(%0)'))
> > > > > >
> > > > > > returns
> > > > > >
> > > > > > (abc)()
> > > > > >
> > > > > > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > > > > > expect
> > > > > >
> > > > > > (abc)
> > > > > >
> > > > > >
> > > > >
> > > > > maybe this can help
> > > > >
> > > > > > print(string.gsub("abc","%s*","(%0)"))
> > > > > ()a()b()c() 4
> > > > >
> > > > > > print(string.gsub("abc","%S*","(%0)"))
> > > > > (abc)() 2
> > > > >
> > > > > """
> > > > > A pattern item can be
> > > > >
> > > > > a single character class followed by '*', which matches 0 or more
> > > > > repetitions of characters in the class. These repetition items will always
> > > > > match the longest possible sequence;
> > > > > """
> > > >
> > > > Thank you Luigi, but “*” has the same definition in other languages,
> > > > including those where there is no match on a final empty string.
> > > >
> > > > As for your first example, all languages behave the same as far as I
> > > > can tell, as expected.
> > > >
> > > > Best,
> > > > Paul
> > > >
> > > >
> > > $ perl -e '$x="abc"; @w=($x=~ /(.*)/g); print "tot. matches:", scalar(@w), " matches:($w[0])($w[1])\n"'
> > > tot. matches:2 matches:(abc)()
> > >
> > > $ perl -e '$x="abc"; @w=($x=~ /(.*)/); print "tot. matches:", scalar(@w), " matches:($w[0])($w[1])\n"'
> > > tot. matches:1 matches:(abc)()
> > >
> > > in perl
> > > "the modifier //g stands for global matching and allows the matching
> > > operator to match within a string as many times as possible"
> > > and I think it corresponds to
> > > "These repetition items will always match the longest possible sequence;"
> > > of pattern.
> > >
> > Thanks again, Luigi... but again, that doesn’t explain away the
> > problem. Actually, I don’t think “g” corresponds to matching the
> > longest possible sequence (simply matching as many times as possible
> > instead of only once),
> >
> ok, better:
Let me tell you, Luigi, I love your doggedness :)
> In Lua, string.gsub always matches as many times as possible, and .* is greedy.
> Together they are like g (as many times as possible) and .* (greedy, the
> default) in Perl.
> So we have the same result.
Yes, but Vim script and Python also combine “.*” and “g” (explicitly in
the case of Python), yet they do not match on a final empty string.
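
The Python side of this comparison can be checked with the standard `re` module. The sketch below is only an illustration and is not part of the original exchange. It assumes nothing beyond `re.findall` and `re.sub`. Note that `findall` does report the trailing empty match, while the result of `sub` depends on the interpreter version (to my knowledge, from Python 3.7 onwards an empty match adjacent to a previous match is replaced as well), so its output is printed rather than stated.

```python
import re

subject = "abc"

# findall reports every non-overlapping match of a greedy ".*",
# including the empty match left at the end of the string.
matches = re.findall(r".*", subject)
print(matches)  # ['abc', '']

# Whether sub() also replaces that trailing empty match depends on the
# interpreter version, so the result is printed rather than assumed.
replaced = re.sub(r".*", r"(\g<0>)", subject)
print(replaced)
```
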
> The non-greedy version of * is -:
> print(string.gsub("abc",".-","(%0)"))
> ()a()b()c() 4
Yet

print(string.gsub('abc', '.-$', '(%0)'))

still prints “(abc)()” (the “$” is included to make it comparable to
“.*”), whereas the equivalent

echo substitute('abc', '.\{-}$', '(\0)', 'g')

in Vim returns “(abc)”. We still have the same problem here: Lua
thinks it wise to continue searching after it has consumed the string,
and this seems to be expected in some languages but not in others.
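
The difference between the two behaviours can be modelled with a small scanning loop. The sketch below is written in Python purely for illustration. It is an assumption about the general shape of a global search-and-replace loop, not the actual source of Lua or Vim, and the names `scan` and `skip_adjacent_empty` are hypothetical. With the flag off, one more match is attempted at the end of the subject even though the previous match consumed it. With the flag on, an empty match that starts exactly where the previous match ended is dropped.

```python
import re

def scan(subject, pattern, skip_adjacent_empty=False):
    """Collect matches the way a global search-and-replace loop might.

    Hypothetical helper, not taken from any real implementation.
    """
    regex = re.compile(pattern)
    found = []
    pos = 0
    last_end = None  # end offset of the most recent recorded match
    while True:
        m = regex.match(subject, pos)
        if m is not None:
            is_empty = m.end() == m.start()
            # Optionally drop an empty match glued to the previous match.
            if not (skip_adjacent_empty and is_empty and m.start() == last_end):
                found.append(m.group(0))
                last_end = m.end()
        if m is not None and m.end() > pos:
            pos = m.end()   # non-empty match: jump past it
        elif pos < len(subject):
            pos += 1        # no match or empty match: step one character
        else:
            break           # nothing left to scan
    return found

print(scan("abc", r".*"))                            # ['abc', '']
print(scan("abc", r".*", skip_adjacent_empty=True))  # ['abc']
print(scan("abc", r"\s*"))                           # ['', '', '', '']
```

The first call gives the two matches behind “(abc)()”, the second gives the single match behind “(abc)”, and the third gives the four empty matches behind “()a()b()c()”, which is the case where all the languages agree.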
> perl -e '$x="abc"; @w=($x=~ /(.{0}?)/g); print "tot. matches:",
> scalar(@w)," matches:($w[0])($w[1])($w[2])($w[3])\n"'
> g: as many times as possible
> {0}?: non-greedy (? is redundant)
(Note that I can’t really read Perl, so I’m not sure what you’re
showing me here.)
Best,
Paul