[luatex] [OT] The consumption of an input string.

Paul Isambert zappathustra at free.fr
Mon Jun 17 19:24:06 CEST 2013


luigi scarso <luigi.scarso at gmail.com> wrote:
> On Mon, Jun 17, 2013 at 6:29 PM, Paul Isambert <zappathustra at free.fr> wrote:
> 
> > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > On Mon, Jun 17, 2013 at 4:54 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > >
> > > > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > > > On Mon, Jun 17, 2013 at 2:43 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > This is not really a LuaTeX question, but I ask it here anyway
> > > > > > since a lot of knowledgeable people read this list.
> > > > > >
> > > > > > I’ve been surprised to discover that
> > > > > >
> > > > > >     print(string.gsub('abc', '.*', '(%0)'))
> > > > > >
> > > > > > returns
> > > > > >
> > > > > >     (abc)()
> > > > > >
> > > > > > (similarly, “string.gmatch('abc', '.*')” returns two matches). I’d
> > > > > > expect
> > > > > >
> > > > > >     (abc)
> > > > > >
> > > > > >
> > > > >
> > > > > maybe this can help
> > > > >
> > > > > > print(string.gsub("abc","%s*","(%0)"))
> > > > > ()a()b()c()    4
> > > > >
> > > > > > print(string.gsub("abc","%S*","(%0)"))
> > > > > (abc)()    2
> > > > >
> > > > > """
> > > > > A pattern item can be
> > > > >
> > > > > a single character class followed by '*', which matches 0 or more
> > > > > repetitions of characters in the class. These repetition items will
> > > > > always match the longest possible sequence;
> > > > > """
> > > >
> > > > Thank you Luigi, but “*” has the same definition in other languages,
> > > > including those where there is no match on a final empty string.
> > > >
> > > > As for your first example, all languages behave the same as far as I
> > > > can tell, as expected.
> > > >
> > > > Best,
> > > > Paul
> > > >
> > > >
> > > $ perl -e '$x="abc"; @w=($x=~ /(.*)/g);  print "tot. matches:", scalar(@w),
> > > "  matches:($w[0])($w[1])\n"'
> > > tot. matches:2  matches:(abc)()
> > >
> > > $ perl -e '$x="abc"; @w=($x=~ /(.*)/);  print "tot. matches:", scalar(@w),
> > > "  matches:($w[0])($w[1])\n"'
> > > tot. matches:1  matches:(abc)()
> > >
> > > In Perl,
> > > "the modifier //g stands for global matching and allows the matching
> > > operator to match within a string as many times as possible"
> > > and I think it corresponds to
> > > "These repetition items will always match the longest possible sequence;"
> > > in the description of Lua patterns.
> > >
> > Thanks again, Luigi... but again, that doesn’t explain away the
> > problem. Actually, I don’t think “g” corresponds to matching the
> > longest possible sequence (simply matching as many times as possible
> > instead of only once),
> >
> ok, better:

Let me tell you, Luigi, I love your doggedness :)

> In Lua, string.gsub always matches as many times as possible, and .* is greedy.
> Together they are like g (as many times as possible) and .* (greedy,
> default) in Perl.
> So we have the same result.

Yes, but just like in Vim script or Python, which combine “.*” and “g”
too (explicitly in the case of Python) yet do not match on a final
empty string.
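
The Python side can be checked with a short script. This is only a sketch, and it assumes a current CPython interpreter rather than the 2013 one: `re.findall` reports the trailing empty match, and the `re` documentation notes that since Python 3.7 `re.sub` also replaces an empty match adjacent to a previous non-empty one, so the outcome may differ from the behaviour described above.

```python
import re

text = "abc"

# findall lists every non-overlapping match, including the empty one
# found once the scan position has reached the end of the string.
matches = re.findall(r".*", text)

# subn returns the new string and the number of replacements made.
# \g<0> is the whole match, like %0 in a Lua replacement string.
replaced, count = re.subn(r".*", r"(\g<0>)", text)

print(matches)
print(replaced, count)
```

On Python 3.7 or later this should print `['abc', '']` followed by `(abc)() 2`; older interpreters are expected to give `(abc) 1` on the second line.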

> The non-greedy version of * is -:
> print(string.gsub("abc",".-","(%0)"))
> ()a()b()c()    4

Yet

    print(string.gsub('abc', '.-$', '(%0)'))

still prints “(abc)()” (the “$” is included to make it comparable to
“.*”), whereas the equivalent

    echo substitute('abc', '.\{-}$', '(\0)', 'g')

in Vim returns “(abc)”. We still have the same problem here: Lua
thinks it wise to continue searching after it has consumed the string,
and it seems to be expected in some languages but not others.
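
The difference can be modelled as a single choice inside the global-substitution loop. The Python below is a hypothetical sketch, not the actual source of Lua or Vim: `gsub_sketch` and its `skip_adjacent_empty` flag are names invented for illustration, and Python's `re` stands in for the pattern matcher (so `.-$` is written `.*?$`).

```python
import re


def gsub_sketch(text, pattern, skip_adjacent_empty):
    """Wrap every match of pattern in parentheses; return (result, count).

    Hypothetical model of a global substitution loop. When
    skip_adjacent_empty is true, an empty match that starts exactly
    where the previous match ended is ignored.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    last_end = None  # end position of the previous accepted match
    count = 0
    # Note "<=": one more attempt is made once the string is consumed.
    while pos <= len(text):
        m = rx.match(text, pos)
        is_adjacent_empty = (
            m is not None and m.start() == m.end() == last_end
        )
        if m is not None and not (skip_adjacent_empty and is_adjacent_empty):
            out.append("(" + m.group(0) + ")")
            count += 1
            last_end = m.end()
            if m.end() > pos:
                pos = m.end()  # non-empty match: resume right after it
                continue
        # No usable match here, or an empty one: copy one character.
        if pos < len(text):
            out.append(text[pos])
        pos += 1
    return "".join(out), count


print(gsub_sketch("abc", r".*", False))   # ('(abc)()', 2)
print(gsub_sketch("abc", r".*", True))    # ('(abc)', 1)
print(gsub_sketch("abc", r"\s*", False))  # ('()a()b()c()', 4)
```

With the flag off, the loop makes one more attempt at the end position and `.*` happily matches the empty string there; with the flag on, that trailing match is dropped while the `%s*` case is unchanged, since its empty matches are never adjacent to a previous match.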

> perl -e '$x="abc"; @w=($x=~ /(.{0}?)/g);  print "tot. matches:",
> scalar(@w),"  matches:($w[0])($w[1])($w[2])($w[3])\n"'
> g: as many times as possible
> {0}?: non-greedy (? is redundant)

(Note that I can’t really read Perl, so I’m not sure what you’re
showing me here.)

Best,
Paul


