[luatex] [OT] The consumption of an input string.

Paul Isambert zappathustra at free.fr
Tue Jun 18 12:41:33 CEST 2013


luigi scarso <luigi.scarso at gmail.com> wrote:
> On Mon, Jun 17, 2013 at 11:51 PM, Paul Isambert <zappathustra at free.fr> wrote:
> 
> > luigi scarso <luigi.scarso at gmail.com> wrote:
> > > On Mon, Jun 17, 2013 at 7:24 PM, Paul Isambert <zappathustra at free.fr> wrote:
> > >
> > > >
> > > > still prints “(abc)()” (the “$” is included to make it comparable to
> > > > “.*”), whereas the equivalent
> > > >
> > > >     echo substitute('abc', '.\{-}$', '(\0)', 'g')
> > > >
> > > I don't know vim,
> >
> > Bad Luigi.
> >
> > >                   but does zero-width
> > > http://vimdoc.sourceforge.net/htmldoc/pattern.html
> > > http://davidchuprogramming.blogspot.it/2012/04/vim-tip-not-containing-pattern-2.html
> > > have some influence?
> >
> > I don’t think so. As expressed in the discussion pointed to by Dirk,
> > the difference seems to be one of implementation rather than a semantic
> > difference between similar operators (although it actually makes a
> > difference in the results). Dirk even formalized all that (in the same
> > discussion), and if I understood correctly the punchline is (as long as
> > you agree with Dirk, as I do): substrings should be closed intervals!
> > Which makes a nice motto; too bad Lua doesn’t endorse it.
> >
> > Thanks,
> > Paul
> >
> Reading
> http://lua-users.org/lists/lua-l/2013-04/msg00865.html
> I can say that I agree, but coming from Perl I can also say that Lua
> feels natural to me.
> When I get unexpected behaviour, I think of the match in terms of ϵ, the
> zero-width string,
> e.g. abc as
> ϵaϵbϵcϵ
> and seen this way your previous example also looks reasonable (greedy and
> global match considered).
> 
> Or:
> print((string.gsub(";a;", "a*", "ITEM")))
> ITEM;ITEMITEM;ITEM
> 
> target string is
> ϵ;ϵaϵ;ϵ
> pattern  is a* = ϵ|a+
> replacement is ITEM
> If we rewrite the target as
> target=[ϵ][;][ϵ][a][ϵ][;][ϵ]
> we have
> ϵ=target[1] match => ITEM
> ;=target[2] no match => ;
> ϵ=target[3] match => ITEM
> aϵ=target[4] & target[5] match (greedy) => ITEM
> ;=target[6] no match => ;
> ϵ=target[7] match => ITEM

But then “abc” should be represented as “[ϵ][a][ϵ][b][ϵ][c][ϵ]” and
“string.gsub("abc", ".*", "(%0)")” should return “()(abc)” or
something like that? I’ll admit I can’t really get my head around it.
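
One way to see why the slot picture yields “(abc)()” rather than “()(abc)” is to write the scan out by hand. What follows is only a sketch in Lua, not the code of any engine discussed in this thread. The helper match_spans is hypothetical, and it assumes that the scan tries the pattern at each position from left to right, always accepts an empty match, and steps over one character after an empty match or a failure.

```lua
-- Hypothetical helper (not part of Lua's string library): list the spans
-- accepted by a left-to-right scan in which empty matches are always allowed.
-- Assumes `pat` does not itself start with "^".
local function match_spans(s, pat)
  local spans, pos = {}, 1
  while pos <= #s + 1 do
    -- A leading "^" anchors string.find at the init position `pos`.
    local first, last = string.find(s, "^" .. pat, pos)
    if first then
      spans[#spans + 1] = { first, last }
    end
    if first and last >= first then
      pos = last + 1   -- non-empty match: resume right after it
    else
      pos = pos + 1    -- empty match or no match: step over one character
    end
  end
  return spans
end

-- An empty match at position p is reported as the span {p, p - 1}.
for _, sp in ipairs(match_spans("abc", ".*")) do
  print(sp[1], sp[2])  -- spans {1, 3} and {4, 3}
end
for _, sp in ipairs(match_spans(";a;", "a*")) do
  print(sp[1], sp[2])  -- spans {1, 0}, {2, 2}, {3, 2} and {4, 3}
end
```

Under these assumptions the greedy “.*” takes all of “abc” at position 1, before the leading [ϵ] slot could be matched on its own, so only the trailing slot is left to produce “()”. For “;a;” the four spans correspond to the four ITEMs in Luigi's output.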

> Given that there is a pcre lib I tend to consider Perl as the reference, but
> I also know that many find regexes complicated to understand/implement,
> so I don't complain when Lua says that
> its regexes are not Perl or POSIX compatible because it wants to keep the
> size of the code small.
> 
> How do you reproduce the behaviour of Lua's string.gsub("abc"...
> with Vim?

Well, here things become strange. The regexes in Vim script are based on
Perl's, as far as I can tell. Now

    substitute('abc', '.*', '(\0)', 'g')

returns the not-Perl-like “(abc)” instead of “(abc)()”, however

    substitute(';a;', 'a*', 'ITEM', 'g')

returns the Perl-like “ITEM;ITEMITEM;ITEM” instead of the not-Perl-like
“ITEM;ITEM;ITEM” that the first result would lead one to expect.
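
To make the two families of results concrete, here is a small hand-written substitution loop in Lua with a switch between two policies. It is a sketch under stated assumptions, not the implementation used by Lua, Perl or Vim: scan_gsub, wrap and item are hypothetical names, the replacement is passed as a function, and the "strict" policy is assumed to mean that an empty match starting exactly where the previous match ended is ignored.

```lua
-- Hypothetical helper: global substitution by an explicit left-to-right scan.
-- `repl` is a function receiving the matched text. If `reject_adjacent_empty`
-- is true, an empty match that begins where the previous match ended is skipped.
-- Assumes `pat` does not itself start with "^".
local function scan_gsub(s, pat, repl, reject_adjacent_empty)
  local out, pos, last_end = {}, 1, nil
  while pos <= #s + 1 do
    -- A leading "^" anchors string.find at the init position `pos`.
    local first, last = string.find(s, "^" .. pat, pos)
    local empty = first and last < first
    if first and reject_adjacent_empty and empty and last_end == pos then
      first = nil                      -- strict policy: drop this empty match
    end
    if first then
      out[#out + 1] = repl(string.sub(s, first, last))
      last_end = last + 1              -- position just after this match
    end
    if first and not empty then
      pos = last + 1                   -- consume the non-empty match
    else
      if pos <= #s then
        out[#out + 1] = string.sub(s, pos, pos)  -- copy one character through
      end
      pos = pos + 1
    end
  end
  return table.concat(out)
end

local function wrap(m) return "(" .. m .. ")" end
local function item() return "ITEM" end

print(scan_gsub("abc", ".*", wrap, false))  -- (abc)()
print(scan_gsub(";a;", "a*", item, false))  -- ITEM;ITEMITEM;ITEM
print(scan_gsub("abc", ".*", wrap, true))   -- (abc)
print(scan_gsub(";a;", "a*", item, true))   -- ITEM;ITEM;ITEM
```

In this model the permissive policy gives both Perl-like results and the strict policy gives both not-Perl-like ones. Neither setting alone reproduces the pair reported for Vim, which takes one result from each row.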

Well, I think I’ll write to the Vim list! :)

Best,
Paul


