[luatex] [OT] The consumption of an input string.

luigi scarso luigi.scarso at gmail.com
Tue Jun 18 14:18:36 CEST 2013


On Tue, Jun 18, 2013 at 12:41 PM, Paul Isambert <zappathustra at free.fr> wrote:

> But then “abc” should be represented as “[ϵ][a][ϵ][b][ϵ][c][ϵ]” and
> “string.gsub("abc", ".*", "(%0)")” should return “()(abc)” or
> something like that? I’ll admit I can’t really get my head around it.
>

abc = target = [ϵ][a][ϵ][b][ϵ][c][ϵ]
pattern = ϵ | [^ϵ]+     (not sure about \n for Lua)
ϵaϵbϵc = abc = target[1]&target[2]&...&target[6]  (low-level loop, greedy)
    => (ϵaϵbϵc) = (abc)
ϵ = target[7]
    => (ϵ) = ()


Why not the ϵ after c?
From http://perldoc.perl.org/perlre.html:

"By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match"

or, as in the section
"Repeated Patterns Matching a Zero-length Substring":

"The lower-level loops are *interrupted* (that is, the loop is broken) when
Perl detects that a repeated expression matched a zero-length substring. "

Here the lower-level loops are those associated with the greedy
quantifiers *, + and {}.
In this case the ϵ after c is the zero-length substring and the string is
finished, so the first match is abc.

The global switch /g is the higher-level loop:
"The higher-level loops preserve an additional state between iterations:
whether the last match was zero-length. To break the loop, the following
match after a zero-length match is prohibited to have a length of zero.
This prohibition interacts with backtracking (see
Backtracking<http://perldoc.perl.org/perlre.html#Backtracking>),
and so the *second best* match is chosen if the *best* match is of zero
length."

which seems to be the case for the last ϵ.
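
To make the higher-level loop concrete, here is a rough sketch in Python
(not Lua, and not how Perl or Vim are actually implemented). The function
name gsub_model and the perl_like flag are made up for illustration. One
simplification: when an empty match is prohibited, the sketch copies one
character and moves on, instead of backtracking to a "second best"
non-empty match at the same position. For greedy patterns like .* and a*
that gives the same result.

```python
import re

def gsub_model(pattern, wrap, text, perl_like=True):
    """Model of a global substitution loop (hypothetical helper).

    perl_like=True : an empty match is prohibited only right after
                     another empty match at the same position.
    perl_like=False: an empty match is prohibited right after any
                     match that ended at the same position.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    last_end = -1        # where the previous match ended
    prev_empty = False   # state kept between iterations of the outer loop
    while True:
        m = rx.match(text, pos)          # anchored attempt at pos
        allowed = False
        if m is not None:
            empty = m.end() == pos
            if not empty:
                allowed = True
            elif perl_like:
                allowed = not (prev_empty and last_end == pos)
            else:
                allowed = last_end != pos
        if allowed:
            out.append(wrap(m.group(0)))
            last_end = m.end()
            prev_empty = empty
            pos = m.end()
            continue
        if pos >= len(text):
            break
        out.append(text[pos])            # no usable match: copy one character
        pos += 1
    return "".join(out)

paren = lambda s: "(" + s + ")"
item = lambda s: "ITEM"

print(gsub_model(".*", paren, "abc"))                  # (abc)()
print(gsub_model("a*", item, ";a;"))                   # ITEM;ITEMITEM;ITEM
print(gsub_model(".*", paren, "abc", perl_like=False)) # (abc)
print(gsub_model("a*", item, ";a;", perl_like=False))  # ITEM;ITEM;ITEM
```

With perl_like=True the ϵ after c is matched once more after (abc), which
is the (abc)() result; with perl_like=False it is skipped.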


(at least for me; needless to say, I am reading the manual every time...)

> Well, here things become strange. The regex in Vim script is based on
> Perl, as far as I can tell. Now
>
>     substitute('abc', '.*', '(\0)', 'g')
>
> returns the not-Perl-like “(abc)” instead of “(abc)()”, however
>
>     substitute(';a;', 'a*', 'ITEM', 'g')
>
> returns the Perl-like “ITEM;ITEMITEM;ITEM” instead of the expected
> not-Perl-like “ITEM;ITEM;ITEM”.
>
> Well, I think I’ll write to the Vim list! :)
>
I've tried with \zs\ze .. no luck.


-- 
luigi