[luatex] String manipulation in Lua.

Philipp Gesang pgesang at ix.urz.uni-heidelberg.de
Fri Dec 3 12:08:13 CET 2010


On 2010-12-03 <10:37:02>, Paul Isambert wrote:
> On 2010-12-03 <00:22>, Philipp Gesang wrote:
> >On 2010-12-02 <20:59:51>, Paul Isambert wrote:
> >>local sub, gsub = string.sub, string.gsub
> >>function isub (str, pattern, replace, index, num)
> >>   -- Extract the suffix starting at given index
> >>   local s1 = sub(str, index)
> >>   -- Make the replacement on the suffix
> >>   local s2 = gsub(s1, pattern, replace, num)
> >>   -- Replace the suffix in the string with its modified version
> >>   return gsub(str, s1 .. "$", s2)
> >>end
> >>
> >>It works, but I find the solution an overkill for what seems to be a
> >>basic operation. So, as I like to ask: have I missed something?
> >Hi Paul,
> >
> >“string.sub” takes an optional third argument so that you can do
> >something like this:
> >
> >···8<····························································
> >
> >local sub, gsub = string.sub, string.gsub
> >function isub (str, pattern, replace, index, num)
> >   local left  = sub(str, 1, index-1)
> >   local right = sub(str,index):gsub(pattern, replace, num)
> >   return left .. right
> >end
> >
> >print(isub("abc)abc%def-def",   "b",   "B", 4))
> >print(isub("abc)abc%def-def", "def", "FED", 1, 1))
> >
> >···8<····························································
> 
> Thank you Philipp. That's what I'd figured out in the meantime. So
> there is no "primitive" isub...
> 
> >If you encounter problems with magic characters in patterns,
> >there is some assistance waiting in the context helper libs:
> >http://wiki.contextgarden.net/String_Manipulation#string.escapedpattern.28string.29_.7C_string.partialescapedpattern.28string.29
> 
> Are you trying to lure me into ConTeXt? :)
> Nice to get inspiration from, though.

Well, not *actively* so -- but ConTeXt itself seems to have some
involuntarily alluring property that does the job pretty well.

> >PS: What’s the problem with lpeg, anyways?
> 
> Nothing, but I don't know how to use it and it seems to me it can't
> be used lightly; and for the moment I have other things to do.
> Your code, for instance, which seems to involve a so-called grammar,
> looks really interesting, except I don't know what lpeg.Cmt,
> lpeg.Cs, etc., mean, so to me it's unreadable. [Ok, from what I can
> read, it does the same thing as the above function.]

“Cs” is the substitution capture: it returns the whole text
    matched inside its scope, with every nested capture replaced
    by its value. Without it you would get a loose bunch of
    substitutes missing the rest of the string in between.
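
    To make the difference concrete, here is a minimal sketch (the
    names “bare” and “smooth” are made up for the illustration and
    are not part of lpeg):

```lua
local lpeg = require "lpeg"
local Cs, P = lpeg.Cs, lpeg.P

-- A match of "def" whose captured value is the string "FED".
local target = P"def" / "FED"
local other  = P(1)

-- Without Cs: only the replacement values come back, one per match.
local bare   = (target + other)^0
-- With Cs: the whole matched text comes back, replacements spliced in.
local smooth = Cs((target + other)^0)

print(bare:match("abcdef-def"))    --> FED     FED   (two separate values)
print(smooth:match("abcdef-def"))  --> abcFED-FED
```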

“Cmt” gives you a free hand to create your very own personalized
    matching function -- it matches only if its first argument,
    the pattern, matches *and* its second argument, the function,
    then returns true. This makes it the most versatile of
    lpeg’s functions (not omnipotent, though …) as you can feed it
    arbitrary conditions (imagine patterns that match only on
    Mondays, call a routine to check your bank account before
    matching, or match iff there is an entry in an online
    dictionary for the match as filtered through an md5
    function …).

    (Details at Roberto’s: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html)
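
    A tamer condition than the bank account one, as a rough sketch:
    mark only those numbers that are even. The names “even” and
    “mark_even” are invented for this example, and the inner C() is
    there just to make explicit what gets handed to the function.

```lua
local lpeg = require "lpeg"
local C, Cmt, Cs, P, R = lpeg.C, lpeg.Cmt, lpeg.Cs, lpeg.P, lpeg.R

local number = R"09"^1

-- Succeeds only if the digits match AND the function returns true.
-- The function receives the subject, the position after the match,
-- and the captured digits.
local even = Cmt(C(number), function (subject, pos, digits)
    return tonumber(digits) % 2 == 0
end)

-- Wrap even numbers in <...>; skip odd numbers and anything else.
local mark_even = Cs((even / "<%0>" + number + P(1))^0)

print(mark_even:match("12 7 40 33"))  --> <12> 7 <40> 33
```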

Wrapping things up in a grammar gives you named rules, which can
become necessary in bigger patterns (have a look at
http://lua-users.org/lists/lua-l/2008-11/msg00470.html) but in
this case you could do without it.
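
A grammar-free variant might look roughly like the sketch below. Note
that it is not the code quoted further down: “flat_gsub” is a
hypothetical name, and instead of counting characters by hand it
takes the position that lpeg passes to the Cmt function.

```lua
local lpeg = require "lpeg"
local C, Cmt, Cs, P = lpeg.C, lpeg.Cmt, lpeg.Cs, lpeg.P

-- Replace matches of `pattern` that start at or after byte `start`,
-- at most `limit` times (no limit if `limit` is nil).
local function flat_gsub (str, pattern, replacement, start, limit)
    start = start or 1
    local count = 0
    local hit = Cmt(C(P(pattern)), function (_, pos, matched)
        -- pos points just past the match, so the match began at
        -- pos - #matched.
        if pos - #matched >= start
           and (limit == nil or count < limit) then
            count = count + 1
            return true
        end
        return false
    end) / replacement
    return Cs((hit + P(1))^0):match(str)
end

local test = "abcdefg abcdefg abcdefg abcdefg"
print(flat_gsub(test, "def", "FED", 10))
--> abcdefg abcFEDg abcFEDg abcFEDg
print(flat_gsub(test, "def", "FED", 7, 2))
--> abcdefg abcFEDg abcFEDg abcdefg
```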

Switching to lpeg means freeing yourself from the limitations
(and idiosyncratic notation) of the common string patterns. It
surely *can*, as you put it, be used lightly once you have got
the hang of it.

Regards, Philipp

> Best,
> Paul
> 
> >>(Yes, probably LPeg, but I don't want to go into that for the
> >>moment.)
> >···8<····························································
> >
> >local lpeg = require "lpeg"
> >local Cmt, Cs, P, V = lpeg.Cmt, lpeg.Cs, lpeg.P, lpeg.V
> >
> >local function lpeg_gsub (str, pattern, replacement, threshold, limit)
> >     local idx = 1
> >     local threshold = threshold or 0
> >     local sub_cnt = 0
> >
> >     local peg = P{
> >         [1] = "initial",
> >
> >         initial = Cs((V"p" + V"other")^0),
> >
> >         p = Cmt(P(pattern), function (_,_, matched)
> >                 if idx >= threshold and
> >                    ( (limit ~= nil) and (sub_cnt < limit) or (limit == nil) )
> >                    then
> >                     idx = idx + #matched
> >                     sub_cnt = sub_cnt + 1
> >                     return true
> >                 end
> >                 return false
> >             end) / replacement,
> >
> >         other = Cmt(1, function () idx = idx + 1 return true end)
> >     }
> >
> >     --peg:print()
> >     return peg:match(str)
> >end
> >
> >
> >local test = "abcdefg abcdefg abcdefg abcdefg"
> >
> >print(lpeg_gsub(test, "def", "FED", 10))
> >print(lpeg_gsub(test, "def", "FED", 07, 2))
> >print(lpeg_gsub(test, "def", "FED", 15))
> >
> >io.write("\n")
> >
> >print(lpeg_gsub(test,   "a",   "A", 1, 2))
> >print(lpeg_gsub(test,   "b",   "B", 4, 1))
> >print(lpeg_gsub(test,   "c",   "C", 9, 0))
> >print(lpeg_gsub(test,   "d",   "D", 9))
> >
> >io.write("\n")
> >
> >local p1 = P"a" * P(1 - P"g")^1 * P"g"
> >local p2 = lpeg.S"bdf"
> >local p3 = lpeg.R"dg"
> >
> >print(lpeg_gsub(test, p1, "O", 9, 1))
> >print(lpeg_gsub(test, p2, "O", 1, 3))
> >print(lpeg_gsub(test, p3, "O", 22))
> >
> >···8<····························································
> >
> >

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments