[luatex] problem with slnunicode's find
Stephan Hennig
mailing_list at arcor.de
Wed Mar 3 17:18:46 CET 2010
On 03.03.2010 10:19, Manuel Pégourié-Gonnard wrote:
> luigi scarso wrote:
>
>> Can we implement an acceptable wrapper ?
>>
> Yes, a proper wrapper has already been given by Patrick [1] and quoted by
> myself. Here it is again, now in the form of a function:
>
> function find_utf8_chars(str, pat)
>   local a, b = unicode.utf8.find(str, pat)
>   a = unicode.utf8.len(string.sub(str, 1, a))
>   b = unicode.utf8.len(string.sub(str, 1, b))
>   return a, b
> end
>
> [...]
>
> [1] http://tug.org/pipermail/luatex/2010-March/001262.html
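The wrapper above turns slnunicode's byte offsets into character positions by counting the characters in the prefix that ends at each offset. As a rough illustration that runs without slnunicode, the same idea can be sketched with Lua's standard string library alone. Assumptions: the subject string is valid UTF-8, the search string is literal (plain find), and the helper names utf8_len and find_chars are hypothetical. The sketch also adds an explicit nil check for the no-match case.

```lua
-- Sketch (assumes valid UTF-8 and a literal search string):
-- convert the byte indices returned by string.find into character indices.

-- Number of characters in s: count every byte that is not a UTF-8
-- continuation byte (continuation bytes are 128..191, i.e. 0x80..0xBF).
local function utf8_len(s)
  local _, n = string.gsub(s, "[^\128-\191]", "")
  return n
end

-- Hypothetical helper following the same prefix-counting idea as the wrapper.
local function find_chars(str, needle)
  local a, b = string.find(str, needle, 1, true)  -- plain search, byte indices
  if not a then
    return nil  -- no match
  end
  -- str:sub(1, a) ends with the first byte of the match's first character,
  -- so counting the characters in it yields that character's index.
  return utf8_len(string.sub(str, 1, a)), utf8_len(string.sub(str, 1, b))
end

print(string.find("Größe über", "über", 1, true))  --> 9  13  (bytes)
print(find_chars("Größe über", "über"))            --> 7  10  (characters)
```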
My original problem has already been solved by the function posted in my
second mail.[1] Here's a slightly modified version:
function utf8_find(str, pattern, init)
  init = init or 1  -- default start position, as with string.find
  local s = unicode.utf8.sub(str, init)
  -- search for first occurrence of pattern
  s = unicode.utf8.match(s, "^.-" .. pattern)
  -- calculate end point of match
  local e = s and init + unicode.utf8.len(s) - 1
  -- calculate beginning of match; this assumes a literal pattern
  -- (no magic characters), so that the match is exactly as long as the pattern
  local b = e and e - unicode.utf8.len(pattern) + 1
  -- return indices of found match, or nil
  return b, e
end
It works similarly, but uses match instead of find. Although Patrick's
approach may be a bit faster than mine, neither will perform well, since
they
* build temporary strings and
* have to iterate over strings several times (find/match, sub, len).
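As a sketch of how both costs might be reduced in pure Lua, assuming a literal search string and valid UTF-8: let the byte-oriented string.find do the search, then convert both byte offsets to character indices in a single pass over the bytes, without building substrings. The function name byte_to_char_range is hypothetical. It is still an interpreted loop, so it does not replace a native implementation.

```lua
-- Sketch: convert byte offsets a <= b (e.g. from a plain string.find)
-- into character indices in one pass, without temporary substrings.
-- Assumes str is valid UTF-8.
local function byte_to_char_range(str, a, b)
  local count = 0  -- characters started so far
  local first      -- character index corresponding to byte a
  for i = 1, b do
    local byte = string.byte(str, i)
    -- Any byte outside 128..191 (0x80..0xBF) starts a new character.
    if byte < 128 or byte > 191 then
      count = count + 1
    end
    if i == a then
      first = count
    end
  end
  return first, count
end

local s = "Größe über"
local a, b = string.find(s, "über", 1, true)  -- byte indices 9, 13
if a then
  print(byte_to_char_range(s, a, b))  --> 7  10
end
```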
A native C implementation would probably be significantly faster than a
Lua implementation. The slnunicode developers decided not to provide
one; I can't imagine why.
In my own utf8_find, I think I'll use both Lua solutions and compare the
results of the find and match approaches for robustness (until I trust
unicode.utf8.find again).
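The cross-check described above could look roughly like this. checked_find is a hypothetical helper: it calls two finder functions with the same arguments and raises an error if their results differ.

```lua
-- Sketch: run two finder functions with identical arguments and fail
-- loudly if they disagree; otherwise return the (shared) result.
local function checked_find(find1, find2, str, pattern, init)
  local b1, e1 = find1(str, pattern, init)
  local b2, e2 = find2(str, pattern, init)
  if b1 ~= b2 or e1 ~= e2 then
    error(string.format("find/match disagree: %s,%s vs %s,%s",
      tostring(b1), tostring(e1), tostring(b2), tostring(e2)))
  end
  return b1, e1
end
```

With slnunicode loaded and both functions from this thread defined, a call might be checked_find(find_utf8_chars, utf8_find, str, pat, 1). The quoted find_utf8_chars takes no init argument, so the comparison is only meaningful for init = 1.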
Best regards,
Stephan Hennig
[1] <URL:http://permalink.gmane.org/gmane.comp.tex.luatex.user/1182>