[luatex] problem with slnunicode's find

Stephan Hennig mailing_list at arcor.de
Wed Mar 3 17:18:46 CET 2010

On 03.03.2010 10:19, Manuel Pégourié-Gonnard wrote:
> luigi scarso wrote:
>> Can we implement an acceptable wrapper?
> Yes, a proper wrapper has already been given by Patrick [1] and quoted by
> myself. Here it is again, now in the form of a function:
> function find_utf8_chars(str, pat)
>      local a, b = unicode.utf8.find(str, pat)
>      if not a then return nil end  -- no match
>      a = unicode.utf8.len(string.sub(str, 1, a))
>      b = unicode.utf8.len(string.sub(str, 1, b))
>      return a, b
> end
> [...]
> [1] http://tug.org/pipermail/luatex/2010-March/001262.html

My original problem has already been solved by the function posted in my 
second mail.[1]  Here's a slightly modified version:

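function utf8_find(str, pattern, init)
    init = init or 1
    local s = unicode.utf8.sub(str, init)
    -- search for first occurrence of pattern; s becomes the text from
    -- init up to and including the match, or nil
    s = unicode.utf8.match(s, "^.-" .. pattern)
    -- calculate end point of match
    local e = s and init + unicode.utf8.len(s) - 1
    -- calculate beginning of match; assumes the match is exactly as long
    -- as pattern, i.e. pattern is plain text without magic characters
    local b = e and e - unicode.utf8.len(pattern) + 1
    -- return indices of found match, or nil
    return b, e
end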

It works similarly, but uses match instead of find.  Patrick's approach 
could be a bit faster than mine, but neither will perform well, since 
both

    * build temporary strings and

    * have to iterate over the string several times (find/match, sub, len).

A native C implementation would probably be significantly faster than a 
Lua implementation.  The slnunicode developers decided not to provide 
such a thing.  I can't imagine why.
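
For comparison, here is a sketch of Patrick's find-then-convert idea that avoids the temporary strings. It assumes a Lua 5.3 or later interpreter, whose standard utf8 library (a different library from slnunicode, implemented in C) lets utf8.len count characters within a byte range of the original string, so no prefix has to be copied. The function name utf8_find_bytes is hypothetical, the subject is assumed to be valid UTF-8, the string is still walked more than once, and string.find still matches byte-wise, so "." stands for one byte, not one character.

```lua
-- Sketch, assuming Lua 5.3+ and its standard utf8 library (not slnunicode).
-- utf8_find_bytes is a hypothetical name. str must be valid UTF-8; init is
-- a positive character index (default 1). Returns character indices or nil.
local function utf8_find_bytes(str, pattern, init)
    -- translate the character index init into a byte index
    local byte_init = utf8.offset(str, init or 1)
    if not byte_init then return nil end
    -- byte-oriented search, exactly as with plain string.find
    local a, b = string.find(str, pattern, byte_init)
    if not a then return nil end
    -- count characters in byte ranges of str itself: no substrings are built
    local first = utf8.len(str, 1, a - 1) + 1
    local last = utf8.len(str, 1, b)
    return first, last
end
```

Under these assumptions, utf8_find_bytes("Grüße", "ß") should return 4, 4, where string.find("Grüße", "ß") returns the byte positions 5, 6.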

In my personal utf8_find, I think I'll use both Lua solutions and check 
whether the find and match approaches give different results, for the 
sake of robustness (until I can rely on unicode.utf8.find again).
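
The cross-check could be sketched as follows, again assuming Lua 5.3 or later and its standard utf8 library rather than slnunicode; all function names are hypothetical. The match-based variant here captures the matched text instead of assuming that the match is as long as the pattern, so patterns with magic characters work too, provided the pattern is non-empty, not anchored with ^, and free of back-references like %1 (it gets wrapped in extra captures).

```lua
-- Sketch, assuming Lua 5.3+ and its standard utf8 library (not slnunicode).
-- All names are hypothetical; str must be valid UTF-8.

-- find-based: search bytes, then convert byte offsets to characters
local function find_via_find(str, pattern, init)
    local byte_init = utf8.offset(str, init or 1)
    if not byte_init then return nil end
    local a, b = string.find(str, pattern, byte_init)
    if not a then return nil end
    return utf8.len(str, 1, a - 1) + 1, utf8.len(str, 1, b)
end

-- match-based: capture the text before the match and the match itself
-- (pattern must be non-empty, unanchored and free of back-references)
local function find_via_match(str, pattern, init)
    init = init or 1
    local byte_init = utf8.offset(str, init)
    if not byte_init then return nil end
    -- "^" anchors the match at byte_init; ".-" skips as little as possible
    local pre, m = string.match(str, "^(.-)(" .. pattern .. ")", byte_init)
    if not pre then return nil end
    local first = init + utf8.len(pre)
    return first, first + utf8.len(m) - 1
end

-- run both and complain loudly if they ever disagree
local function utf8_find_checked(str, pattern, init)
    local a1, b1 = find_via_find(str, pattern, init)
    local a2, b2 = find_via_match(str, pattern, init)
    if a1 ~= a2 or b1 ~= b2 then
        error(string.format("utf8_find mismatch for %q: %s,%s vs %s,%s",
            pattern, tostring(a1), tostring(b1), tostring(a2), tostring(b2)))
    end
    return a1, b1
end
```

Raising an error on a mismatch makes any disagreement between the two approaches impossible to miss while testing.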

Best regards,
Stephan Hennig

[1] <URL:http://permalink.gmane.org/gmane.comp.tex.luatex.user/1182>
