[luatex] problem with slnunicode's find
mpg at elzevir.fr
Wed Mar 3 10:19:25 CET 2010
luigi scarso wrote:
>> this discussion is IMO whether unicode.* libraries are a replacement for string or not.
> A difficult question.
IMO not. The comments state that unicode.ascii and unicode.latin1 are
locale-independent replacements for string, but don't say anything about
unicode.utf8, and that's probably for a reason. But as Taco said, this would be
best discussed with the Selene developers.
> Have we found a bug in unicode.utf8.find or it's correct but we
> disagree about its behavior ?
This question has been answered many times: the fact that unicode.utf8.find returns
positions in bytes (as opposed to characters) is a design decision and the
function behaves precisely as the doc says on this point:
-- NOTE: find positions are in bytes for all ctypes!
-- use ascii.sub to cut found ranges!
-- this is a) faster b) more reliable
> If we disagree, what is the expected behavior ?
People who disagree would like the counts to be characters, not bytes.
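
To make the difference concrete, here is a small sketch. It assumes plain Lua
without slnunicode, so string.find stands in for unicode.utf8.find (both report
byte positions) and utf8_len is a hypothetical stand-in for unicode.utf8.len:

```lua
-- "café au lait" in UTF-8: the é occupies two bytes (0xC3 0xA9).
local s = "caf\195\169 au lait"

-- Hypothetical stand-in for unicode.utf8.len, so this runs in plain Lua:
-- count every byte that is not a UTF-8 continuation byte (0x80-0xBF).
local function utf8_len(str)
  local _, n = string.gsub(str, "[^\128-\191]", "")
  return n
end

-- Byte positions, which is what the doc describes.
local byte_a, byte_b = string.find(s, "au", 1, true)

-- Character positions, which is what people who disagree expect.
local char_a = utf8_len(string.sub(s, 1, byte_a - 1)) + 1
local char_b = utf8_len(string.sub(s, 1, byte_b))

print(byte_a, byte_b)  --> 7  8
print(char_a, char_b)  --> 6  7
```

The two results differ by one because a single two-byte character precedes the
match; with more non-ASCII text before the match the gap grows accordingly.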
> Can we implement an acceptable wrapper ?
Yes, a proper wrapper has already been given by Patrick and quoted by
myself. Here it is again, now in the form of a function:
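function find_utf8_chars(str, pat)
  local a, b = unicode.utf8.find(str, pat)
  if not a then return nil end  -- no match: nothing to convert
  -- characters before the match, plus one, is the start in characters
  a = unicode.utf8.len(string.sub(str, 1, a - 1)) + 1
  -- characters up to and including the last matched byte is the end
  b = unicode.utf8.len(string.sub(str, 1, b))
  return a, b
end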
Note that this is not a proper full version of find (arguments 3 and 4 are not
supported, and no captures are returned). However, it does answer Stephan's original
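
For what it's worth, a fuller version could be sketched along the following
lines. This is only a rough sketch under several assumptions: the find function
is passed in as a parameter (string.find in plain Lua, presumably
unicode.utf8.find under LuaTeX), that function is assumed to take its init
argument in bytes, negative init values are not handled, and utf8_len,
char_to_byte and find_chars are hypothetical names, not part of slnunicode:

```lua
-- Hypothetical stand-in for unicode.utf8.len: count the bytes that are
-- not UTF-8 continuation bytes (0x80-0xBF).
local function utf8_len(s)
  local _, n = string.gsub(s, "[^\128-\191]", "")
  return n
end

-- Byte position of the first byte of character number i (1-based).
-- Returns #s + 1 when the string has fewer than i characters.
local function char_to_byte(s, i)
  local chars = 0
  for pos = 1, #s do
    local byte = string.byte(s, pos)
    if byte < 128 or byte > 191 then  -- not a continuation byte
      chars = chars + 1
      if chars == i then return pos end
    end
  end
  return #s + 1
end

-- Like find, but init and the returned positions are in characters.
-- Captures, if any, are passed through unchanged.
local function find_chars(find, str, pat, init, plain)
  local function convert(a, b, ...)
    if not a then return nil end
    return utf8_len(string.sub(str, 1, a - 1)) + 1,
           utf8_len(string.sub(str, 1, b)),
           ...
  end
  return convert(find(str, pat, char_to_byte(str, init or 1), plain))
end

-- "café au lait au": start searching at character 7 (the first "u").
local s = "caf\195\169 au lait au"
print(find_chars(string.find, s, "(a)u", 7))  --> 14  15  a
```

Passing the find function as an argument keeps the sketch testable outside
LuaTeX; whether it behaves identically with unicode.utf8.find depends on how
that function treats init, which I have not checked.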