[luatex] problem with slnunicode's find

Tue Mar 2 22:04:02 CET 2010

Stephan Hennig wrote:
> On 02.03.2010 at 19:09, Patrick Gundlach wrote:
>> That would break every program that uses unicode.utf8 as a
>> replacement for string,
> 
I disagree here for two reasons:

1. unicode.utf8 is not meant to be a drop-in replacement for string, which is
why LuaTeX does not replace string with it (I think this was already mentioned
in this thread). So you should use string when you want to manipulate arbitrary
strings of bytes, and unicode.utf8 only with strings of bytes that are valid
UTF-8 (as the name implies).

2. string.find can already return nil, so I don't see how it breaks anything.
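
To illustrate both points, here is a minimal sketch in plain Lua; it only uses
the standard string library, and the sample string is my own (the bytes of
"été" written with decimal escapes so the file encoding does not matter):

```lua
-- "été" as raw UTF-8 bytes: each "é" is the two bytes 195, 169.
local s = "\195\169t\195\169"

-- The string library works on bytes, so the length is 5, not 3.
print(#s)                               --> 5

-- Searching for an arbitrary byte value is a job for string.find
-- (the fourth argument "true" requests a plain, non-pattern search).
print(string.find(s, "\169", 1, true))  --> 2  2

-- string.find already returns nil when nothing matches, so callers
-- have to cope with nil anyway.
print(string.find(s, "z"))              --> nil
```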

>> How would you then search for a byte sequence in another byte
>> sequence?
> 
> string.find.  Whoever wants to search for arbitrary byte values with a
> Unicode string library does something truly wrong, IMHO.

I strongly agree here.

>> This discussion is getting ridiculous, and I'll stop here. If you
>> want something that returns utf-8 lengths then please write a wrapper
>> around the find function yourself.
> 
> That's what my question
> 
Do you agree that Patrick's suggestion answers this question? I mean this one:

b = unicode.utf8.find(str,"%u+")
print(unicode.utf8.len(string.sub(str,1,b)))

If so, then from a practical point of view, your problem is solved. From a
theoretical point of view (the logician's point of view as you might say) there
is no problem with the function itself since it just does what the
"documentation" (if comments in a test file can be called documentation) says.
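
For what it's worth, the wrapper Patrick mentions could look roughly like the
sketch below. It assumes, as discussed in this thread, that the find function
reports byte offsets and that the subject is valid UTF-8. The names
byte_to_char_index and find_chars are hypothetical, not part of slnunicode, and
the demo uses string.find so that it runs outside LuaTeX too:

```lua
-- Hypothetical helper (not part of slnunicode): convert a 1-based byte
-- offset into a 1-based character index, assuming `s` is valid UTF-8.
local function byte_to_char_index(s, byte_pos)
  local count = 0
  for i = 1, byte_pos do
    local byte = string.byte(s, i)
    -- Continuation bytes have the form 10xxxxxx (0x80..0xBF); every
    -- other byte starts a new character.
    if byte < 0x80 or byte >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

-- Hypothetical wrapper: `find` is any function that returns byte offsets,
-- e.g. string.find here, or (by assumption) unicode.utf8.find in LuaTeX.
local function find_chars(find, s, pattern)
  local first, last = find(s, pattern)
  if not first then
    return nil -- no match, same convention as string.find
  end
  return byte_to_char_index(s, first), byte_to_char_index(s, last)
end

-- "\195\169" is "é" (two bytes), so "X" sits at byte 4 but is character 3.
print(find_chars(string.find, "\195\169aX", "X"))  --> 3  3
```

Unlike the two-line version, this checks for nil before doing anything with
the offset, and it counts lead bytes itself instead of relying on how
unicode.utf8.len treats a substring that ends in the middle of a character.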

So the only question that remains is the following: is the design Good© or Bad®?
I'm not particularly interested in this debate (but while we're at it, I find it
quite surprising but wouldn't call it just plain broken either).

Manuel.