[luatex] problem with slnunicode's find

Stephan Hennig mailing_list at arcor.de
Tue Mar 2 21:30:59 CET 2010

On 02.03.2010 19:09, Patrick Gundlach wrote:
>>> What would you suggest the following statement returns?
>>> str="aö" unicode.utf8.find(str,"\182")  -- (ö's utf8 values are
>>> 195 and 182)
>> Nil, or even better error out, since the second argument is
>> invalid.
> That would break every program that uses unicode.utf8 as a
> replacement for string,

For UTF-8 encoded strings?  That's what one commits to when using 
unicode.utf8, no?  So, it would only break in case of a programming 
error.  Not that bad, I'd say.

> which is meant for.

Any pointer?  That would explain much.  Even so, it seems to me like
harnessing a horse to a car for those who don't have a driver's license
but want to use the car as a drop-in replacement for their carriage.
The benefit of crippled functionality is questionable.

> How would you then search for a byte sequence in another byte
> sequence?

string.find.  Whoever wants to search for arbitrary byte values with a
Unicode string library is doing something truly wrong, IMHO.  Do you have
a use case at hand where that would seem desirable, while at the same
time the built-in string library is not available?
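
A minimal sketch of the byte-level search I mean, in plain Lua with no
slnunicode involved (the variable names are only illustrative):

```lua
-- "aö" as UTF-8 bytes: 97 (a), then 195 and 182 (ö)
local s = "\97\195\182"

-- Plain byte search with the built-in string library.
-- The fourth argument `true` turns off pattern matching.
local first, last = string.find(s, "\182", 1, true)

print(first, last)  -- prints 3 and 3: the byte offset of \182
```

Here the result 3 is a byte offset, which is exactly what one asks for
when treating the string as a byte sequence.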

>> Do you think 3 is a sensible result?
> Yes, and the only sensible result.
> And what would you expect in this case:
> my_three_numbers = "\97\195\182"
> unicode.utf8.find(my_three_numbers,"\182")

Nil or error out, since \182 is an invalid argument.

> and in this case:
> my_three_numbers = "\97\195\182"
> string.find(my_three_numbers,"\182")


> This discussion is getting ridiculous, and I'll stop here. If you
> want something that returns utf-8 lengths then please write a wrapper
> around the find function yourself.

That's what my question

> Is there a function in slnunicode that checks a string for UTF-8
> compliance?

was aiming at.  That said, explicitly pre-processing strings might
incur a lower performance penalty than a wrapper.
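
For what it's worth, such a check can be sketched in plain Lua.  The
names is_valid_utf8 and checked below are hypothetical and not part of
slnunicode.  The wrapper takes the find function to guard as an
argument, so the sketch assumes nothing about what unicode.utf8.find
itself returns.

```lua
-- Hypothetical helper, not part of slnunicode: returns true if `s` is
-- well-formed UTF-8.  Rejects stray continuation bytes, truncated
-- sequences, overlong forms, surrogates and values above U+10FFFF.
local function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local len
    local lo, hi = 0x80, 0xBF            -- allowed range of the 2nd byte
    if b < 0x80 then
      len = 1                            -- ASCII
    elseif b >= 0xC2 and b <= 0xDF then
      len = 2
    elseif b >= 0xE0 and b <= 0xEF then
      len = 3
      if b == 0xE0 then lo = 0xA0 end    -- no overlong 3-byte forms
      if b == 0xED then hi = 0x9F end    -- no surrogates
    elseif b >= 0xF0 and b <= 0xF4 then
      len = 4
      if b == 0xF0 then lo = 0x90 end    -- no overlong 4-byte forms
      if b == 0xF4 then hi = 0x8F end    -- nothing above U+10FFFF
    else
      return false                       -- stray continuation or invalid lead byte
    end
    if i + len - 1 > n then
      return false                       -- sequence cut off at end of string
    end
    for k = 1, len - 1 do
      local c = s:byte(i + k)
      if c < lo or c > hi then
        return false
      end
      lo, hi = 0x80, 0xBF                -- remaining bytes: full continuation range
    end
    i = i + len
  end
  return true
end

-- Hypothetical wrapper factory: guards any find-like function so that
-- it raises an error on arguments that are not valid UTF-8.
local function checked(find)
  return function(s, pattern, ...)
    if not is_valid_utf8(s) then
      error("subject is not valid UTF-8", 2)
    end
    if not is_valid_utf8(pattern) then
      error("pattern is not valid UTF-8", 2)
    end
    return find(s, pattern, ...)
  end
end

-- In LuaTeX one would presumably wrap unicode.utf8.find; string.find is
-- the fallback only so that the sketch runs in a stock Lua interpreter.
local find = checked(unicode and unicode.utf8 and unicode.utf8.find
                     or string.find)

print(is_valid_utf8("\97\195\182"))        -- true
print(is_valid_utf8("\182"))               -- false
print(pcall(find, "\97\195\182", "\182"))  -- false, followed by the error message
```

Validating once when a string enters the program and then calling the
unwrapped function would avoid repeating the check on every call.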

Best regards,
Stephan Hennig
