[luatex] problem with slnunicode's find
Stephan Hennig
mailing_list at arcor.de
Tue Mar 2 21:30:59 CET 2010
On 02.03.2010 19:09, Patrick Gundlach wrote:
>>> What would you suggest the following statement returns?
>>>
>>> str="aö" unicode.utf8.find(str,"\182") -- (ö's utf8 values are
>>> 195 and 182)
>>
>> Nil, or even better error out, since the second argument is
>> invalid.
>
> That would break every program that uses unicode.utf8 as a
> replacement for string,
For UTF-8 encoded strings? That's what one commits to when using
unicode.utf8, no? So, it would only break in case of a programming
error. Not that bad, I'd say.
> which it is meant for.
Any pointer? That would explain a lot. Even so, it seems to me like
harnessing a horse to a car for people who don't have a driver's license
but want to use the car as a drop-in replacement for their carriage.
The benefit of such crippled functionality is questionable.
> How would you then search for a byte sequence in another byte
> sequence?
string.find. Whoever wants to search for arbitrary byte values with a
Unicode string library is doing something truly wrong, IMHO. Do you have a
use case at hand where that would seem desirable, while at the same time
the built-in string library is not available?
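To make the byte-level case concrete, here is a minimal sketch using only
the built-in string library. The helper name find_bytes is made up for
this mail; string.find and its "plain" flag are standard Lua.

```lua
-- Hypothetical helper: search for a raw byte sequence, byte for byte.
-- The fourth argument of string.find (plain = true) switches off
-- pattern matching, so the needle is not interpreted as a Lua pattern.
function find_bytes(haystack, needle)
  return string.find(haystack, needle, 1, true)
end

-- "aö" as raw bytes: 97 ("a"), then 195 182 (the UTF-8 encoding of "ö").
local first, last = find_bytes("a\195\182", "\182")
print(first, last)  -- prints 3 and 3: the byte offset of \182
```

The result is a byte offset, which is all a byte search can meaningfully
report.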
>> Do you think 3 is a sensible result?
>
> Yes, and the only sensible result.
>
>
> And what would you expect in this case:
>
> my_three_numbers = "\97\195\182"
> unicode.utf8.find(my_three_numbers,"\182")
Nil or error out, since \182 is an invalid argument.
> and in this case:
>
> my_three_numbers = "\97\195\182"
> string.find(my_three_numbers,"\182")
3
> This discussion is getting ridiculous, and I'll stop here. If you
> want something that returns utf-8 lengths then please write a wrapper
> around the find function yourself.
That's what my question
> Is there a function in slnunicode that checks a string for UTF-8
> compliance?
was aiming at. That said, explicitly pre-processing strings might cost
less performance than a wrapper.
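For illustration, this is roughly what I have in mind, as a sketch in
plain Lua. Neither function is part of slnunicode; the names
is_valid_utf8 and checked_find are made up here, and whether slnunicode
ships an equivalent check is exactly the open question. The wrapper
rejects an empty needle only to keep the sketch short.

```lua
-- Hypothetical check: true if s is well-formed UTF-8, otherwise false
-- plus the byte offset where the offending sequence starts.
function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local len, cp, min  -- sequence length, code point, smallest legal value
    if b < 0x80 then
      len, cp, min = 1, b, 0
    elseif b >= 0xC0 and b < 0xE0 then
      len, cp, min = 2, b - 0xC0, 0x80
    elseif b >= 0xE0 and b < 0xF0 then
      len, cp, min = 3, b - 0xE0, 0x800
    elseif b >= 0xF0 and b < 0xF8 then
      len, cp, min = 4, b - 0xF0, 0x10000
    else
      return false, i  -- continuation byte or 0xF8..0xFF in lead position
    end
    if i + len - 1 > n then return false, i end  -- truncated sequence
    for k = i + 1, i + len - 1 do
      local c = s:byte(k)
      if c < 0x80 or c > 0xBF then return false, i end
      cp = cp * 64 + (c - 0x80)
    end
    -- reject overlong forms, surrogates and values beyond U+10FFFF
    if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
      return false, i
    end
    i = i + len
  end
  return true
end

-- Hypothetical wrapper: errors out on invalid input and reports
-- character positions instead of byte offsets.
function checked_find(s, needle)
  assert(needle ~= "", "empty needle not supported in this sketch")
  assert(is_valid_utf8(s), "subject is not valid UTF-8")
  assert(is_valid_utf8(needle), "needle is not valid UTF-8")
  local from, to = string.find(s, needle, 1, true)  -- plain byte search
  if not from then return nil end
  -- Character index of a byte offset: count every byte up to it that
  -- is not a continuation byte (0x80..0xBF).
  local function char_index(bytepos)
    local count = 0
    for k = 1, bytepos do
      local c = s:byte(k)
      if c < 0x80 or c > 0xBF then count = count + 1 end
    end
    return count
  end
  return char_index(from), char_index(to)
end
```

With this, checked_find("a\195\182", "\195\182") yields character
position 2, while checked_find("a\195\182", "\182") raises an error
because the needle is a lone continuation byte. Validating once up
front with is_valid_utf8 and then calling the library directly would
avoid repeating the check on every search.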
Best regards,
Stephan Hennig