[luatex] problem with slnunicode's find
Patrick Gundlach
patrick at gundla.ch
Wed Mar 3 08:50:26 CET 2010
Hi all,
IMO, this discussion is about whether the unicode.* libraries are a replacement for string or not.
If they are a replacement for string, then they must preserve its semantics. For example, string.find must be able to find bytes and return byte positions, because a string can also hold binary data. I don't think that is open to debate.
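To make the byte semantics concrete, here is a small sketch in plain Lua (the standard string library, not slnunicode). It shows that find on binary data reports byte positions, even for a byte that sits inside a multi-byte UTF-8 sequence. The sample data is made up for illustration.

```lua
-- Binary data: a NUL byte, the UTF-8 encoding of "ä" (bytes 195 164), then "x".
local data = "\0\195\164x"

-- Plain search (no pattern magic) for the single byte 164: it is found at
-- byte index 3, in the middle of the two-byte UTF-8 sequence.
local s, e = string.find(data, "\164", 1, true)
print(s, e)   -- prints 3 and 3

-- "x" is the 3rd character but the 4th byte; find reports the byte index.
local xs = string.find(data, "x", 1, true)
print(xs)     -- prints 4
```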
So the question is: are the unicode.* libraries meant as a drop-in replacement for string, so that one can write, for example:
if input == "utf8" then
  string = unicode.utf8
elseif input == "latin1" then
  string = unicode.latin1
end
result = string.whatever()  -- any string.* function
When I look at the source code of the selene library, it seems perfectly clear to me that it is meant as a drop-in replacement:
a) It covers exactly the same functions as string.*
b) The only changes are the extended character classes and the fact that lengths are counted in characters when there is a non-byte operation (for example string.len() vs. #str)
c) Everything else behaves exactly like the string library.
d) It even states that it can be used as a replacement.
So if it is a replacement, changing the find function would break everything that deals with binary data. Please let's not be quick to call the unicode library broken: this is a design decision that was made, and to me it makes perfect sense. And with the combination of *.len and *.sub, as I showed in a previous mail, everything that has been requested so far can be done.
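One way to read the *.len/*.sub approach: take the byte position that find returns and count the characters in the prefix before it. The sketch below assumes UTF-8 input and, as argued above, that find returns byte positions. utf8_len is a hypothetical pure-Lua stand-in for unicode.utf8.len so the example runs outside LuaTeX, and char_pos is likewise a made-up helper, not part of slnunicode.

```lua
-- Hypothetical stand-in for unicode.utf8.len: counts UTF-8 characters by
-- counting every byte that is NOT a continuation byte (128-191).
local function utf8_len(str)
  local _, count = string.gsub(str, "[^\128-\191]", "")
  return count
end

-- Hypothetical helper: convert a byte position (as returned by find) into a
-- character position by counting the characters in the preceding bytes.
local function char_pos(str, byte_pos)
  return utf8_len(string.sub(str, 1, byte_pos - 1)) + 1
end

-- "Größe: 42" written with byte escapes so the file encoding does not matter:
-- ö = 195 182, ß = 195 159.
local text = "Gr\195\182\195\159e: 42"

local b = string.find(text, "42", 1, true)  -- byte position 10
local c = char_pos(text, b)                 -- character position 8
print(b, c)
```

Counting only lead bytes assumes well-formed UTF-8; on malformed input the character counts are only approximate.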
And yes, if it is *not* meant as a replacement, then I can understand that this raises questions. But in that case find should not accept bytes in the pattern and should raise an error instead.
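If the libraries were *not* a drop-in replacement, the alternative suggested above could look roughly like this: a find wrapper that raises an error when the pattern is not well-formed UTF-8, so that stray bytes such as a lone 164 are rejected instead of silently matched. strict_find and is_valid_utf8 are hypothetical names, and the validity check is deliberately simplified (it does not reject overlong three- or four-byte forms or encoded surrogates).

```lua
-- Simplified UTF-8 well-formedness check (hypothetical helper): verifies
-- lead bytes and the number of continuation bytes only.
local function is_valid_utf8(str)
  local i, n = 1, #str
  while i <= n do
    local c = string.byte(str, i)
    local len
    if c < 0x80 then len = 1
    elseif c >= 0xC2 and c <= 0xDF then len = 2
    elseif c >= 0xE0 and c <= 0xEF then len = 3
    elseif c >= 0xF0 and c <= 0xF4 then len = 4
    else return false end
    -- every following byte of the sequence must be a continuation byte
    for j = i + 1, i + len - 1 do
      local cc = string.byte(str, j)
      if cc == nil or cc < 0x80 or cc > 0xBF then return false end
    end
    i = i + len
  end
  return true
end

-- Hypothetical strict find: refuse patterns containing stray bytes.
local function strict_find(str, pattern, init, plain)
  if not is_valid_utf8(pattern) then
    error("strict_find: pattern is not valid UTF-8", 2)
  end
  return string.find(str, pattern, init, plain)
end

print(strict_find("abc", "b"))            -- prints 2 and 2
print(pcall(strict_find, "abc", "\164"))  -- prints false and the error message
```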
Patrick
(trying to avoid a heated discussion)