[luatex] problem with slnunicode's find

Patrick Gundlach patrick at gundla.ch
Wed Mar 3 08:50:26 CET 2010


Hi all,

this discussion is IMO about whether the unicode.* libraries are a replacement for string or not.

If they are a replacement for string, then they must preserve its semantics. For example string.find must be able to find bytes and return byte positions, because a string can also be binary data. I don't think that we can argue about this.
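
To make the byte semantics concrete, here is a minimal sketch in plain Lua (it only uses the standard string library, and the sample strings are made up for illustration). It shows string.find reporting byte positions for UTF-8 text and locating a single byte in binary data:

```lua
-- Plain Lua; no LuaTeX-specific library is needed for this part.
-- "ä" encoded as UTF-8 is the two bytes 0xC3 0xA4, so "b" is the third byte.
local text = "\195\164b"
local first, last = string.find(text, "b", 1, true)  -- plain find, no pattern magic
print(#text, first, last)

-- Binary data: look for the byte 0xFF, which never occurs in valid UTF-8.
local blob = "\0\255\1"
local pos = string.find(blob, "\255", 1, true)
print(pos)
```

The first print shows 3 3 3 (three bytes, match at byte 3), the second shows 2.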

So the question is: are the unicode.* libraries meant as a drop-in replacement for string, so that one can say for example:

if input=="utf8" then
  string = unicode.utf8
elseif input=="latin1" then
  string = unicode.latin1
end

result = string.whatever()

When I look at the source code of the selene library, it seems perfectly clear to me that it is meant as a drop-in replacement.

a) It covers exactly the same functions as string.*
b) The only changes are the extended character classes and the counting of character lengths when there is a non-byte operation (for example string.len() vs. #str)
c) Everything else behaves exactly like string.*
d) It even mentions that it can be used as a replacement

So if it is a replacement, changing the find function would break everything that deals with binary data. Please let's not be too quick to call the unicode library broken: it is a design decision that has been made, and for me it makes perfect sense. And with the combination of *.len and *.sub, as I have shown in a previous mail, everything that has been requested so far can be done.
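
As a rough sketch of what such a len/sub combination could look like (this is one possible reading, not necessarily the code from the earlier mail): find the match by bytes, then count the characters in the prefix. The helper names char_len and find_chars are hypothetical. Inside LuaTeX the sketch uses unicode.utf8.len; outside LuaTeX it falls back to counting every byte that is not a UTF-8 continuation byte, which assumes well-formed UTF-8 input.

```lua
-- Hypothetical helper: number of UTF-8 characters in s.
local function char_len(s)
  if unicode and unicode.utf8 then
    return unicode.utf8.len(s)          -- slnunicode, available in LuaTeX
  end
  -- Fallback assumption: valid UTF-8, so count non-continuation bytes.
  local _, n = string.gsub(s, "[^\128-\191]", "")
  return n
end

-- Hypothetical helper: like a plain string.find, but returns character positions.
local function find_chars(s, needle)
  local b1, b2 = string.find(s, needle, 1, true)   -- byte positions
  if not b1 then return nil end
  local c1 = char_len(string.sub(s, 1, b1 - 1)) + 1
  local c2 = char_len(string.sub(s, 1, b2))
  return c1, c2
end

-- "äöb" is five bytes but three characters.
print(find_chars("\195\164\195\182b", "b"))
```

Here the byte-level find reports position 5, while find_chars reports character position 3, so the byte semantics of find stay untouched.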


And yes, if it is *not* meant as a replacement, then I can understand that this raises questions. But then find should not allow bytes in the pattern and should raise an error.

Patrick

(trying to avoid a heated discussion)

