[luatex] Behavior of slnunicode.utf8.match().
Paul Isambert
zappathustra at free.fr
Mon Aug 8 09:26:17 CEST 2011
Hello all,
The manual says slnunicode.utf8.match() is normally Unicode-aware, unless one
uses the empty capture. Yet I have stumbled on the following strange behavior
(assuming the file is encoded in UTF-8):
\directlua{
% Returns "é" (two bytes):
tex.print(slnunicode.utf8.match("éî", ".", 1))
% Returns invalid (one-byte) character:
tex.print(slnunicode.utf8.match("éî", ".", 2))
% Returns "î" (two bytes):
tex.print(slnunicode.utf8.match("éî", ".", 3))
% Returns invalid (one-byte) character:
tex.print(slnunicode.utf8.match("éî", ".", 4))
}
I'd expect the second call to return "î", but it looks like the function counts
the start position in bytes (not in UTF-8 characters) yet returns a whole UTF-8
character (i.e. more than one byte) if it can do so. So call 2 (resp. 4) returns
the second byte of "é" (resp. "î") on its own, while calls 1 and 3 start on the
first byte of a character and return that character correctly.
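To make the byte layout explicit, here is a plain-Lua sketch that reproduces the four results above without slnunicode. The helper char_at_byte is hypothetical and only mimics the behavior I observe (start position counted in bytes, a whole character returned when the position falls on a lead byte); I am not claiming this is how the library is implemented. The string is written with decimal escapes so that it does not depend on the file encoding.

```lua
-- "éî" as raw UTF-8 bytes: é = C3 A9, î = C3 AE.
local s = "\195\169\195\174"

-- Hypothetical helper (not part of slnunicode): return the UTF-8 sequence
-- that starts at byte offset `init`, or the lone byte found there when it
-- is a continuation byte (10xxxxxx) and so cannot start a character.
local function char_at_byte(str, init)
  local b = str:byte(init)
  if not b then return nil end      -- offset past the end of the string
  local len
  if b < 0x80 then len = 1          -- ASCII
  elseif b < 0xC0 then len = 1      -- continuation byte: invalid start
  elseif b < 0xE0 then len = 2      -- lead byte of a two-byte sequence
  elseif b < 0xF0 then len = 3      -- lead byte of a three-byte sequence
  else len = 4 end                  -- lead byte of a four-byte sequence
  return str:sub(init, init + len - 1)
end

-- Print, for each byte offset, the number of bytes returned and their hex values.
for init = 1, #s do
  local piece = char_at_byte(s, init)
  local hex = piece:gsub(".", function(c)
    return string.format("%02X ", c:byte())
  end)
  print(init, #piece, hex)
end
```

Offsets 1 and 3 give two bytes each (the complete characters), while offsets 2 and 4 give a single continuation byte, which matches what the four calls print.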
Is this a bug or have I misunderstood something? (I can't test slnunicode
independently for the moment.)
Best,
Paul