[luatex] problem with slnunicode's find
Stephan Hennig
mailing_list at arcor.de
Mon Mar 1 19:23:12 CET 2010
Hi,
I have trouble getting the position of a character in a UTF-8 string
with slnunicode. The attached Lua script reads two UTF-8 encoded (I
think) strings, 'äb' and 'öäb', from a file and outputs their length and
the position of the last character 'b'. (UTF-8 characters are scrambled
in the output because this is on a Windows console. But that shouldn't
matter, should it?)
> >texlua slnunicode-find.lua
> line = äb
> len(line) = 2
> character 'b' at position 3
>
> line = ├Â├ñb
> len(line) = 3
> character 'b' at position 5
I would expect the positions of 'b' to be 2 and 3, respectively, since
those are the lengths of the strings as returned by unicode.utf8.len.
However, unicode.utf8.find seems to have a different notion of string
length. To correct these values manually (they are apparently byte
positions), one would need to know how many of the characters preceding
'b' are multiple bytes long. I thought handling that was exactly what
slnunicode is made for.
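A minimal sketch of the manual correction described above, in plain Lua
and assuming the reported positions really are byte offsets into valid
UTF-8. The helper byte_to_char_index is hypothetical (it is not part of
slnunicode); it counts the bytes up to the offset that are not UTF-8
continuation bytes. Plain string.find stands in for the byte-oriented
search, and the test strings use decimal byte escapes so the example
does not depend on the file encoding.

```lua
-- Hypothetical helper (not part of slnunicode): convert a 1-based byte
-- offset into a 1-based character index, assuming s is valid UTF-8.
local function byte_to_char_index(s, byte_pos)
  local count = 0
  for i = 1, math.min(byte_pos, #s) do
    local b = string.byte(s, i)
    -- Continuation bytes have the form 10xxxxxx (128..191); every other
    -- byte starts a new character.
    if b < 128 or b > 191 then
      count = count + 1
    end
  end
  return count
end

-- "\195\164" is the UTF-8 encoding of 'ä', "\195\182" that of 'ö'.
local s1 = "\195\164b"          -- 'äb'
local s2 = "\195\182\195\164b"  -- 'öäb'

-- Plain string.find works on bytes (fourth argument: no pattern matching).
local p1 = string.find(s1, "b", 1, true)
local p2 = string.find(s2, "b", 1, true)

-- Print the byte offset next to the derived character index.
print(p1, byte_to_char_index(s1, p1))
print(p2, byte_to_char_index(s2, p2))
```

With these inputs the byte offsets are 3 and 5, matching the script
output quoted above, and the derived character indices are 2 and 3.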
What is the preferred way to get the position of a character in a UTF-8
string, given that the string contains only 'letters'?
Best regards,
Stephan Hennig
> >texlua -v
> This is LuaTeX, Version beta-0.40.6-2009110118 (Web2C 2009) luatex.web >= v14240
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: words.utf8
URL: <http://tug.org/pipermail/luatex/attachments/20100301/4660827b/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: slnunicode-find.lua
URL: <http://tug.org/pipermail/luatex/attachments/20100301/4660827b/attachment-0001.pl>