[luatex] Off topic: A quiz
Hans Hagen
pragma at wxs.nl
Thu Dec 12 11:06:50 CET 2013
On 12/12/2013 10:59 AM, Paul Isambert wrote:
> From: "Patrick Gundlach" <patrick at gundla.ch>
>>> space
>>> not a space
>>
>> that was the easy part... Now the question is "why"... (It's clear
>> when you add anchors ^ and $ to the pattern).
>
> I'll admit I don't get it. When I saw that
>
> unicode.utf8.match("à", "%s")
>
> returned true, I thought: "à" is "C3 A0" in UTF-8, but Lua knows about latin-1
> only, and "A0" is the non-breaking space, hence the false positive. And then,
> of course: but isn't unicode.utf8.match() supposed to know about UTF-8? What
> good is it if it can't spot a multibyte character?
>
> Then I tried
>
> string.match("à", "%s")
>
> and it returned false, meaning actually the non-breaking space isn't
> recognized by "%s", so my first explanation was wrong anyway.
>
> I may be missing something here, being quite tired, but it seems to me
> slnunicode is buggy or what?
it's some optimization (i remember noticing similar things) ... i think
that "%s" becomes a quick and dirty byte-wise match, without looking at
each character as utf-8

  -- assumption: match is unicode.utf8.match, as in Paul's test
  local match = unicode.utf8.match

  if match("xà","x%s") then
    print("space")
  else
    print("not a space")
  end

works as expected
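
For reference, the byte-level view Paul describes can be checked in plain Lua, without slnunicode. This is only a minimal sketch: hex_bytes is a made-up helper name, and "à" is written with decimal escapes so the result does not depend on how the file is saved.

```lua
-- hex_bytes is a hypothetical helper, not part of Lua or slnunicode.
-- It returns the bytes of s as space-separated uppercase hex.
function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(out, " ")
end

local a_grave = "\195\160"   -- "à" encoded as UTF-8 (0xC3 0xA0)
print(hex_bytes(a_grave))    -- C3 A0
print(#a_grave)              -- 2: the string library counts bytes
```

So a matcher that looks at single bytes sees 0xA0 as the second byte of "à", which is where a latin-1 style character class could pick up a non-breaking space.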
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------