[luatex] Off topic: A quiz

Hans Hagen pragma at wxs.nl
Thu Dec 12 11:06:50 CET 2013


On 12/12/2013 10:59 AM, Paul Isambert wrote:
> De: "Patrick Gundlach" <patrick at gundla.ch>
>>> space
>>> not a space
>>
>> that was the easy part... Now the question is "why"... (It's clear
>> when you add anchors ^ and $ to the pattern).
>
> I'll admit I don't get it. When I saw that
>
>      unicode.utf8.match("à", "%s")
>
> returned true, I thought: "à" is "C3 A0" in UTF-8, but Lua knows about latin-1
> only, and "A0" is the non-breaking space, hence the false positive. And then,
> of course: but isn't unicode.utf8.match() supposed to know about UTF-8? What
> good is it if it can't spot a multibyte character?
>
> Then I tried
>
>      string.match("à", "%s")
>
> and it returned false, meaning actually the non-breaking space isn't
> recognized by "%s", so my first explanation was wrong anyway.
>
> I may be missing something here, being quite tired, but it seems to me
> slnunicode is buggy or what?

it's some optimization (i remember noticing similar things) ... i think
that a pattern consisting of just "%s" becomes a quick and dirty match,
without looking at each character as utf

-- assumption: match is unicode.utf8.match, as in the quoted test
local match = unicode.utf8.match

if match("xà","x%s") then
     print("space")
else
     print("not a space")
end

works as expected
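
For reference, a minimal sketch in plain Lua of the byte-level facts from the quoted message. It uses only the standard string library, not slnunicode, so it says nothing about what unicode.utf8.match does internally; hex_bytes is a made-up helper name, and "à" is written as byte escapes so the result does not depend on the file encoding.

```lua
-- "à" (U+00E0) as its two UTF-8 bytes, 0xC3 0xA0.
local a_grave = "\195\160"

-- Hypothetical helper: format every byte of a string as two hex digits.
local function hex_bytes(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

-- The second byte, A0, is the latin-1 non-breaking space.
print(hex_bytes(a_grave))

-- Byte-wise match with the standard library. The quoted message reports
-- that this does not match; the result depends on the C locale in use.
print(string.match(a_grave, "%s"))
```

So the character really is two bytes, and whether a lone "%s" hits depends on whether the matcher walks bytes or decoded characters.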

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

