[texhax] How to use 'collate' array?
Igor Liferenko
igor.liferenko at gmail.com
Tue Nov 29 04:36:59 CET 2016
Hi all,
Run the following commands on this example1.w:
@ @c
@.écrire@>
@.ça@>
@.Écrire@>
iconv -t iso8859-1 example1.w > ex1-8bit.w
cweave ex1-8bit.w > /dev/null
iconv -f iso8859-1 ex1-8bit.idx
The output will be:
\I\.{Écrire}, 1.
\I\.{ça}, 1.
\I\.{écrire}, 1.
As we see, non-ASCII letters are sorted in a case-sensitive manner.
This violates the rule that sorting must be case-insensitive.
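One possible explanation, sketched here under assumptions: if the case folding only recognizes ASCII upper-case letters, then bytes such as 0xC9 (É in ISO 8859-1) are never lowered, so É and é stay distinct when sorting. The helper name fold_ascii_only is hypothetical, and the `< 0200` guard is an assumption about how xisupper behaves, not a quotation from cweave.w.

```c
#include <ctype.h>

/* Hypothetical helper (not from cweave.w): lower-case a byte only if it
   is an ASCII upper-case letter. Bytes >= 0200 (octal) are returned
   unchanged, which is the assumed behaviour being illustrated. */
static unsigned char fold_ascii_only(unsigned char c)
{
    if (c < 0200 && isupper(c))
        return (unsigned char)tolower(c);
    return c;
}
```

With such folding, 'E' and 'e' compare equal after folding, while 0xC9 (É) and 0xE9 (é) do not.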
To check this rule, run the following commands on this ASCII-only example2.w:
@ @c
@.Enter@>
@.enter@>
@.cancel@>
cweave example2.w > /dev/null
cat example2.idx
The output will be:
\I\.{cancel}, 1.
\I\.{enter}, 1.
\I\.{Enter}, 1.
The problem is that the 'collate' array is not used properly.
To fix this, apply the following change to cweave.w:
@x
c=(eight_bits)((cur_name->byte_start)[0]);
if (xisupper(c)) c=tolower(c);
blink[cur_name-name_dir]=bucket[c]; bucket[c]=cur_name;
@y
c=(eight_bits)((cur_name->byte_start)[0]);
if (xisupper(c)) c=tolower(c);
blink[cur_name-name_dir]=bucket[collate[c]]; bucket[collate[c]]=cur_name;
@z
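A caveat on this change, as a sketch under assumptions: the strcpy initializers quoted further down in this message (for example, collate+101 receiving "\200...") suggest that 'collate' maps a collating position to a character code. If so, indexing it by a character code, as the change does, yields the character stored at that position rather than the position of that character. The toy table and the names toy_collate and build_rank below are hypothetical and only illustrate the two directions of the mapping.

```c
/* Toy collation table (hypothetical): index = collating position,
   value = ISO 8859-1 character code. */
static const unsigned char toy_collate[] = { 'c', 'e', 0xE7, 0xE9 };

/* Build the inverse mapping: rank_of[ch] is the collating position of
   character ch, or -1 if ch does not appear in the table. */
static void build_rank(const unsigned char *collate, int n, int rank_of[256])
{
    int i;
    for (i = 0; i < 256; i++)
        rank_of[i] = -1;
    for (i = 0; i < n; i++)
        rank_of[collate[i]] = i;
}
```

Here toy_collate[2] is the character 0xE7 (ç), whereas the inverse table gives position 2 for the character 0xE7; the two lookups are not interchangeable.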
After applying this change, the output from example1.w becomes:
\I\.{ça}, 1.
\I\.{écrire}, 1.
\I\.{Écrire}, 1.
This is analogous to the output from example2.w, i.e.,
sorting of non-ASCII letters is done case-insensitively.
But this fixes only the first letter of an entry. Other non-ASCII
letters are still compared case-sensitively.
I used the following example3.w:
@ @c
@.même@>
@.mça@>
@.mÊme@>
iconv -t iso8859-1 example3.w > ex3-8bit.w
cweave ex3-8bit.w > /dev/null
iconv -f iso8859-1 ex3-8bit.idx
The output from example3.w is:
\I\.{mÊme}, 1.
\I\.{mça}, 1.
\I\.{même}, 1.
And example4.w:
@ @c
@.eNter@>
@.enter@>
@.eancel@>
cweave example4.w > /dev/null
cat example4.idx
The output from example4.w is:
\I\.{eancel}, 1.
\I\.{enter}, 1.
\I\.{eNter}, 1.
We see that in example3.w sorting is done case-sensitively, while in
example4.w it is done case-insensitively. I don't know how to fix this case.
example3.w did not work with this change either (in ISO 8859-1, the code
for Ê is 202 and the code for ê is 234, both in decimal):
@x
strcpy(collate+101,"\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217");
@y
strcpy(collate+101,"\200\201\234\203\204\205\206\207\210\211\212\213\214\215\216\217");
@z
nor with this change:
@x
strcpy(collate+117,"\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237");
@y
strcpy(collate+117,"\220\221\222\223\224\225\226\227\230\231\232\233\202\235\236\237");
@z
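One thing worth checking in the two change files above is the number base of the escapes. In a C string literal, \202 and \234 are octal, i.e. decimal 130 and 156, whereas in ISO 8859-1 Ê is decimal 202 (hex CA, octal 312) and ê is decimal 234 (hex EA, octal 352). A minimal sketch of that check follows; the names E_CIRC_UPPER, E_CIRC_LOWER and same_byte are hypothetical.

```c
/* ISO 8859-1 code points of Ê and ê, written in hexadecimal. */
enum { E_CIRC_UPPER = 0xCA, E_CIRC_LOWER = 0xEA };

/* Hypothetical helper: returns 1 if the first byte of the string
   literal lit equals the given code, 0 otherwise. */
static int same_byte(const char *lit, int code)
{
    return (unsigned char)lit[0] == code;
}
```

For instance, same_byte("\312", E_CIRC_UPPER) holds, while same_byte("\202", E_CIRC_UPPER) does not.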
Please answer the following questions:
1) How should the 'collate' array be used properly?
2) How can the index for example3.w be sorted case-insensitively?
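To make question 2 concrete, here is a minimal sketch of the behaviour I am after, written as stand-alone C rather than as a change to cweave.w. It assumes ISO 8859-1 input; the names latin1_tolower and latin1_casecmp are hypothetical. In ISO 8859-1 the upper-case letters 0xC0..0xDE (except 0xD7, the multiplication sign) differ from their lower-case forms by 0x20, just as in ASCII.

```c
/* Hypothetical helper: lower-case one ISO 8859-1 byte. */
static unsigned char latin1_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + 0x20);
    /* 0xC0..0xDE are upper-case letters, except 0xD7 (multiplication sign) */
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return (unsigned char)(c + 0x20);
    return c;
}

/* Compare two ISO 8859-1 strings byte by byte, ignoring case.
   Returns -1, 0 or 1. */
static int latin1_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char x = latin1_tolower((unsigned char)*a);
        unsigned char y = latin1_tolower((unsigned char)*b);
        if (x != y)
            return x < y ? -1 : 1;
        if (x == 0)
            return 0;
    }
}
```

With this comparison, "mÊme" and "même" compare equal and "mça" sorts before both. It only folds case; it does not place ç next to c, which would need a proper collating order.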
Regards,
Igor