[texhax] How to use 'collate' array?

Igor Liferenko igor.liferenko at gmail.com
Tue Nov 29 04:36:59 CET 2016


Hi all,

Take the following example1.w and run the commands below on it:

@ @c
@.écrire@>
@.ça@>
@.Écrire@>

iconv -t iso8859-1 example1.w > ex1-8bit.w
cweave ex1-8bit.w > /dev/null
iconv -f iso8859-1 ex1-8bit.idx

The output will be:

\I\.{Écrire}, 1.
\I\.{ça}, 1.
\I\.{écrire}, 1.

As we can see, non-ASCII letters are sorted case-sensitively:
Écrire comes before ça, while écrire comes after it.
This violates the rule that sorting must be case-insensitive.
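The only case folding cweave applies to an index entry is `if (xisupper(c)) c=tolower(c);` on its bytes. A minimal sketch of that test, assuming (as in CWEB's definition of `xisupper`) that it accepts only ASCII codes below `0200`; `ascii_fold` and `check_ascii_fold` are hypothetical names, not cweave code:

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical model of cweave's folding test: only ASCII 'A'..'Z'
   are lowered. We assume xisupper() rejects every byte >= 0200 (octal),
   so Latin-1 capitals such as 0xC9 (E acute) pass through unchanged. */
static unsigned char ascii_fold(unsigned char c)
{
    if (c < 0200 && isupper(c))
        return (unsigned char)tolower(c);
    return c;
}

/* ASCII pairs fold together; the Latin-1 pair E acute / e acute does not. */
static void check_ascii_fold(void)
{
    assert(ascii_fold('E') == ascii_fold('e'));
    assert(ascii_fold(0311) != ascii_fold(0351));
}
```

Under that assumption, `Écrire` (first byte 0xC9) and `écrire` (0xE9) fall into different buckets, while `Enter` and `enter` share the bucket for `e`.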
To check this rule, take the following ASCII-only example2.w and run the commands below on it:

@ @c
@.Enter@>
@.enter@>
@.cancel@>

cweave example2.w > /dev/null
cat example2.idx

The output will be:

\I\.{cancel}, 1.
\I\.{enter}, 1.
\I\.{Enter}, 1.

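For reference, this is how `bucket` and `collate` appear to interact in cweave: `bucket` is indexed by the (folded) character code, `collate[k]` is the character whose sort rank is `k`, and `unbucket` visits `bucket[collate[k]]` in rank order. The sketch below is a simplified standalone model of that reading, not cweave code: `NCOLLATE`, `init_collate`, `rank_of` and `first_bucket` are hypothetical, and the table omits the space, control and punctuation characters that cweave also ranks.

```c
#include <assert.h>
#include <string.h>

#define NCOLLATE 165  /* hypothetical reduced size: 1 + 36 + 128 */

/* collate[k] = character with sort rank k (rank 0 = end of name).
   Simplified: cweave also ranks space, control and punctuation codes. */
static unsigned char collate[NCOLLATE];

static void init_collate(void)
{
    int k = 0, c;
    collate[k++] = 0;                     /* end of name sorts first */
    memcpy(collate + k, "abcdefghijklmnopqrstuvwxyz0123456789", 36);
    k += 36;
    for (c = 0200; c <= 0377; c++)        /* 8-bit codes in plain byte order */
        collate[k++] = (unsigned char)c;
    assert(k == NCOLLATE);
}

/* Rank of character c: the k with collate[k] == c, or -1 if c is not listed. */
static int rank_of(unsigned char c)
{
    int k;
    for (k = 0; k < NCOLLATE; k++)
        if (collate[k] == c)
            return k;
    return -1;
}

/* Model of unbucket's visiting order: among the occupied buckets
   (indexed by character code), return the character with the smallest
   rank in collate, or -1 if no listed bucket is occupied. */
static int first_bucket(const int occupied[256])
{
    int k;
    for (k = 0; k < NCOLLATE; k++)
        if (occupied[collate[k]])
            return collate[k];
    return -1;
}
```

In this model a byte missing from `collate` has no rank (so `unbucket` would never empty its bucket), and listing a byte twice does not merge it with any other byte: editing `collate` reorders characters but does not make two different bytes compare equal.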
The problem is that the 'collate' array is not used properly.
To fix this, apply the following change to cweave.w:

@x
      c=(eight_bits)((cur_name->byte_start)[0]);
      if (xisupper(c)) c=tolower(c);
      blink[cur_name-name_dir]=bucket[c]; bucket[c]=cur_name;
@y
      c=(eight_bits)((cur_name->byte_start)[0]);
      if (xisupper(c)) c=tolower(c);
      blink[cur_name-name_dir]=bucket[collate[c]]; bucket[collate[c]]=cur_name;
@z

After applying this change, the output from example1.w becomes:

\I\.{ça}, 1.
\I\.{écrire}, 1.
\I\.{Écrire}, 1.

This is analogous to the output from example2.w, i.e.,
sorting of non-ASCII letters is done case-insensitively.

But this fixes only the first letter of an entry. Non-ASCII letters
in later positions are still compared case-sensitively.
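Later positions are handled when a bucket holding several names is split on the byte at the current depth. A sketch of that key, assuming the same ASCII-only `xisupper`/`tolower` test is applied there and that an ended name gets key 0 (which `collate` ranks first); `split_key` is a hypothetical helper, not cweave code:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of the key used when a bucket is split at `depth`:
   0 once the name has ended, otherwise the byte at that position with
   ASCII-only case folding (assumed to mirror xisupper()/tolower()). */
static unsigned char split_key(const unsigned char *name, size_t len, size_t depth)
{
    unsigned char c;
    assert(name != NULL);
    if (depth >= len)
        return 0;
    c = name[depth];
    if (c < 0200 && isupper(c))
        c = (unsigned char)tolower(c);
    return c;
}
```

At depth 1, the ISO-8859-1 names `mÊme` and `même` get the different keys 0312 and 0352, whereas `eNter` and `enter` both get `n`, which matches the outputs of example3.w and example4.w.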

I used the following example3.w:

@ @c
@.même@>
@.mça@>
@.mÊme@>

iconv -t iso8859-1 example3.w > ex3-8bit.w
cweave ex3-8bit.w > /dev/null
iconv -f iso8859-1 ex3-8bit.idx

The output from example3.w is:

\I\.{mÊme}, 1.
\I\.{mça}, 1.
\I\.{même}, 1.
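This order is plain ascending ISO-8859-1 byte order of the second letter: Ê is 0xCA, ç is 0xE7 and ê is 0xEA. The sketch below reproduces it with `qsort` and `strcmp` (which compares bytes as `unsigned char`); it only illustrates the resulting order and is not how cweave sorts. `sort_raw` and `cmp_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: strcmp compares bytes as unsigned char, so Latin-1
   bytes >= 0x80 sort after ASCII and in plain code order. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort ISO-8859-1 entries by raw byte value, with no case folding. */
static void sort_raw(const char **entries, size_t n)
{
    assert(n == 0 || entries != NULL);
    qsort(entries, n, sizeof *entries, cmp_bytes);
}
```

Sorting `"m\352me"`, `"m\347a"` and `"m\312me"` this way yields mÊme, mça, même, the same order as the index above.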

And here is example4.w:

@ @c
@.eNter@>
@.enter@>
@.eancel@>

cweave example4.w > /dev/null
cat example4.idx

The output from example4.w is:

\I\.{eancel}, 1.
\I\.{enter}, 1.
\I\.{eNter}, 1.

We see that in example3.w the second letter is sorted case-sensitively
(mça should come before both mÊme and même, but it lands between them),
whereas in example4.w it is sorted case-insensitively. I don't know how to fix this case.

example3.w was not sorted correctly with this change (in ISO-8859-1,
the decimal code of Ê is 202 and that of ê is 234):

@x
strcpy(collate+101,"\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217");
@y
strcpy(collate+101,"\200\201\234\203\204\205\206\207\210\211\212\213\214\215\216\217");
@z

nor with this change:

@x
strcpy(collate+117,"\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237");
@y
strcpy(collate+117,"\220\221\222\223\224\225\226\227\230\231\232\233\202\235\236\237");
@z
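One detail in the changes above: in a C string, `\202` and `\234` are octal escapes (bytes 0x82 and 0x9C), while 202 and 234 are the decimal ISO-8859-1 codes of Ê and ê, which are `\312` and `\352` in octal. A quick check, with hypothetical names `E_CIRC_UPPER`, `E_CIRC_LOWER` and `check_escapes`:

```c
#include <assert.h>

/* In C, "\ddd" in a string or character literal is an OCTAL escape. */
enum {
    E_CIRC_UPPER = 202,   /* E circumflex in ISO-8859-1, decimal (0xCA, octal 0312) */
    E_CIRC_LOWER = 234    /* e circumflex in ISO-8859-1, decimal (0xEA, octal 0352) */
};

static void check_escapes(void)
{
    assert((unsigned char)'\312' == E_CIRC_UPPER);
    assert((unsigned char)'\352' == E_CIRC_LOWER);
    assert((unsigned char)'\202' == 130);   /* octal 202 is decimal 130 */
    assert((unsigned char)'\234' == 156);   /* octal 234 is decimal 156 */
}
```

So both changes edit the `collate` entries for bytes 0x82 and 0x9C, which do not occur in the ISO-8859-1 encoding of example3.w.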

Please answer the following questions:
1) How should the 'collate' array be used?
2) How can I make the index for example3.w sort case-insensitively?

Regards,
Igor
