[tex-hyphen] Loading patterns twice, OT1 and apostrophe

Jonathan Kew jonathan_kew at sil.org
Sat Jun 28 11:14:05 CEST 2008


On 28 Jun 2008, at 9:02 am, Taco Hoekwater wrote:

> Arthur Reutenauer wrote:
>>> I've been thinking: Perhaps the final solution is to do away with
>>> \lccode and \uccode completely and instead base the system on
>>> unicode properties?
>>   You don't say :-)
>
> Well, there is a downside also: an interface to the unicode properties
> would have to be written too, lest we lose flexibility. TeX users
> are used to being able to modify everything, so a static database
> won't do.

Operations such as case-folding must allow "tailoring" because the  
properties in the UCD are defaults, not necessarily correct for every  
language. (Consider the casing behavior of i in Turkish, to take one  
well-known example.)

And we mustn't forget that users may need to provide properties for  
PUA codepoints they're using, even if they don't normally need to  
modify standard Unicode properties.
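
To make the two requirements concrete, here is a rough sketch in Python of
a lookup that consults user-supplied overrides before falling back to the
defaults. It is only an illustration, not how TeX, XeTeX or LuaTeX do or
would implement this: the CaseTable class and its method names are
hypothetical, and Python's built-in str.lower() merely stands in for the
default mappings from the UCD.

```python
class CaseTable:
    """Hypothetical tailorable lowercase mapping (names are illustrative only)."""

    def __init__(self, overrides=None):
        # Per-language or per-document exceptions to the default mappings.
        self.overrides = dict(overrides or {})

    def lower(self, ch):
        # A tailored entry wins; otherwise fall back to the default mapping.
        if ch in self.overrides:
            return self.overrides[ch]
        return ch.lower()  # Python's built-in mapping stands in for the UCD default


# No tailoring: defaults only.
default = CaseTable()

# Turkish tailoring: I lowercases to dotless i (U+0131),
# and dotted capital I (U+0130) lowercases to plain i.
turkish = CaseTable({"I": "\u0131", "\u0130": "i"})

# A user-assigned upper/lower pair in the Private Use Area,
# for which the standard data has no case mapping at all.
pua = CaseTable({"\ue000": "\ue001"})
```

The point of keeping the overrides separate from the defaults is that a
user who only needs PUA entries never has to touch the standard data,
while a Turkish document can replace just the two entries that differ.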

>
>
>>   The irony here is that LuaTeX doesn't complain about duplicate
>> patterns anymore since the hyphenation-handling code moved over to
>> libHnj last October, and part 43 of the original TeX code disappeared
>> entirely; Taco, can you comment about that?
>
> I could have added such testing code, but it seemed a bit pointless.
> Duplicate patterns are harmless after all; they just waste a few CPU
> cycles.

There are two slightly different cases, and they might merit  
different handling. Truly duplicated patterns

   a1b
   a1b

could be silently ignored as harmless, or perhaps a warning logged;  
on the other hand, patterns that have the same sequence of letters  
but different hyphenation weights

   a1b
   a2b

should probably be reported as "conflicting" rather than "duplicate".  
(TeX does not currently distinguish between these two situations; it  
just gives the "duplicate" error for both.)
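
As a sketch of the distinction (in Python, purely for illustration; this
is not the code in TeX or in libHnj, and the function names are made up),
a loader could key each pattern on its letters alone and compare the
weights only when the same letters turn up again:

```python
DIGITS = "0123456789"


def split_pattern(pattern):
    """Split a pattern such as 'a1b' into its letters and inter-letter weights.

    Returns (letters, weights), where weights has len(letters) + 1 entries
    and positions without an explicit digit get weight 0.
    """
    letters = []
    weights = [0]
    for ch in pattern:
        if ch in DIGITS:
            weights[-1] = int(ch)  # weight for the current gap
        else:
            letters.append(ch)
            weights.append(0)      # open the gap after this letter
    return "".join(letters), tuple(weights)


def check_patterns(patterns):
    """Return (duplicates, conflicts) found while reading patterns in order."""
    seen = {}  # letters -> weights of the first pattern with those letters
    duplicates, conflicts = [], []
    for p in patterns:
        letters, weights = split_pattern(p)
        if letters not in seen:
            seen[letters] = weights
        elif seen[letters] == weights:
            duplicates.append(p)   # same letters, same weights: harmless
        else:
            conflicts.append(p)    # same letters, different weights
    return duplicates, conflicts
```

With this, check_patterns(["a1b", "a1b"]) puts the second pattern in the
duplicates list, while check_patterns(["a1b", "a2b"]) puts "a2b" in the
conflicts list. Whether a duplicate should be ignored or logged, and
whether a conflict should be an error, is a separate policy decision.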

JK


