[XeTeX] Re: Re: Assignment of codes (particularly \catcode) based on Unicode data

Thu May 7 11:56:32 CEST 2015

On 7/5/15 09:34, Philip Taylor wrote:
>
>
> Apostolos Syropoulos wrote:
>
>> The only mark that remains when making all capitals is the dieresis
>> (dialytika). All others vanish. This is common knowledge for people who
>> speak and write Greek.
>
> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
> a native Greek speaker and Director of the Hellenic Institute.  This is
> why I asked whether it was a universally-agreed truism or simply a
> matter of opinion, and in view of the fact that both Dr Dendrinos (in
> private correspondence) and Julian Bradfield (on this list) have offered
> the alternative perspective to your own, it would seem to be a matter of
> opinion rather than one of fact.  If you look at the opening folio of
> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>
> 	http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>
> you will see a number of Greek majuscules with either psilí or daseîa,
> including the very combination under discussion (GREEK CAPITAL LETTER
> EPSILON WITH PSILI, on line 2), suggesting that the combination of
> breathing and majuscule was common at that time.

I think there may be some confusion as to exactly what this discussion 
is about. Certainly, "the combination of breathing and majuscule" occurs 
in mixed-case polytonic text, as shown in your example. However, 
Apostolos is (I think) addressing the case of all-uppercase text, in 
which case the usual practice is to drop all marks except dieresis.

See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html; 
note the presence of breathing marks on initial capitals within the 
text, but note also their complete absence in the ALL-CAPS title.

So if a lower-to-uppercase mapping is used just to Capitalize Initial 
Letters, it perhaps should not discard breathing marks; but if it is 
used to turn a passage of text into ALL UPPERCASE, then it probably 
should discard them.
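
To make that distinction concrete, here is a minimal Python sketch. It
is only an illustration, not anything (Xe)TeX itself does: the function
names are made up for this example, and it deals with breathing marks
only, leaving accents alone.

```python
import unicodedata

# Combining psili (U+0313) and combining dasia (U+0314).
BREATHINGS = {"\u0313", "\u0314"}

def capitalize_initial(word):
    """Uppercase only the first letter; its breathing mark is kept."""
    return word[:1].upper() + word[1:]

def all_caps_without_breathings(text):
    """Uppercase everything, then drop breathing marks (simplified)."""
    # Decompose so that the breathing marks become separate code points.
    decomposed = unicodedata.normalize("NFD", text.upper())
    kept = "".join(c for c in decomposed if c not in BREATHINGS)
    return unicodedata.normalize("NFC", kept)

word = "\u1f10\u03bd"  # epsilon with psili, followed by nu
print(capitalize_initial(word))
print(all_caps_without_breathings(word))
```

The first call relies purely on the per-character Unicode mapping; the
second needs an extra, use-specific step on top of it.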

But things are actually trickier than that. AIUI, the most correct 
polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only 
is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.

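The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no 
matter what code assignments are chosen; neither can the per-character 
properties in Unicode. Both map each character in isolation, whereas 
here the result for ι depends on the accent of the preceding vowel. It 
requires a more powerful, context-sensitive approach to case 
transforms.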
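
For illustration, here is a rough Python sketch of what such a
context-aware transform might look like. The function greek_upper and
its rule set are assumptions made for this example, not an existing
package or a complete statement of Greek orthography: it drops accents
and breathings from Greek letters and adds a dieresis to ι or υ when
the preceding vowel has just lost its accent.

```python
import unicodedata

ACCENTS = {"\u0300", "\u0301", "\u0342"}  # varia, oxia/tonos, perispomeni
BREATHINGS = {"\u0313", "\u0314"}         # psili, dasia
DIALYTIKA = "\u0308"
VOWELS = set("\u0391\u0395\u0397\u0399\u039f\u03a5\u03a9")  # capital vowels

def greek_upper(text):
    """Simplified all-caps transform for Greek (illustrative only)."""
    nfd = unicodedata.normalize("NFD", text)
    out = []
    after_accented_vowel = False
    i = 0
    while i < len(nfd):
        base = nfd[i]
        i += 1
        # Collect the combining marks attached to this base character.
        marks = []
        while i < len(nfd) and unicodedata.combining(nfd[i]):
            marks.append(nfd[i])
            i += 1
        upper = base.upper()
        if "GREEK" not in unicodedata.name(base, ""):
            # Leave non-Greek characters and their marks untouched.
            out.append(upper + "".join(marks))
            after_accented_vowel = False
            continue
        had_accent = any(m in ACCENTS for m in marks)
        kept = [m for m in marks
                if m not in ACCENTS and m not in BREATHINGS]
        # Context rule: iota/upsilon after a vowel that lost its accent
        # gains a dieresis, so the two vowels still read separately.
        if (after_accented_vowel and upper in ("\u0399", "\u03a5")
                and DIALYTIKA not in kept):
            kept.append(DIALYTIKA)
        out.append(upper + "".join(kept))
        after_accented_vowel = upper in VOWELS and had_accent
    return unicodedata.normalize("NFC", "".join(out))

word = "\u03bc\u03ac\u03b9\u03bf\u03c2"  # the "μάιος" example
print(word.upper())       # per-character mapping keeps the accent
print(greek_upper(word))  # context-aware sketch adds the dieresis
```

The per-character result cannot be repaired by choosing different
table entries, because the output for ι depends on state carried over
from the previous letter.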

So I still maintain that the default code values assigned in formats 
such as xe(la)tex should be based directly on the Unicode properties. It 
would be great to have a Greek package that implements proper Greek 
uppercasing, but this level of language- and orthography-specific 
behavior does not belong in the base format.

JK