[XeTeX] Re: Assignment of codes (particularly \catcode) based on Unicode data
Jonathan Kew
jfkthame at gmail.com
Thu May 7 11:56:32 CEST 2015
On 7/5/15 09:34, Philip Taylor wrote:
>
>
> Apostolos Syropoulos wrote:
>
>> The only mark that remains when making all capitals is the dieresis
>> (dialytika). All others vanish. This is common knowledge for people who
>> speak and write Greek.
>
> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
> a native Greek speaker and Director of the Hellenic Institute. This is
> why I asked whether it was a universally-agreed truism or simply a
> matter of opinion, and in view of the fact that both Dr Dendrinos (in
> private correspondence) and Julian Bradfield (on this list) have offered
> the alternative perspective to your own, it would seem to be a matter of
> opinion rather than one of fact. If you look at the opening folio of
> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>
> http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>
> you will see a number of Greek majuscules with either psilí or daseîa,
> including the very combination under discussion (GREEK CAPITAL LETTER
> EPSILON WITH PSILI, on line 2), suggesting that the combination of
> breathing and majuscule was common at that time.

I think there may be some confusion as to exactly what this discussion
is about. Certainly, "the combination of breathing and majuscule" occurs
in mixed-case polytonic text, as shown in your example. However,
Apostolos is (I think) addressing the case of all-uppercase text, in
which case the usual practice is to drop all marks except dieresis.
See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html;
note the presence of breathing marks on initial capitals within the
text, but note also their complete absence in the ALL-CAPS title.

So if a lower-to-uppercase mapping is used just to Capitalize Initial
Letters, it perhaps should not discard breathing marks; but if it is
used to turn a passage of text into ALL UPPERCASE, then it probably
should discard them.
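
To make the two uses concrete, here is a minimal Python sketch. The function names and the set of marks treated as droppable are assumptions made for illustration only; nothing here is defined by XeTeX or by Unicode, and iota subscript is deliberately left out.

```python
import unicodedata

# Assumption: in all-caps Greek the accents and breathings are dropped,
# while the dieresis (U+0308) is kept. Iota subscript is not handled here.
DROPPED_IN_ALL_CAPS = {
    "\u0300",  # combining grave (varia)
    "\u0301",  # combining acute (oxia / tonos)
    "\u0342",  # combining Greek perispomeni
    "\u0313",  # combining comma above (psili)
    "\u0314",  # combining reversed comma above (dasia)
}


def capitalize_initial(word: str) -> str:
    # Uppercase only the first character; its marks survive the mapping.
    word = unicodedata.normalize("NFC", word)
    return word[:1].upper() + word[1:]


def all_caps(word: str) -> str:
    # Decompose, drop accents and breathings, uppercase, then recompose.
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(c for c in decomposed if c not in DROPPED_IN_ALL_CAPS)
    return unicodedata.normalize("NFC", kept.upper())


word = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"  # ἄνθρωπος
print(capitalize_initial(word))  # Ἄνθρωπος (breathing and accent kept)
print(all_caps(word))            # ΑΝΘΡΩΠΟΣ (marks dropped)
```

The point is only that the same lowercase input needs two different results depending on the purpose, which a single uppercase table cannot express.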

But things are actually trickier than that. AIUI, the most correct
polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only
is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.
The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no
matter what code assignments are chosen; neither can the per-character
properties in Unicode. It requires a more powerful approach to case
transforms.
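
As a rough illustration of why a per-character table falls short, the Python sketch below compares a mapping that sees one character at a time with one that looks at the preceding vowel. The rule used (add a dieresis to ι or υ when the preceding vowel carried an accent) is a simplified heuristic assumed for this example, and all function names are hypothetical; it is not a complete Greek uppercasing algorithm.

```python
import unicodedata

ACCENTS = {"\u0300", "\u0301", "\u0342"}  # varia, oxia/tonos, perispomeni
BREATHINGS = {"\u0313", "\u0314"}         # psili, dasia
DROPPED = ACCENTS | BREATHINGS
DIERESIS = "\u0308"
VOWELS = set("αεηιουω")


def upper_per_char(text: str) -> str:
    # Mimics a \uccode-style table: each character is mapped in isolation,
    # so the result for ι cannot depend on what came before it.
    out = []
    for ch in unicodedata.normalize("NFC", text):
        nfd = unicodedata.normalize("NFD", ch)
        kept = "".join(c for c in nfd if c not in DROPPED)
        out.append(unicodedata.normalize("NFC", kept.upper()))
    return "".join(out)


def upper_in_context(text: str) -> str:
    # Split the decomposed text into [base, marks] clusters.
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])

    out = []
    after_accented_vowel = False
    for base, marks in clusters:
        accented = any(m in ACCENTS for m in marks)
        kept = [m for m in marks if m not in DROPPED]
        # Assumed heuristic: an accent on the previous vowel signals hiatus,
        # so a following ι/υ gains a dieresis once the accent is removed.
        if after_accented_vowel and base.lower() in "ιυ" and DIERESIS not in kept:
            kept.insert(0, DIERESIS)
        out.append(base.upper() + "".join(kept))
        after_accented_vowel = accented and base.lower() in VOWELS
    return unicodedata.normalize("NFC", "".join(out))


word = "\u03bc\u03ac\u03b9\u03bf\u03c2"  # μάιος
print(word.upper())            # ΜΆΙΟΣ (Unicode per-character mapping keeps the tonos)
print(upper_per_char(word))    # ΜΑΙΟΣ (accent dropped, but no dieresis)
print(upper_in_context(word))  # ΜΑΪΟΣ (dieresis added from context)
```

Only the last function produces the form discussed above, and it does so by carrying state from one character to the next, which is exactly what a table of per-character codes cannot do.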

So I still maintain that the default code values assigned in formats
such as xe(la)tex should be based directly on the Unicode properties. It
would be great to have a Greek package that implements proper Greek
uppercasing, but this level of language- and orthography-specific
behavior does not belong in the base format.
JK