[XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

Joseph Wright joseph.wright at morningstar2.co.uk
Thu May 7 12:09:21 CEST 2015


On 07/05/2015 10:56, Jonathan Kew wrote:
> On 7/5/15 09:34, Philip Taylor wrote:
>>
>>
>> Apostolos Syropoulos wrote:
>>
>>> The only mark that remains when making all capitals is the dieresis
>>> (dialytika). All others vanish. This is common knowledge for people who
>>> speak and write Greek.
>>
>> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
>> a native Greek speaker and Director of the Hellenic Institute.  This is
>> why I asked whether it was a universally-agreed truism or simply a
>> matter of opinion, and in view of the fact that both Dr Dendrinos (in
>> private correspondence) and Julian Bradfield (on this list) have offered
>> the alternative perspective to your own, it would seem to be a matter of
>> opinion rather than one of fact.  If you look at the opening folio of
>> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>>
>>     http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>>
>>
>> you will see a number of Greek majuscules with either psilí or daseîa,
>> including the very combination under discussion (GREEK CAPITAL LETTER
>> EPSILON WITH PSILI, on line 2), suggesting that the combination of
>> breathing and majuscule was common at that time.
> 
> I think there may be some confusion as to exactly what this discussion
> is about. Certainly, "the combination of breathing and majuscule" occurs
> in mixed-case polytonic text, as shown in your example. However,
> Apostolos is (I think) addressing the case of all-uppercase text, in
> which case the usual practice is to drop all marks except dieresis.
> 
> See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html;
> note the presence of breathing marks on initial capitals within the
> text, but note also their complete absence in the ALL-CAPS title.
> 
> So if a lower-to-uppercase mapping is used just to Capitalize Initial
> Letters, it perhaps should not discard breathing marks; but if it is
> used to turn a passage of text into ALL UPPERCASE, then it probably
> should discard them.
> 
> But things are actually trickier than that. AIUI, the most correct
> polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only
> is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.
> 
> The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no
> matter what code assignments are chosen; neither can the per-character
> properties in Unicode. It requires a more powerful approach to case
> transforms.
> 
> So I still maintain that the default code values assigned in formats
> such as xe(la)tex should be based directly on the Unicode properties. It
> would be great to have a Greek package that implements proper Greek
> uppercasing, but this level of language- and orthography-specific
> behavior does not belong in the base format.

Indeed. Whilst this is not what I was after here (which, as you say, is
about defaults for the formats), the expl3 code I've written for case
changing has the idea of positional dependence built in. There's no
question that the TeX 1-1 mapping for case changing is inadequate in
many situations, not just for Greek text. I'll ask a separate question
about Greek case mapping in the expl3 context later on, as it seems to
have people's attention.
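
To make the "positional dependence" point concrete, here is a rough sketch
in Python (not expl3, and not taken from any existing package) of an
ALL-CAPS transform for Greek along the lines described above. The function
name greek_upper, the helper split_clusters and the set of marks treated as
droppable are all assumptions made for illustration. The rule that an
accented vowel hands a dieresis to a following iota or upsilon is my
reading of the "μάιος" example, and iota subscript is not handled at all.

```python
import unicodedata

# Combining marks dropped in all-caps Greek (illustrative, not exhaustive).
ACCENTS = {"\u0300", "\u0301", "\u0342"}   # grave, acute, perispomeni
BREATHINGS = {"\u0313", "\u0314"}          # psili, dasia
DROPPED = ACCENTS | BREATHINGS
DIERESIS = "\u0308"                        # the one mark that is kept
VOWELS = set("αεηιουω")


def split_clusters(text):
    """Decompose text (NFD) into [base, [combining marks]] pairs."""
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    return clusters


def greek_upper(text):
    """Uppercase Greek text, dropping accents and breathings.

    Context-dependent step: an iota or upsilon that directly follows an
    accented vowel gains a dieresis, so that the hiatus stays visible
    once the accent has been removed.
    """
    out = []
    prev_accented_vowel = False
    for base, marks in split_clusters(text):
        lower = base.lower()
        kept = [m for m in marks if m not in DROPPED]
        if prev_accented_vowel and lower in ("ι", "υ") and DIERESIS not in kept:
            kept.insert(0, DIERESIS)
        prev_accented_vowel = lower in VOWELS and any(m in ACCENTS for m in marks)
        out.append(base.upper() + "".join(kept))
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    word = "μάιος"
    print(word.upper())       # per-character mapping only
    print(greek_upper(word))  # context-aware transform
```

The per-character mapping keeps the tonos and cannot add anything to the
iota, because each character is mapped without reference to its
neighbours; the sketch has to look at the preceding letter, which is
exactly what a \uccode-style table cannot do.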
--
Joseph Wright



