[XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

Wed May 6 22:22:26 CEST 2015

On 06/05/2015 16:04, Apostolos Syropoulos wrote:
> Hello,
> 
> I checked the file a bit and noticed that
>
> \L 1F10 1F18 1F10 %
>
> while xgreek.sty defines
>
> \global\lccode"1F10="1F10 \global\uccode"1F10="0395
> 
> You see, the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI'
> is 'GREEK CAPITAL LETTER EPSILON' and not 'GREEK CAPITAL LETTER EPSILON WITH PSILI'.
> 
> Some time ago I reported this to the Unicode people and they told me
> something like "we cannot change it now" (I do not remember the exact
> wording, but the essence remains the same). Naturally, all \lccodes and
> \uccodes for Greek letters are wrong and I suspect many more are wrong.

This is slightly at a tangent from my original question (whether we are
processing the Unicode data in the right way), but is worth
consideration. It also has some impact on expl3 code related to case
changing (which does not use \lccode/\uccode).

I guess one could imagine deviating from the Unicode data, but there are
issues. First, the current position is at least easy to explain. Second,
it is the same position taken by, I guess, many other pieces of software,
so the results stay consistent with what other tools do. Third, as a
non-Greek I can't comment on the technical correctness of what you say!
Is there some place I could see this discussed in detail? (I'm a bit
confused as to what 'GREEK CAPITAL LETTER EPSILON WITH PSILI' represents
if it's not the upper case of 'GREEK SMALL LETTER EPSILON WITH PSILI': I
notice that in xgreek you map U+1F18 to U+0395 for upper casing and to
U+1F10 for lower casing.)
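
To make the two positions concrete, here is a minimal plain XeTeX sketch of
the competing code assignments for U+1F10 and U+1F18. It is only an
illustration: the grouping and the \message calls are my own scaffolding,
the assignments are local rather than \global as in xgreek, and the second
set is taken from the description of xgreek in this thread rather than
copied from xgreek.sty.

```tex
% Plain XeTeX sketch (run with `xetex`). Assignments are kept local to
% each group so the two alternatives do not interfere; xgreek itself
% uses \global.

% (a) Following the Unicode data, as in "\L 1F10 1F18 1F10":
\begingroup
  \lccode"1F10="1F10 \uccode"1F10="1F18
  \lccode"1F18="1F10 \uccode"1F18="1F18
  \uppercase{\message{^^^^1f10}}% writes the character U+1F18
\endgroup

% (b) Following xgreek as described above: upper-casing drops the psili.
\begingroup
  \lccode"1F10="1F10 \uccode"1F10="0395
  \lccode"1F18="1F10 \uccode"1F18="0395
  \uppercase{\message{^^^^1f10}}% writes the character U+0395
\endgroup

\bye
```

With (a) the mapping round-trips, since \lowercase takes U+1F18 back to
U+1F10. With (b) it does not: once upper-cased to U+0395, the breathing
mark cannot be recovered by lower-casing.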
--
Joseph Wright