[XeTeX] Case changing for Greek

Jonathan Kew jfkthame at gmail.com
Thu May 7 15:26:38 CEST 2015


On 7/5/15 13:22, Joseph Wright wrote:
> Hello all,
>
> The question of case changing in Greek has come up in another thread.
> Whilst the details here aren't XeTeX (or even TeX) specific, given the
> interest among members of the list I hope I can take the opportunity to
> ask about the area.
>
> For work on LaTeX3/expl3 we've put together an approach to case changing
> in XeTeX (and LuaTeX) that is not tied to a 1-1 mapping.
>
> One of the design ideas behind the code was to allow a way to tackle
> context- and language-dependent changes. At the same time, to date we
> have used the Unicode docs to define case mappings. Thus the 'standard'
> mappings follow those in UnicodeData.txt (1-1 lower/title/upper) and
> SpecialCasing.txt (more complex cases).
>
> Included in that 'standard' setup is the final sigma rule for Greek
> text. For performance reasons that code has been set up to assume that a
> sigma is final if it is followed by a space, a control sequence or a
> character from the list
>
>      ) ] } . : ; , ! ? ' "
>
> Other potential additions are welcome, as is testing of what we have
> done. (There seem to be a lot of edge cases. For example, what happens
> if a sigma is immediately followed by a number, say in a computational
> identifier?)
>
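
To make the look-ahead rule quoted above concrete, here is a minimal
sketch in Python rather than expl3. It is not the LaTeX3 code. The names
FINAL_SIGMA_FOLLOWERS and lowercase_greek are hypothetical. Treating end
of input as a word boundary, and requiring a preceding letter (loosely
following Unicode's Final_Sigma condition), are assumptions beyond the
quoted rule. Control sequences are not modelled, since a plain string has
none.

```python
# Characters that, per the quoted list, mark a preceding sigma as word-final.
FINAL_SIGMA_FOLLOWERS = set(")]}.:;,!?'\"")

CAPITAL_SIGMA = "\u03A3"  # GREEK CAPITAL LETTER SIGMA
SMALL_SIGMA = "\u03C3"    # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03C2"    # GREEK SMALL LETTER FINAL SIGMA


def lowercase_greek(text: str) -> str:
    """Lowercase text, choosing final sigma by a one-character look-ahead."""
    out = []
    for i, ch in enumerate(text):
        if ch != CAPITAL_SIGMA:
            out.append(ch.lower())
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Assumption: a sigma with no letter before it is never final.
        preceded_by_letter = i > 0 and text[i - 1].isalpha()
        # Assumption: end of input counts as a word boundary.
        at_word_end = nxt == "" or nxt.isspace() or nxt in FINAL_SIGMA_FOLLOWERS
        out.append(FINAL_SIGMA if preceded_by_letter and at_word_end else SMALL_SIGMA)
    return "".join(out)
```

With this rule a sigma followed by a digit, as in the identifier case
raised above, is not treated as final; whether that is the desired
outcome is exactly the open question.
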
> What has not been covered at all to date is any special handling of
> accents. As indicated in the other thread, it seems that the handling of
> accents in Greek is non-trivial. Notably, we have an implementation
> which separates out title case from upper case and has the idea of
> language-dependent mappings. Thus it would be perfectly possible to have
> logic 'Retain accents on the first letter of a word when title casing;
> remove them when upper casing'. Similarly, I wonder if there are
> differences in practice related to the nature of the text: modern
> writing vs. historical text, etc. Again, this can be added if there is a
> clear set of rules to follow.
>
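
The 'retain accents when title casing, remove them when upper casing'
logic suggested above could look roughly like the following Python
sketch. It is an illustration only, not the expl3 or Mozilla behaviour.
The function names are hypothetical. It assumes monotonic Greek input, so
only the tonos is removed, and it ignores any exceptions that a full set
of rules would need.

```python
import unicodedata

# Under NFD the Greek tonos decomposes to COMBINING ACUTE ACCENT.
TONOS = "\u0301"


def strip_tonos(text: str) -> str:
    """Remove the tonos but keep other marks such as the dialytika.

    Assumption: the input is Greek only; an acute accent on a Latin
    letter would be removed as well.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(TONOS, ""))


def upper_greek(text: str) -> str:
    """Upper case with accents removed."""
    return strip_tonos(text).upper()


def title_greek_word(word: str) -> str:
    """Title case a single word, retaining the accent on the first letter."""
    return word[:1].upper() + word[1:].lower()
```

Keeping the two operations as separate functions mirrors the separation
of title case from upper case described above, so that a
language-dependent rule can be attached to one without affecting the
other.
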
> Detailed information is most welcome.

FWIW, we've done some work on this in Mozilla in the past few years, to 
provide language-appropriate behavior for CSS features like 
text-transform:uppercase and font-variant:small-caps. You might like to 
review the discussion in bug reports such as

   https://bugzilla.mozilla.org/show_bug.cgi?id=231162
   https://bugzilla.mozilla.org/show_bug.cgi?id=307039
   https://bugzilla.mozilla.org/show_bug.cgi?id=740120
   https://bugzilla.mozilla.org/show_bug.cgi?id=740477

In particular, bug 307039 has a lot to say about uppercasing Greek. The 
details of actual code patches will obviously not be relevant, but the 
comments describing desired/implemented behavior may be helpful.

JK
