[XeTeX] Fwd: Re: Case changing for Greek

BPJ melroch at gmail.com
Mon May 11 22:30:20 CEST 2015


Watching this discussion and thinking about the various contexts which
should or should not trigger σ → ς substitution, it struck me that one
such context is surely the Greek numeral signs:

-   U+0374 GREEK NUMERAL SIGN:'ʹ'
-   U+0375 GREEK LOWER NUMERAL SIGN:'͵'


I have just been fortunate enough not to run into that yet -- or
unfortunate enough not to notice!

So my substitution regular expression, which defines the following
context negatively, should rather be

     s/(?<= \p{Script=Greek} ) (?<= \pL ) ( \pM* ) σ (?! \pL | [-ʹ͵] )/$1ς/gx

There certainly are edge cases which this doesn't handle.  The one 
which immediately comes to mind is `(s)` and similar, which should 
be sensitive to what comes before and after the parentheses.
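
As a quick check of the updated substitution above, here is a minimal sketch. The helper name `final_sigma` and the sample strings are only illustrative assumptions, and the two numeral signs are written as `\x{0374}` and `\x{0375}` escapes so that they cannot be confused with look-alike characters.

```perl
use strict;
use warnings;
use utf8;    # this file contains literal Greek letters

# Hypothetical helper wrapping the substitution from the message above.
# Any combining marks before the sigma are captured and put back, so the
# same pattern copes with both NFC and NFD input.
sub final_sigma {
    my ($text) = @_;
    $text =~ s/(?<= \p{Script=Greek} ) (?<= \pL ) ( \pM* ) σ (?! \pL | [-\x{0374}\x{0375}] )/$1ς/gx;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';

# A word ending in sigma, the numeral 1200 written with both numeral
# signs, and a sigma before a hyphen.
print final_sigma($_), "\n"
    for 'σεισμόσ', "\x{0375}ασ\x{0374}", 'προσ-';
```

Only the first string should change (its last σ becomes ς). The numeral and the hyphenated form are left alone, and a lone σ with no preceding letter is not touched either.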

On Thu, 07 May 2015 I wrote:

> On 2015-05-07 16:02, Jonathan Kew wrote:
> > Would it be feasible to define this negatively instead --
> > something like "a sigma is final if it is NOT followed by another
> > letter"?
> >
> > A possible refinement is that a lone sigma, neither preceded nor
> > followed by another letter, should probably be lowercased as σ
> > rather than ς.

> I have used this Perl regular expression substitution to change σ into
> ς for some years, with satisfactory results so far,

>     s/(?<= \p{Script=Greek} ) (?<= \pL ) σ (?! \pL | - )/ς/gx

> That is: change a σ into ς if it is preceded by a Greek letter
> and not followed by a letter or a hyphen. Note that this
> substitution as written above only works with NFC text. For NFD
> you would need to use the following, which captures any combining
> marks before the σ and puts them back, since the Perl regex engine
> doesn't support variable-length lookbehind:

>     s/(?<= \p{Script=Greek} ) (?<= \pL ) ( \pM* ) σ (?! \pL | - )/$1ς/gx

> I guess that when intersection character classes are possible one
> should change the negative lookahead into "when not followed by a
> Greek letter or a hyphen":

>     (?! (?[ \p{Script=Greek} & \pL ]) | - )

> /bpj
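
The NFC/NFD caveat and the Greek-only lookahead from the quoted message can be tried out with a small sketch like the one below. The helper names are hypothetical. To avoid depending on the `(?[ ... ])` syntax, the sketch swaps in `(?= \p{Script=Greek} ) \pL`, which is assumed here to accept the same characters as the intersection class.

```perl
use strict;
use warnings;
use utf8;    # this file contains literal Greek letters
use Unicode::Normalize qw(NFC NFD);

# NFC-only form from the quoted message (hypothetical helper name).
sub final_sigma_nfc {
    my ($text) = @_;
    $text =~ s/(?<= \p{Script=Greek} ) (?<= \pL ) σ (?! \pL | - )/ς/gx;
    return $text;
}

# NFD-tolerant form with the lookahead narrowed to Greek letters.
# (?= \p{Script=Greek} ) \pL stands in for (?[ \p{Script=Greek} & \pL ]).
sub final_sigma_greek {
    my ($text) = @_;
    $text =~ s/(?<= \p{Script=Greek} ) (?<= \pL ) ( \pM* ) σ (?! (?= \p{Script=Greek} ) \pL | - )/$1ς/gx;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';

my $nfd = NFD('σεισμόσ');    # the accent becomes a separate combining mark
print NFC(final_sigma_nfc($nfd)),   "\n";    # NFC-only form on NFD input
print NFC(final_sigma_greek($nfd)), "\n";    # NFD-tolerant form
print final_sigma_greek('ασx'),     "\n";    # sigma followed by a Latin letter
```

The first line should come out unchanged, because the combining accent sits between the vowel and the σ and defeats the lookbehind. The second should end in ς. In the third the σ becomes ς as well, since a following Latin letter no longer blocks the substitution.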



