[luatex] hyphenating ancient Greek

Hans Hagen pragma at wxs.nl
Sun Jun 13 16:24:57 CEST 2010


On 13-6-2010 3:29, Reinhard Kotucha wrote:
> On 12 June 2010 Robin Fairbairns wrote:
>
>   >  Reinhard Kotucha<reinhard.kotucha at web.de>  wrote:
>   >
>   >  >  It would be much more convenient if case folding didn't depend on
>   >  >  the language, i.e. if the Turkish "i" had a separate code point.
>   >
>   >  interesting the difference in attitude as between different languages.
>   >  for example, the glyph capital A appears in english, greek and russian
>   >  -- three different alphabets --- with the same sound (to first order).
>   >
>   >  yet a significantly different pair of glyphs (i in english and turkish)
>   >  apparently occupy the same code point.  (unfortunately, in the russian-
>   >  greek-english i speak from some knowledge of all three languages, but i
>   >  know no turkish at all, beyond its typographic issues and what spam in
>   >  turkish looks like...).
>   >
>   >  >  However, I think it's solvable.
>   >
>   >  it would be interesting to know how.
>
> I suppose that the data will be stored in a Lua table (associative
> array).  Values in tables are not restricted to strings or numbers.
> They can also be functions or other tables.
>
>    uccode = {} -- create a table
>
>    uccode[0x0061] = 0x0041  -- a
>    uccode[0x0062] = 0x0042  -- b
>    uccode[0x0063] = 0x0043  -- c
>    ...
>    uccode[0x0069] = { default = 0x0049, turkish = 0x0130 } -- i
>    ...
>
> When TeX's uccode array is initialized (yes, I think TeX can't use
> this table directly), the default value is used if the type of a table
> entry is a table instead of a number.  If you switch from English to
> Turkish, you have to change TeX's \uccode and restore it afterwards
> (at the same time you switch between hyphenation patterns).
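
Reinhard's scheme can be sketched in plain Lua. This is a minimal illustration, not LuaTeX's actual mechanism: the helper names resolve_codes and switch_commands are hypothetical. The first flattens the mixed table for a given language (falling back to the default field), and the second produces only the \uccode assignments that differ per language, which is what has to be reissued when switching to Turkish and again when switching back. In a LuaTeX run such strings could be handed to tex.print, but that wiring is an assumption here.

```lua
-- Hypothetical sketch: entries are either a code point (number) or a
-- table of per-language code points with a 'default' fallback.
local uccode = {
  [0x0061] = 0x0041,                                 -- a -> A
  [0x0062] = 0x0042,                                 -- b -> B
  [0x0069] = { default = 0x0049, turkish = 0x0130 }, -- i -> I, or dotted capital I in Turkish
}

-- Return a flat { [char] = code } table for the given language.
-- Table-valued entries use their language-specific value if present,
-- otherwise their 'default' value.
local function resolve_codes(codes, language)
  local flat = {}
  for char, value in pairs(codes) do
    if type(value) == "table" then
      flat[char] = value[language] or value.default
    else
      flat[char] = value
    end
  end
  return flat
end

-- Build the \uccode assignments needed when switching to 'language'.
-- Only language-dependent entries are emitted; the rest never change.
local function switch_commands(codes, language)
  local lines = {}
  for char, value in pairs(codes) do
    if type(value) == "table" then
      local code = value[language] or value.default
      lines[#lines + 1] = string.format("\\uccode %d=%d", char, code)
    end
  end
  table.sort(lines) -- pairs() order is unspecified; sort for stable output
  return lines
end
```

For example, switch_commands(uccode, "turkish") yields the single line \uccode 105=304, and switch_commands(uccode, "english") restores \uccode 105=73, because "english" has no entry of its own and falls back to the default.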

future luatex versions will store the lccodes with the language (as in 
etex), so this switching will then happen automatically

> This is _one_ approach.  Others are thinkable but this one might work
> without changing the TeX engine too much.  The only requirement is
> that TeX's \uccode array can have more than 256 entries, but I suppose
> this is already the case in LuaTeX.

in this respect there is no difference between regular etex and luatex 
(given that lccodes will be stored with the language, which is on the agenda)

initializing lccodes when loading the format can be done using the 
\lccode primitive in the usual way (on the agenda is a tex.setlccode 
function, but it will not be more efficient than \lccode n = m)
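
As an illustration of the "usual way", the following Lua sketch generates \lccode assignments for the basic Greek letters. It is an assumption-laden example, not part of any existing macro package: the function name greek_lccode_lines is hypothetical, only U+0391..U+03A9 and U+03B1..U+03C9 are covered (polytonic letters in the Greek Extended block, which ancient Greek needs, are left out), and mapping final sigma to itself is one possible choice. During format generation each string could be passed to tex.print.

```lua
-- Hypothetical sketch: \lccode assignments for the basic Greek block.
-- Capitals U+0391..U+03A9 map to their small letters at +0x20;
-- small letters map to themselves so that hyphenation treats them as letters.
local function greek_lccode_lines()
  local lines = {}
  local function add(char, lower)
    lines[#lines + 1] = string.format("\\lccode %d=%d", char, lower)
  end
  for upper = 0x0391, 0x03A9 do
    if upper ~= 0x03A2 then            -- U+03A2 is unassigned (no capital final sigma)
      add(upper, upper + 0x20)         -- capital -> small letter
      add(upper + 0x20, upper + 0x20)  -- small letter -> itself
    end
  end
  add(0x03C2, 0x03C2)                  -- final sigma -> itself (an assumed choice)
  return lines
end
```

The first two lines produced are \lccode 913=945 (alpha) and \lccode 945=945; 24 capital/small pairs plus final sigma give 49 assignments in total.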

such things are mostly a macro package issue

(related uppercase/lowercase issues are another matter and definitely a 
macro package issue; tex's \lowercase / \uppercase primitives are hardly 
useful for more complex cases anyway)

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

