[XeTeX] Syriac abbreviations, and issues with polyglossia, fontspec and bidi
jfkthame at googlemail.com
Tue Dec 22 19:16:38 CET 2009
On 22 Dec 2009, at 18:05, Ross Moore wrote:
> Hi Gareth, and Fr Michael,
> On 23/12/2009, at 2:17 AM, Fr. Michael Gilmary wrote:
>> Gareth Hughes wrote:
>>> I know most syriacists working on Macs end up using Mellel as their Syriac word processor. Does anyone happen to know how that handles complex Unicode layouts?
>> FWIW --- although I'm no Syriac scholar --- I tried your sample text with the SAM (as you say) in Mellel ... and all the fonts come out the same: no extension of the abbreviation marker.
> Me neither, but I get what looks to be the same result with XeTeX on a Mac.
> (see attached image)
>> <Picture 1.png>
> Gareth, can you please post images of
> a. the *correct* appearance of these characters in a word;
> b. an example of the incorrect rendering that you would
> like improved. (e.g. from XeTeX, on ... )
> Without this, there are very few people who can possibly look
> into what is the root cause of your issue.
I don't have a handy way to produce the correct rendering, but I can describe it: there should be a line over the top of the word, from the place where the Syriac Abbreviation Mark (SAM) occurs until the end of the word.
In other words, SAM shouldn't be rendered as an individual glyph at all; it's more like markup that applies to the following string of characters. Implementing this requires a pretty advanced font system, or else special-case code in the rendering engine. I guess Uniscribe provides that. XeTeX doesn't, at least currently. So what you're seeing in the examples posted here is simply the "placeholder" glyph for SAM, which should disappear and be replaced by the overline on the following letters.
One approach would be to implement this in AAT or OT fonts by using glyph substitution: the SAM glyph would be deleted, but would trigger a contextual replacement of the following glyphs with overlined versions. However, I haven't seen a font that actually does this; the assumption seems to be that text engines will handle this with special-case code.
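To make the substitution idea concrete, here is a minimal Python sketch that simulates it on a list of glyph names. It is not real AAT or OpenType code, and the glyph names ("sam", "alaph", "space", the ".overline" suffix) and the function name are hypothetical, chosen only for illustration. The only terminator assumed is a space glyph.

```python
def apply_sam_substitution(glyphs, sam="sam", breakers=("space",)):
    """Simulate a contextual substitution triggered by the SAM glyph.

    All glyph names here are hypothetical. The SAM glyph itself is deleted,
    and every following glyph is swapped for an overlined variant until a
    breaker glyph (assumed here to be just "space") is reached.
    """
    out = []
    overlining = False
    for g in glyphs:
        if g == sam:
            # Drop the placeholder glyph and start the overline context.
            overlining = True
            continue
        if g in breakers:
            # A breaker ends the context and is emitted unchanged.
            overlining = False
            out.append(g)
            continue
        out.append(g + ".overline" if overlining else g)
    return out


print(apply_sam_substitution(["sam", "alaph", "beth", "space", "gamal"]))
```

A real font would also need each overlined variant drawn so that the line segments join up across adjacent glyphs, which this sketch does not address.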
An issue with supporting SAM is that (last time I checked) it's defined as applying "until the end of the word", but this raises the question of how exactly one defines the "end of a word" in Syriac script. Obviously spaces, etc., would act as terminators; but there are likely to be some edge cases (what about invisible control characters such as join controls? directional controls? discretionary breaks? arbitrary diacritics? etc.) that I have not seen clearly defined anywhere.
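The ambiguity can be shown with a small Python sketch that computes which characters a SAM (U+070F) would cover. The function name and the `transparent` parameter are hypothetical, and the policy is an assumption rather than anything defined by a standard: letters extend the span, characters in the listed general categories are treated as part of the word, and anything else ends it.

```python
import unicodedata

SAM = "\u070F"  # SYRIAC ABBREVIATION MARK


def sam_spans(text, transparent=("Mn", "Me", "Cf")):
    """Return (start, end) index pairs for the text each SAM would overline.

    Assumed policy, not a defined rule: letters continue the word, and so
    do characters whose general category is listed in `transparent`
    (combining marks and format controls by default). Any other character,
    or another SAM, ends the span.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != SAM:
            i += 1
            continue
        start = j = i + 1
        while j < n and text[j] != SAM:
            cat = unicodedata.category(text[j])
            if not (cat.startswith("L") or cat in transparent):
                break
            j += 1
        spans.append((start, j))
        i = j
    return spans


word = "\u070F\u0710\u200D\u0712 \u0713"  # SAM, alaph, ZWJ, beth, space, gamal
print(sam_spans(word))                       # join control counted as in-word
print(sam_spans(word, transparent=("Mn",)))  # join control treated as a break
```

The two calls give different spans for the same string, which is exactly the kind of edge case that would need a clear definition before an engine could implement this consistently.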