[XeTeX] Syriac abbreviations, and issues with polyglossia, fontspec and bidi

Sargon Hasso sargon.hasso at gmail.com
Wed Dec 23 19:38:17 CET 2009


Yes, the abbreviation mark (SAM, Unicode U+070F) is implemented by the shaping engine in Uniscribe; otherwise there is no way to implement this in the font itself. In traditional Syriac, the abbreviation mark is a line above the last few letters of a word. The way it works in Windows is like this: when you type
StartSAM one or more characters EndSAM
The StartSAM acts as a trigger and creates a context. As you type further letters, the engine adds SAM (a zero-width character) automatically until you hit the EndSAM character, which breaks the context and instructs the shaping engine to stop extending the line above subsequent letters.
In Windows, it works as follows: to enter the line, hit the ` (single left quote) key before typing the letters upon which you want to place the line, then start typing; the line will extend until you type a SPACE, ENTER or a punctuation mark.
Keep in mind:
1) the SAM is a zero-width character
2) the shape does not really matter: I have seen it both as a straight line and as a dotted line (i.e. --.--). I have Syriac manuscripts and books where both usages are seen [in the manuscripts that I looked at, cf. Hatch's An Album of Dated Syriac Manuscripts, the line above an abbreviated word has one dot only, but in the Windows implementation it has 3 dots.]
3) the SAM character is stored in the backing store much like a diacritic, following each character. For example, if you type:
StartSAM Alef Beth Gamal EndSAM
Your text in the backing store is:
0710 070F 0712 070F 0713 070F.
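
The behaviour described above can be sketched in Python. This is only an illustration of the description in this message, not Uniscribe's actual code: the names simulate_typing, ends_context and codepoints are made up for the example, and treating every Unicode punctuation character as a terminator is an assumption.

```python
import unicodedata

SAM = "\u070F"   # SYRIAC ABBREVIATION MARK
TRIGGER = "`"    # key that opens the abbreviation context, as described above


def ends_context(ch: str) -> bool:
    """Assumed terminators: whitespace (SPACE, ENTER) or any punctuation."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def simulate_typing(keys: str) -> str:
    """Return the backing-store text produced by a sequence of keystrokes."""
    out = []
    in_context = False
    for ch in keys:
        if ch == TRIGGER:
            in_context = True      # the trigger itself is not stored
            continue
        if in_context and ends_context(ch):
            in_context = False     # terminator closes the context
        out.append(ch)
        if in_context:
            out.append(SAM)        # SAM follows each letter, like a diacritic
    return "".join(out)


def codepoints(text: str) -> str:
    """Format text as space-separated four-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


# Alef, Beth, Gamal typed after the trigger key
print(codepoints(simulate_typing("`\u0710\u0712\u0713")))
# prints: 0710 070F 0712 070F 0713 070F
```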


Sargon

  

-----Original Message-----
From: xetex-bounces at tug.org [mailto:xetex-bounces at tug.org] On Behalf Of Gareth Hughes
Sent: Wednesday, December 23, 2009 10:47 AM
To: Unicode-based TeX for Mac OS X and other platforms
Subject: Re: [XeTeX] Syriac abbreviations, and issues with polyglossia, fontspec and bidi

As Jonathan said, SAM should be an overline stretching over the last few
letters of a word (to the left edge of the word), and hyphenation isn't
an issue. SAM is supposed to be entered at the beginning of the overline
(its right-hand edge), which may well be mid-word. Sometimes the
line is broken with a dot at each end and one in the middle, but that's
a matter of style. Numerals, which are traditionally written with
letters, often get this mark too (in which case, the line extends over
the entire numeral), to stop people trying to read them as words.

Fr. Michael Gilmary wrote:
> FWIW --- although I'm no Syriac scholar --- I tried your sample text
> with the SAM (as you say) in Mellel ... and all the fonts come out
> the same: no extension of the abbreviation marker.

It's interesting to know that Mellel can't handle SAM either. Thanks for
testing it.

Jonathan Kew wrote:
> One approach would be to implement this in AAT or OT fonts by using
> glyph substitution: the SAM glyph would be deleted, but would trigger
> a contextual replacement of the following glyphs with overlined
> versions. However, I haven't seen a font that actually does this; the
> assumption seems to be that text engines will handle this with
> special-case code.
> 
> An issue with supporting SAM is that (last time I checked) it's
> defined as applying "until the end of the word", but this raises the
> question of how exactly one defines the "end of a word" in Syriac
> script. Obviously spaces, etc., would act as terminators; but there
> are likely to be some edge cases (what about invisible control
> characters such as join controls? directional controls? discretionary
> breaks? arbitrary diacritics? etc) that I have not seen clearly
> defined anywhere.

The font Serto Jerusalem does have a combining overline that is used for
this purpose. The end of the line is always a space or punctuation; I
don't think other cases apply. I would imagine one could make the SAM
character active in XeTeX and set it to overline following text until it
meets a space or punctuation.
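
As a rough illustration of that rule (not XeTeX macro code), the following Python sketch finds the stretch of text that each SAM would overline, assuming a single SAM placed before the letters to be marked. The function names are hypothetical, and the choice to end the line only at whitespace or punctuation is exactly the assumption stated above; the control characters and diacritics Jonathan mentions are not treated specially.

```python
import unicodedata

SAM = "\u070F"  # SYRIAC ABBREVIATION MARK


def is_terminator(ch: str) -> bool:
    # Assumption: only whitespace and punctuation end the overline.
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def overline_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of the text each SAM would overline.

    start is the index just after the SAM; end is exclusive.
    """
    spans = []
    i = 0
    while i < len(text):
        if text[i] == SAM:
            start = i + 1
            end = start
            # Extend the span until a space or punctuation mark is met.
            while end < len(text) and not is_terminator(text[end]):
                end += 1
            spans.append((start, end))
            i = end
        else:
            i += 1
    return spans


# Alaph, SAM, Beth, Gamal, space, Alaph: the line covers Beth and Gamal only.
sample = "\u0710\u070F\u0712\u0713 \u0710"
print(overline_spans(sample))
# prints: [(2, 4)]
```

Note that the span can begin between two letters that should join, which is the shaping problem discussed below.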

Jonathan Kew wrote:
> This might be possible, though I'm not sure whether the SAM
> necessarily occurs at a location where the cursive joining of the
> Syriac letters is interrupted; if not, there'd be a problem (in
> xetex, at least) of how to get correct shaping/joining across the box
> edge (and other commands, etc), as OpenType shaping only applies
> within a contiguous sequence of characters.
> 
> But I don't know enough about Syriac to really judge whether this is
> feasible purely at the macro level.

Ah, yes. My example put SAM in a required cursive break, but it could
easily occur between two characters that should join. The problem is
that there is plenty of leeway in writing abbreviations in Syriac: I
could always try and start the line at a cursive break, but there won't
always be one, and it might look a little odd.

Gareth.

-- 
Gareth Hughes

Department of Eastern Christianity
Oriental Institute
Pusey Lane
Oxford
OX1 2LE

+44 (0)1865 615331


