[XeTeX] Contextual shaping

Wed Nov 27 14:05:53 CET 2013

On 27/11/13 12:46, Khaled Hosny wrote:
> On Wed, Nov 27, 2013 at 09:10:02PM +0900, Simon Cozens wrote:
>> This is possibly a daft question, but...
>>
>> In traditional TeX, character tokens are processed and put into boxes
>> individually, with fairly primitive ligature tables. Obviously XeTeX doesn't
> do this, using HarfBuzz (or ICU or whatever) to do the shaping and layout.
>>
>> My question is, if you're not "showing" individual characters to the shaping
>> engine for it to consider, what defines how big a string of characters to
>> shape at a time? Does XeTeX break at the "word" level and then shape a word,
>> and if so what defines a word? (Chinese has no word breaks!) Or does it
>> shape an entire paragraph of text at a time (!) and then box up the glyphs
>> individually? Or...?
>
> XeTeX shapes words one at a time; a word is basically any consecutive
> sequence of character nodes (using the same font) after TeX has done its
> macro expansion and is ready to typeset the material. The AAT code,
> additionally, tries to merge word sequences separated by spaces into one
> node.
>

In particular, in case it's not sufficiently clear from the above, note
that <space>s, being glue nodes, are NOT part of such a "consecutive
sequence of character nodes". A known limitation of XeTeX is therefore
that OpenType lookups that try to match the <space> glyph will not
work. Shaping happens only within a run of non-space characters in a
given font.
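
To make the segmentation concrete, here is a rough Python sketch of the
rule as described above. It is only a toy model, not XeTeX's actual
code: the names Char, Glue, to_nodes and shaping_runs are made up for
illustration, and it assumes every space has already become a glue node.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Char:
    """Hypothetical stand-in for a character node."""
    ch: str
    font: str


@dataclass
class Glue:
    """Hypothetical stand-in for the glue node a space turns into."""


Node = Union[Char, Glue]


def to_nodes(text: str, font: str) -> List[Node]:
    # Assumption: each space becomes glue, everything else a character node.
    return [Glue() if c == " " else Char(c, font) for c in text]


def shaping_runs(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Group consecutive character nodes in the same font into runs.

    Each (font, text) pair is what would be handed to the shaper in one
    call. Glue ends the current run and is never part of any run.
    """
    runs: List[Tuple[str, str]] = []
    open_run = False  # can the last run still be extended?
    for node in nodes:
        if isinstance(node, Glue):
            open_run = False
            continue
        if open_run and runs[-1][0] == node.font:
            runs[-1] = (node.font, runs[-1][1] + node.ch)
        else:
            runs.append((node.font, node.ch))
            open_run = True
    return runs
```

With this model, shaping_runs(to_nodes("of the", "A")) yields the two
runs "of" and "the". The space appears in neither, so a lookup whose
context includes the <space> glyph has nothing to match against.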

Most fonts are not affected by this, but it is an issue for certain 
fonts that want to do complex multi-word ligatures, or contextual forms 
that depend on the adjacent <space> glyph.

JK