[XeTeX] Contextual shaping

Wed Nov 27 13:46:46 CET 2013

On Wed, Nov 27, 2013 at 09:10:02PM +0900, Simon Cozens wrote:
> This is possibly a daft question, but...
> 
> In traditional TeX, character tokens are processed and put into boxes
> individually, with fairly primitive ligature tables. Obviously XeTeX doesn't
> do this, using Harfbuzz (or ICU or whatever) to do the shaping and layout.
> 
> My question is, if you're not "showing" individual characters to the shaping
> engine for it to consider, what defines how big a string of characters to
> shape at a time? Does XeTeX break at the "word" level and then shape a word,
> and if so what defines a word? (Chinese has no word breaks!) Or does it
> shape an entire paragraph of text at a time (!) and then box up the glyphs
> individually? Or...?

XeTeX shapes one word at a time. A word here is basically any consecutive
sequence of character nodes (using the same font) after TeX has done its
macro expansion and is ready to typeset the material. The AAT code
additionally tries to merge sequences of words separated by spaces into
one node.
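
To make the idea concrete, here is a rough toy model in Python of the two
steps described above. It is only a sketch, not XeTeX's actual code: the
Char, Glue and Word classes and the collect_words and merge_words functions
are hypothetical names invented for illustration, and the real engine
checks more conditions than just "same font".

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Char:
    """A single character node with its font (hypothetical toy type)."""
    ch: str
    font: str


@dataclass
class Glue:
    """An inter-word space (hypothetical toy type)."""


@dataclass
class Word:
    """A run of characters that would be handed to the shaper as one unit."""
    text: str
    font: str


Item = Union[Char, Glue, Word]


def chars(text: str, font: str) -> List[Item]:
    """Helper: turn a string into a list of Char nodes in one font."""
    return [Char(c, font) for c in text]


def collect_words(items: List[Item]) -> List[Item]:
    """Group consecutive Char nodes that share a font into Word nodes."""
    out: List[Item] = []
    for item in items:
        if isinstance(item, Char):
            last = out[-1] if out else None
            if isinstance(last, Word) and last.font == item.font:
                last.text += item.ch          # extend the current run
            else:
                out.append(Word(item.ch, item.font))  # start a new run
        else:
            out.append(item)                  # anything else ends the run
    return out


def merge_words(items: List[Item]) -> List[Item]:
    """Assumed AAT-style step: join Word, Glue, Word in one font into one Word."""
    out: List[Item] = []
    i = 0
    while i < len(items):
        item = items[i]
        if (isinstance(item, Glue)
                and out and isinstance(out[-1], Word)
                and i + 1 < len(items) and isinstance(items[i + 1], Word)
                and items[i + 1].font == out[-1].font):
            nxt = items[i + 1]
            out[-1] = Word(out[-1].text + " " + nxt.text, out[-1].font)
            i += 2                            # consume the glue and the word
        else:
            out.append(item)
            i += 1
    return out
```

For example, collect_words(chars("ab", "F") + [Glue()] + chars("cd", "F"))
gives two Word nodes with a Glue between them, and passing that result
through merge_words gives a single Word whose text is "ab cd". Under this
model a font change or any non-character node ends a word, which is why
unspaced Chinese text in one font would end up as one long run.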

> (I've tried starting at layoutChars in XeTeXLayoutInterface.cpp and working
> backwards but I can't understand where I end up: measure_native_node shapes
> a node, but what's a node?)

measure_native_node is called by the WEB code (where it is called
set_native_metrics). Check xetex.web for "collect_native:"; that is where
the bulk of the work is done. Check also "@<Merge sequences of words using
AAT".

Regards,
Khaled