[XeTeX] Contextual shaping

Wed Nov 27 13:10:02 CET 2013

This is possibly a daft question, but...

In traditional TeX, character tokens are processed and put into boxes
individually, with only fairly primitive ligature tables to combine them.
Obviously XeTeX doesn't do this; it uses HarfBuzz (or ICU, or whatever) to do
the shaping and layout.
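To make the contrast concrete, here is a toy sketch of the traditional model:
characters arrive one at a time, and the only context consulted is the
adjacent pair, looked up in a ligature table. The table contents, the function
name apply_ligatures, and the greedy merging strategy are illustrative
assumptions, not TeX's actual lig/kern program (which also handles kerns and
boundary characters).

```python
# Toy model of per-character processing with a pairwise ligature table.
# The table and the greedy left-to-right strategy are illustrative
# assumptions; real TeX fonts encode this in the lig/kern program of a TFM.
LIGATURES = {
    ("f", "f"): "ff",
    ("f", "i"): "fi",
    ("f", "l"): "fl",
    ("ff", "i"): "ffi",
    ("ff", "l"): "ffl",
}


def apply_ligatures(chars):
    """Take characters one at a time, merging adjacent pairs via LIGATURES."""
    out = []
    for ch in chars:
        out.append(ch)
        # Keep merging the last two items while the table has an entry for
        # that pair, so "f" + "f" -> "ff" and then "ff" + "i" -> "ffi".
        # Each merge shortens the list, so the loop always terminates.
        while len(out) >= 2 and (out[-2], out[-1]) in LIGATURES:
            out[-2:] = [LIGATURES[(out[-2], out[-1])]]
    return out


print(apply_ligatures("office"))  # ['o', 'ffi', 'c', 'e']
```

Nothing in this model ever looks further back than the previous glyph, which
is exactly what an OpenType shaper needs to do for contextual forms.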

My question is: if you're not "showing" individual characters to the shaping
engine one at a time, what determines how long a string of characters gets
shaped at once? Does XeTeX break the text into "words" and then shape one word
at a time, and if so, what defines a word? (Chinese doesn't separate words
with spaces!) Or does it shape an entire paragraph at a time (!) and then box
up the glyphs individually? Or...?
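To illustrate why the "word" option is puzzling: a naive segmenter that splits
at whitespace, sketched below purely as a hypothetical (the function name
whitespace_runs is made up, and this is not a claim about what XeTeX does),
gives sensible runs for English but treats a whole Chinese sentence as a
single "word".

```python
# Hypothetical "shape one word at a time" strategy: split the input at
# whitespace and hand each chunk to the shaper separately. Illustration of
# the option being asked about only, not XeTeX's actual behaviour.
def whitespace_runs(text):
    """Return the chunks a whitespace-based segmenter would shape."""
    return text.split()


latin = "shape one word at a time"
chinese = "中文没有空格"  # "Chinese has no spaces": no spaces between words

print(whitespace_runs(latin))    # ['shape', 'one', 'word', 'at', 'a', 'time']
print(whitespace_runs(chinese))  # ['中文没有空格'], one run however long
```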

(I've tried starting at layoutChars in XeTeXLayoutInterface.cpp and working
backwards, but I get lost: measure_native_node shapes a node, but what is a
node?)