[XeTeX] Word wrapping in Lao

Jonathan Kew jfkthame at googlemail.com
Fri Apr 16 10:53:26 CEST 2010


On 16 Apr 2010, at 08:15, Brian Wilson wrote:

> Lao, Thai and Khmer space at the phrasal level and not the word level. I was not getting any word wrapping in Lao (haven't tried Thai or Khmer yet) until a friend suggested that I add the following in the preamble 
> 
> \renewcommand{\|}{\hspace{0pt}}
> 
> and then insert \| at each potential word break. This works, but with more than 1000 pages of text in these languages, manually inserting this command at every word would get old quickly.

You can improve on THAT, at least, by writing a simple script in any text-processing language (preferably one with good Unicode support) to recognize the relevant places and insert \| (or the Unicode zero-width space, U+200B, which can then be made \active in xetex and configured to permit a line-break).
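
As a rough illustration of such a script, here is a minimal Python sketch. It is an assumption of mine rather than a tested Lao segmenter: it uses a greedy longest-match against a word list, and the names insert_breaks and segment_run, as well as the two-word sample list, are hypothetical placeholders. A real word list would have to come from a Lao dictionary.

```python
import re

ZWSP = "\u200b"  # Unicode zero-width space, U+200B

# Runs of characters from the Lao block (U+0E80..U+0EFF); everything
# else (Latin text, spaces, TeX commands) is left untouched.
LAO_RUN = re.compile(r"[\u0e80-\u0eff]+")


def segment_run(run, words):
    """Split one run of Lao text by greedy longest match against `words`
    and join the pieces with ZWSP."""
    max_len = max(map(len, words), default=1)
    pieces = []
    i = 0
    while i < len(run):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(run) - i), 0, -1):
            if run[i:i + n] in words:
                pieces.append(run[i:i + n])
                i += n
                break
        else:
            # No known word starts here: glue the character to the
            # previous piece so that no break is inserted before it.
            if pieces:
                pieces[-1] += run[i]
            else:
                pieces.append(run[i])
            i += 1
    return ZWSP.join(pieces)


def insert_breaks(text, words):
    """Insert ZWSP at word boundaries inside every Lao run of `text`."""
    return LAO_RUN.sub(lambda m: segment_run(m.group(0), words), text)


if __name__ == "__main__":
    # Hypothetical two-word list, for demonstration only.
    sample_words = {"ພາສາ", "ລາວ"}
    print(repr(insert_breaks("ພາສາລາວ", sample_words)))
```

Greedy matching will make mistakes wherever a shorter word is a prefix of a longer one, or where a word is missing from the list, so the output would need spot-checking. To use \| instead of U+200B, change the ZWSP constant to that string.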

> A friend has suggested that I go back and use movable type as it probably wouldn't be any slower than doing it this way. My main reason for leaving InDesign was because I didn't want to have to manually enter in the word spaces for each line. It seems that I am back where I started. Wouldn't it be easier to tell "TeX" where spaces can't occur and then let it have freedom to make its breaks everywhere else? I think I know all of the break rules where breaks cannot occur at the syllable level and a dictionary of non-breakable words could be added.

The first thing to try would be a couple of xetex commands at the beginning of your document:

  \XeTeXlinebreaklocale "en"
  \XeTeXlinebreakskip = 0pt plus 2pt

This will activate xetex's line-breaking support based on the Unicode breaking rules. I'm not sure exactly how this will work with Lao, but I'd expect it to break at the character cluster level, which may or may not be good enough to be useful to you.

For Thai, you can set

  \XeTeXlinebreaklocale "th"

to activate support for the Thai locale, using a dictionary-based line breaker, but I don't believe ICU (which xetex is using to implement this) has any specific support for Lao beyond whatever is provided by the Unicode properties.

(The best way to improve the support for Lao, then, would be via the ICU project: file a bug report requesting full Lao line-breaking support, or better yet, submit a patch to add it.)

JK




