[XeTeX] Unicode hyphens etc. and Xe(La)TeX

BPJ bpj at melroch.se
Sun Oct 31 23:09:00 CET 2010


I'm trying to find out if and how Xe(La)TeX does,
or can be made to, treat the following characters
differently from each other and/or in a 'smart' way:

	1) U+002D HYPHEN-MINUS
	2) U+00AD SOFT HYPHEN
	3) U+2010 HYPHEN
	4) U+2011 NON-BREAKING HYPHEN

Specifically, I'd like to get the correct behavior for
Swedish, so that a line break may occur after an ASCII hyphen
but not after a Unicode non-breaking hyphen. While globally
replacing every Unicode soft hyphen with \- is easy, you
unfortunately cannot globally replace every ASCII hyphen
with some command that would do the right thing (whatever
that command may be), because the ASCII hyphen may also occur in
command arguments which I've already inserted and which are
not to be interpreted as text. (Though I think such hyphens would
typically be followed by a digit rather than a letter...)
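To make the digit-versus-letter heuristic in the parenthesis above concrete, here is a minimal preprocessing sketch in Python that rewrites the source before it reaches XeLaTeX. It assumes that an ASCII hyphen with a letter on both sides is running text, while any other ASCII hyphen (for example the one in \hspace{-1pt}) is left untouched. The macros \bhyphen and \nbhyphen are hypothetical placeholders that would still have to be defined in the preamble; only \- is a standard LaTeX command. The heuristic will misfire on command arguments that do have letters on both sides of a hyphen, such as some environment names.

```python
import re

# Replacement strings. Only "\-" is a real LaTeX command; the other two
# are HYPOTHETICAL macro names that would need to be defined by hand.
SOFT_HYPHEN_MACRO = r"\-"              # discretionary (soft) hyphen
NOBREAK_HYPHEN_MACRO = r"\nbhyphen{}"  # hypothetical: hyphen, no break after it
BREAK_HYPHEN_MACRO = r"\bhyphen{}"     # hypothetical: hyphen, break allowed after it

# An ASCII hyphen with a letter on each side. [^\W\d_] matches any
# Unicode letter, so Swedish letters such as å, ä, ö are included.
TEXT_HYPHEN = re.compile(r"(?<=[^\W\d_])-(?=[^\W\d_])")


def preprocess(text: str) -> str:
    """Map Unicode hyphen variants to TeX markup before typesetting."""
    text = text.replace("\u00ad", SOFT_HYPHEN_MACRO)     # U+00AD SOFT HYPHEN
    text = text.replace("\u2011", NOBREAK_HYPHEN_MACRO)  # U+2011 NON-BREAKING HYPHEN
    text = text.replace("\u2010", BREAK_HYPHEN_MACRO)    # U+2010 HYPHEN
    # U+002D HYPHEN-MINUS: only rewrite it between letters, so hyphens
    # before digits (e.g. negative lengths in arguments) stay as they are.
    # A function replacement avoids re.sub interpreting the backslashes.
    return TEXT_HYPHEN.sub(lambda m: BREAK_HYPHEN_MACRO, text)


if __name__ == "__main__":
    print(preprocess("Nord-Korea"))        # Nord\bhyphen{}Korea
    print(preprocess(r"\hspace{-1pt}"))    # \hspace{-1pt}
```

The "{}" after each hypothetical macro name keeps TeX from reading the following letters as part of the control word.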

I also have much the same thoughts about

	5) U+00B7 MIDDLE DOT
	6) U+2027 HYPHENATION POINT

or rather, I would like some way to distinguish between a
middle dot after which a line break may occur and one after
which it may not.

I guess I'm basically looking for a \maylinebreak command!

/bpj

