[XeTeX] Unicode hyphens etc. and Xe(La)TeX
Roland Kuhn
rk at rkuhn.info
Mon Nov 1 13:53:41 CET 2010
Generally, I recommend using the correct Unicode characters in the TeX source and then defining the behavior you want for them. In this case, this is fairly straightforward:
1) TeX inserts empty discretionaries after each occurrence of the \hyphenchar (a per-font property which is usually equal to `-), which takes care of your first point quite nicely.
2) The soft hyphen can be made active and defined to yield “\-” (the only drawback to this character is that it is not very nicely displayed inside Terminal on MacOS):
\catcode"AD=\active % U+00AD SOFT HYPHEN, written by code because the character itself is invisible
\def^^ad{\-}
3) The Unicode hyphen U+2010 can be made active and defined to yield “-” (the ASCII hyphen), which is the right choice within TeX by construction:
\catcode`‐=\active
\def‐{-}
4) The non-breaking hyphen can also be made active and defined to yield “\hbox{-}”. The box prevents the discretionary after the ASCII hyphen from escaping; \nobreak does not help here, because the line break is taken at that discretionary itself, not at the penalty which follows it:
\catcode`‑=\active
\def‑{\hbox{-}}
Where those characters are encountered does not matter much in my experience, but you can always include macros for disabling these activations, akin to
\catcode"AD=12 % U+00AD SOFT HYPHEN
\catcode`‐=12
\catcode`‑=12
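As a rough sketch, the switching could be wrapped into a pair of macros like the following. The names \unicodehyphenson and \unicodehyphensoff are made up for this example, and the characters are written by code point (^^ad, ^^^^2010, ^^^^2011, which XeTeX accepts) so that nothing invisible ends up in the source. The definitions are made once while the characters are active; afterwards only the catcodes are toggled.

```latex
% Sketch only; \unicodehyphenson and \unicodehyphensoff are hypothetical names.
\def\unicodehyphenson{%
  \catcode"00AD=\active % U+00AD SOFT HYPHEN
  \catcode"2010=\active % U+2010 HYPHEN
  \catcode"2011=\active % U+2011 NON-BREAKING HYPHEN
}
\def\unicodehyphensoff{%
  \catcode"00AD=12\relax
  \catcode"2010=12\relax
  \catcode"2011=12\relax
}
\unicodehyphenson        % the characters must be active while they are defined
\def^^ad{\-}             % soft hyphen: discretionary hyphen
\def^^^^2010{-}          % hyphen: ASCII hyphen, break allowed after it
\def^^^^2011{\hbox{-}}   % non-breaking hyphen: boxed, so no break after it
```

Keep in mind that a catcode change only affects text which TeX has not yet read, so switching inside a macro argument comes too late. Since \catcode assignments are local, putting \unicodehyphensoff inside a group restores the active state at the end of that group.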
Given these, you should be able to adapt the procedure to solve the case with the middle dots.
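A minimal sketch of that adaptation might look as follows. It rests on two assumptions of mine which are not in the question: that U+2027 is the dot after which a line break is allowed while U+00B7 is the one after which it is not, and that the current font has a glyph for U+00B7.

```latex
% Sketch only; which dot may break is an assumption, see above.
\catcode"2027=\active % U+2027 HYPHENATION POINT
% Print the middle dot glyph, then an empty discretionary, which is the
% same thing TeX inserts by itself after the \hyphenchar.
\def^^^^2027{\char"00B7\discretionary{}{}{}}
% U+00B7 MIDDLE DOT is left as an ordinary character: TeX adds no
% breakpoint after it unless it happens to be the font's \hyphenchar.
```

If you prefer the opposite assignment, swap the two code points and use ^^b7 for the definition.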
Regards,
Roland
On Oct 31, 2010, at 23:09 , BPJ wrote:
> I'm trying to find out if and how Xe(La)TeX does
> or can be made to treat the following characters
> differently from each other and/or in a 'smart' way:
>
> 1) U+002D HYPHEN-MINUS
> 2) U+00AD SOFT HYPHEN
> 3) U+2010 HYPHEN
> 4) U+2011 NON-BREAKING HYPHEN
>
> Specifically I'd like to get the correct behavior for
> Swedish so that a linebreak may occur after an ASCII hyphen
> but not after a Unicode non-breaking hyphen. While globally
> replacing every Unicode soft hyphen with \- is easy you
> cannot, unfortunately, globally replace every ASCII hyphen
> with some command which would do the right thing (whatever
> that command may be) as the ASCII hyphen may occur in
> command arguments which I've already inserted, and which are
> not to be interpreted as text. (Though I think that such would typically be followed by a digit rather than a letter...)
>
> I also have sort of the same thoughts about
>
> 5) U+00B7 MIDDLE DOT
> 6) U+2027 HYPHENATION POINT
>
> or rather I would want some way to distinguish between a
> middle dot after which a linebreak may occur and one after
> which it may not.
>
> I guess I'm basically looking for a \maylinebreak command!
>
> /bpj
--
I'm a physicist: I have a basic working knowledge of the universe and everything it contains!
- Sheldon Cooper (The Big Bang Theory)