[XeTeX] Whitespace in input

Chris Travers chris.travers at gmail.com
Tue Nov 15 11:39:46 CET 2011

On Tue, Nov 15, 2011 at 2:27 AM, Keith J. Schultz <keithjschultz at web.de> wrote:
> Hi all,
> I agree that XeTeX should support all printable characters.
Given your definition, I would say all visible printed characters.
Invisible characters are a problem in a programming language.

> A non-breaking space is to me a printable character, insofar as
> it is important and must be used to distinguish it from a word space, etc.

As long as this is an option which defaults to off, again, I have no
problem with this. By this definition, carriage returns and line feeds
are also printable characters, and these are supported by options which
have to be turned on rather than being on by default.

> To go back in history, one of my pet peeves in LaTeX was that I had to
> enter the German characters öäüß as \"o, \"a, etc., and later as the
> shortcut forms "s, "u, etc. Later, with inputenc, I could finally just enter
> öäüß. But I had trouble moving (actually, I just needed to convert) my files
> between Apple and Windows (so that editing was possible on Windows).
> Yet I still had trouble with quoting, so I was forced to use \quote, et al.,
> to have a simple method of quoting properly in English, German and French
> in one document! I even modified them to suit some requirements I had, so
> that I ended up with one command.
> Unicode has thankfully changed all this. I can forget about using all those TeX
> commands for the characters I need. I just type away.
> The only problem now is the keyboard equivalents and how the editor of choice
> displays them.

But here you have a problem. An editor can display a non-breaking
space by its semantic value, i.e. with a special glyph, but this is
not without problems. For example, we could also display line feeds
as the paragraph sign (pilcrow), but that is itself a Unicode character,
U+00B6, so now you have an ambiguity: is it a real U+00B6 character or
is it a line feed? Alternatively, the editor can color-code such
characters, but that is problematic for a number of other reasons.
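One way to sidestep the glyph ambiguity is not to draw the character at all, but to report it by code point and Unicode name. Below is a minimal Python sketch of such a checker, assuming UTF-8 text that has already been read into a string; `find_special_whitespace` is a hypothetical helper for illustration, not part of XeTeX or of any editor.

```python
import unicodedata

ORDINARY = {" ", "\t", "\r"}  # characters TeX users normally expect in source files

def find_special_whitespace(text):
    """List (line, column, code point, Unicode name) for every whitespace or
    invisible format character in `text` other than space, tab, and CR/LF."""
    hits = []
    # Split only on "\n" so that U+2028 LINE SEPARATOR etc. are reported, not consumed.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ORDINARY:
                continue
            # Zs = space separators (e.g. U+00A0, U+2009); Cf = format characters
            # such as U+200B ZERO WIDTH SPACE; isspace() catches remaining controls.
            if ch.isspace() or unicodedata.category(ch) in ("Zs", "Cf"):
                code = f"U+{ord(ch):04X}"
                hits.append((lineno, col, code, unicodedata.name(ch, "<unnamed>")))
    return hits
```

For example, "a" followed by U+00A0 and "b" is reported as (1, 2, "U+00A0", "NO-BREAK SPACE"), while a literal U+00B6 is not reported at all, because it is a symbol (category So), not whitespace. That is exactly the distinction a pilcrow glyph in the editor cannot make.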

So I am not sure these are simple problems that admit of simple solutions.

My recommendation is:

1)  Default to handling all white space as it exists now.
2)  Provide some sort of switch, whether to the execution of XeTeX or
to the document itself, to turn on handling of special Unicode whitespace.
3)  If that switch is enabled, treat whitespace characters according to
their Unicode meanings. If not, treat them as standard whitespace.
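The recommendation above can be sketched as a preprocessing step. This is a minimal Python illustration under stated assumptions, not how XeTeX works: the function `prepare_input`, its `unicode_whitespace` flag (standing in for the proposed switch), and the small `UNICODE_SPACE_MEANINGS` table are all hypothetical.

```python
import unicodedata

# Hypothetical table: TeX markup to emit for a few Unicode spaces when the
# switch is on. Illustrative only; not an official XeTeX mapping.
UNICODE_SPACE_MEANINGS = {
    "\u00a0": "~",             # NO-BREAK SPACE -> tie (unbreakable space)
    "\u2009": "\\,",           # THIN SPACE -> LaTeX thin space
    "\u202f": "\\nobreak\\,",  # NARROW NO-BREAK SPACE -> unbreakable thin space
}

def prepare_input(text, unicode_whitespace=False):
    """Apply the proposed policy to `text`.

    Off (the default): every Unicode space separator (category Zs) becomes an
    ordinary space, matching how whitespace is handled now.
    On: spaces listed in UNICODE_SPACE_MEANINGS get their TeX equivalent;
    other Zs characters are passed through unchanged.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) != "Zs":
            out.append(ch)  # not a space separator (tabs, letters, ...): leave it alone
        elif unicode_whitespace:
            out.append(UNICODE_SPACE_MEANINGS.get(ch, ch))
        else:
            out.append(" ")
    return "".join(out)
```

With the switch off, "10", U+00A0, "km" comes out as "10 km"; with unicode_whitespace=True it becomes "10~km", which keeps the number and unit together. An implementation inside XeTeX itself would more likely make such characters active via \catcode than rewrite the text beforehand; the preprocessing form is used here only because it is easy to show in isolation.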

The advantage of this approach is that people who don't want to worry
about what sort of whitespace is in the text files they are inputting
don't have to, and those who do have an easy way of determining
whether a layout issue is caused by non-breaking spaces.

Best Wishes,
Chris Travers
