[XeTeX]   in XeTeX

Keith J. Schultz keithjschultz at web.de
Mon Nov 14 09:21:00 CET 2011


Hi Everybody,

Slow down a bit. Sorry if I sound high-handed here!


There seems to be a misunderstanding about what exactly a
PLAIN TEXT FILE is.

Computing has evolved since I started using computers.
When I started out, a plain text file was a file holding just
7-bit ASCII or EBCDIC or the like, without control characters except
EOF, CR, or LF! No FF, no TAB (though TAB was sometimes allowed), and none of the others.
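
To make that old definition concrete, here is a minimal sketch in Python. The
function name is_strict_plain_text is made up for illustration, and treating
Ctrl-Z (0x1A) as the EOF marker is an assumption borrowed from CP/M and DOS
conventions; other systems had no EOF byte at all. It only covers the ASCII
case, not EBCDIC.

```python
# Control characters the old definition tolerated. Using 0x1A (Ctrl-Z) as
# EOF is an assumption (CP/M and DOS convention), not something universal.
ALLOWED_CONTROLS = {0x0D, 0x0A, 0x1A}  # CR, LF, EOF


def is_strict_plain_text(data: bytes, allow_tab: bool = False) -> bool:
    """Return True if data is 7-bit ASCII with no control characters
    other than CR, LF, EOF (and optionally TAB)."""
    allowed = set(ALLOWED_CONTROLS)
    if allow_tab:
        allowed.add(0x09)  # TAB was "sometimes allowed"
    for byte in data:
        if byte > 0x7F:
            return False  # not 7-bit
        if (byte < 0x20 or byte == 0x7F) and byte not in allowed:
            return False  # a control character outside the allowed set
    return True
```

By this test a file containing a form feed, or any byte above 0x7F, would not
have counted as plain text back then.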

Eventually, files with 8-bit coding became plain text.

I guess in this modern day and age we can consider Unicode plain text.
Though, to be fair, Unicode encodes characters that still have to be rendered as glyphs, and it can signal RTL usage.
So Unicode needs a capable editor to be displayed correctly. But the question is
philosophical.

Now, for the youngsters: XML, TeX, and HTML are by definition plain text files.
What they do contain are commands in plain text that describe how the
information inside is to be displayed. Yet a human can still read the text inside and
understand what is going on.

Again, with Unicode coming into the picture things do get somewhat more complicated,
as the glyphs have to be displayed properly so that a human can read the text.
This is due to the vastness of Unicode.

Now, to the problem of copying and pasting. What does happen?
I will take the HTML case! When you copy text from a browser that contains
&nbsp;, do you get the literal '&nbsp;', a simple blank, or a true non-breaking space?
Most likely you will get a simple blank, but it depends.
If you do get a true non-breaking space, what happens if you paste it into
a different editor? Chances are good you get a funny character displayed.
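
The three possible outcomes can be told apart by looking at code points rather
than at the screen. The following is only a sketch using Python's standard
library; the sample string "10&nbsp;km" and the function names are invented
for illustration, and what a given browser or editor really puts on the
clipboard is not something this code can tell you.

```python
import html
import unicodedata


def decode_entities(markup: str) -> str:
    """Turn HTML character references such as &nbsp; into real characters."""
    return html.unescape(markup)


def flatten_spaces(text: str) -> str:
    """Mimic a tool that replaces the non-breaking space with a simple blank."""
    return text.replace("\u00a0", " ")


def describe_spaces(text: str) -> list[str]:
    """List every white-space character in text by code point and name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
        if ch.isspace()
    ]


source = "10&nbsp;km"               # what the HTML file contains
true_nbsp = decode_entities(source)  # a true non-breaking space
blank = flatten_spaces(true_nbsp)    # a simple blank

print(describe_spaces(true_nbsp))    # ['U+00A0 NO-BREAK SPACE']
print(describe_spaces(blank))        # ['U+0020 SPACE']
```

Both strings look the same when printed, which is exactly the problem.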

So, it boils down to the tools you use. 

That said, we come to how we display all these great glyphs. Most are easy enough,
but white space is very hard for humans to read: it is just that, white.
Some of the different types of white space should be displayed differently. The same could be
said of glyphs that are composed of several parts instead of being represented by a single glyph.
The problem is how to do it so that it does not look ugly or very confusing!
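
One way an editor could do this is sketched below in Python. It is an
illustration only: the names reveal_whitespace and code_points and the
<U+XXXX NAME> tag format are invented here, and a real editor would more
likely draw a faint marker than insert text. The second half shows that a
composed "é" and a precomposed "é" differ only in their code points.

```python
import unicodedata


def reveal_whitespace(text: str) -> str:
    """Replace every white-space character except the ordinary blank and
    the line feed with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        if ch.isspace() and ch not in " \n":
            # Control characters such as TAB have no Unicode name,
            # hence the fallback value.
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)


def code_points(text: str) -> list[str]:
    """Show the code points that make up a piece of text."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(reveal_whitespace("10\u00a0km"))  # 10<U+00A0 NO-BREAK SPACE>km

precomposed = unicodedata.normalize("NFC", "e\u0301")
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']
```

The output is unambiguous, and it is also ugly, which is the trade-off in
question.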

In other words, we have to live with some compromises! That is, between easy discernibility and ease of readability.


regards
	Keith.   


On 13.11.2011 at 19:46, Tobias Schoel wrote:

> 
> 
> On 13.11.2011 at 20:25, Zdenek Wagner wrote:
>> A (La)TeX source file is not plain text. Every LaTeX document nowadays
>> starts with \documentclass but such text is not present in the output.
> 
> Of course, the preamble isn't plain text, but mostly macros. I thought of the body of the document. I think, it's common practice for larger documents to have a main latex file, which reads \documentclass … \begin{document}\input{first_chapter}\input{second_chapter}…\end{document}
> In these cases, the input documents are more or less plain text (depending on the subject).
> 
>> Even XML is not plain text, you can use entities as ,' and
>> many more. Of course, if (La)TeX is used for automatic processing of
>> data extracted from a database that can contain a wide variety of
>> Unicode characters, it is a valid question how to handle such input.
> Or if the content is copy-pasted, from let's say HTML. But who would do that …



