[XeTeX]   in XeTeX

Keith J. Schultz keithjschultz at web.de
Mon Nov 14 13:10:38 CET 2011


Hi Zdenek, all,

	I was too lazy to list all those encodings.

	I will be more precise now, for those not reading carefully.
	
	There is a difference between what is considered plain text in the computer
	world and what its content is.

	Basically, plain text is just that, text, no matter what its content is. That is, in the
	computer world you can have PLAIN TEXT FILES whose content is
	TeX et al., HTML, XML, SGML, or source code in most programming languages.
	That is, the source for these languages is in a more or less human-readable format:
	TEXT.

	Whether, due to the syntax of the language, a character can be used
	directly or needs an "escape" sequence to represent it is irrelevant. Naturally, you have to
	know the syntax to understand the text. Still, it is plain text according to the standards that
	define the content of the files.
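
	To make the escaping point concrete, here is a rough sketch in Python (my choice of
	language, nothing from this thread depends on it). The helper tex_escape is hypothetical
	and deliberately covers only & % $ # _; backslash, braces, ~ and ^ need more care and
	are left out. xml_escape is the standard library's xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape as xml_escape


def tex_escape(text: str) -> str:
    """Hypothetical helper: prefix a few TeX specials with a backslash.

    Only & % $ # _ are handled; backslash, braces, ~ and ^ are omitted.
    """
    return "".join("\\" + ch if ch in "&%$#_" else ch for ch in text)


plain = "Q&A: is 1 < 2?"

# The same plain-text string, written out under two different syntaxes.
print(xml_escape(plain))  # Q&amp;A: is 1 &lt; 2?
print(tex_escape(plain))  # Q\&A: is 1 < 2?
```

	Both results are still files you can open in any text editor; only the syntax you
	need to know in order to read them differs.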

	As I said, I guess Unicode should be considered plain text. Yet Unicode is special
	in the same way as 7-bit ASCII and 8-bit (extended) ASCII were: depending
	on the fonts you used, you got different results when output went to the screen or printer.
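
	A small illustration of the 8-bit situation, again as a Python sketch under my own
	assumptions: the byte value 0xE9 is an arbitrary pick, and the two encodings are the
	ones Zdenek named. Strictly speaking this shows the code page rather than the font
	changing the result, but the effect on screen is the same kind of surprise.

```python
raw = bytes([0xE9])  # a single byte above the 7-bit ASCII range

# The same byte decodes to different characters under different code pages.
as_latin1 = raw.decode("iso-8859-1")
as_cp852 = raw.decode("cp852")
print(repr(as_latin1), repr(as_cp852))

# On its own, that byte is not even a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# A 7-bit ASCII byte, by contrast, decodes identically in all of them.
ascii_byte = "A".encode("ascii")
decoded = {ascii_byte.decode(enc) for enc in ("ascii", "iso-8859-1", "cp852", "utf-8")}
print(decoded)
```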

	The problem is that there are no fonts that implement the FULL Unicode set, which is what is needed.

	On the other hand, until we have OSes that truly and fully support Unicode, Unicode cannot truly be considered
	plain text.

	As far as TeX is concerned, it was not designed to handle Unicode or even 8-bit input. It has, though, been
	extended piecemeal to handle them, and that has come at a cost. It is time to redesign it, refactor it if you will.

	regards
		Keith.
  
On 14.11.2011 at 11:07, Zdenek Wagner wrote:

> It's not the encoding that determines whether it is a plain text.
> Texts in ISO 8859-1, CP852, UTF-8, UTF-16, BIG-5 can be plain texts.
> LTR/RTL is no problem in modern editors, I can easily combine
> Czech/English/Hindi/Urdu (uses Arabic script) in a single document,
> the languages/scripts may even be mixed within a paragraph. What
> determines whether it is or is not a plain text is the presence or
> absence of control characters or commands no matter whether the file
> can be viewed and/or edited in a plain text editor such as vim or
> notepad. If I type < I wish it to mean "less than" but in XML it marks
> the element tag. If I need such a character in XML or SGML, I have to
> write &lt; no matter what editor I use. If it were plain text, &lt;
> would mean ampersand followed by the letters lt and a semicolon. If I
> type & in a plain text, it means "and". If I type it in a TeX file, it
> is a special character for \halign (unless \catcode is changed), in
> XML and SGML it means that all following characters up to the first
> semicolon form an entity name. If I have to insert an ampersand, I have
> to write \& in TeX or &amp; in XML and SGML. There are different
> methods how to enter A, e.g. ^^41 in TeX or &#x41; in XML and SGML. As
> Phil wrote, there is a clearly defined MIME type for a plain text.
> 
