[XeTeX] TeX in the modern World. (goes OT) Was: Re: Whitespace in input

Keith J. Schultz keithjschultz at web.de
Fri Nov 18 10:16:31 CET 2011


Hi All,

Sorry, I am going off topic here, but it is necessary for the debate.
Please forgive me.

I have to side more with Philip.

What most are forgetting is what (Xe)TeX is intended for.
It is first and foremost a typesetting program (you do mention this below).
It was not designed to handle different languages or to do
word processing in the modern sense.

Due to the power of the TeX engine, it evolved to deal with different languages
and with newer output methods and encodings. The problem with TeX is that the basic
engine has not been redesigned to handle these new developments well.
The internals need to be completely revamped.

On 17.11.2011 at 20:36, Ross Moore wrote:

> Hi Phil,
> 
> On 17/11/2011, at 23:53, Philip TAYLOR <P.Taylor at Rhul.Ac.Uk> wrote:
> 
>> Keith J. Schultz wrote:
>>>> 
>>>> You mention in a later post that you do consider a space as a printable character.
>>>    This line should read as:
>>>          You mention in a later post that you consider a space as a non-printable character.
>> 
>> No, I don't think of it as a "character" at all, when we are talking
>> about typeset output (as opposed to ASCII (or Unicode) input).  
> 
> This is fine, when all that you require of your output is that it be visible on
> a printed page. But modern communication media goes much beyond that.
> A machine needs to be able to tell where words and lines end, reflowing paragraphs when appropriate and able to produce a flat extraction of all the text, perhaps also with some indication of the purpose of that text (e.g. by structural tagging).
	I would agree with you, but TeX was not designed as a communications program; it was designed for creating printed media.
	Furthermore, it may be desirable in the modern world to have every program's output usable as input for another program.
	This ideal is utopian. If you need to pass the output of one program (or medium) to another, then you will need an intermediate program/filter
	to reformat/convert the differences. As with all types of communication, there will be structures missing or lacking in the other
	system, so a one-to-one conversion will not be possible. You will need to use some kind of heuristics or, in modern terms, intelligence.
> 
> In short, what is output for one format should also be able to serve as input for another.
	This assertion is completely idealistic. Then again, it is true. It is possible, today, to design a system that goes from audio, to TeX, to printed documents,
	to audio again. Yet you will need a lot of effort, and most likely the results will be far from perfect. It is workable, though it requires considerable
	resources.
> 
> Thus the space certainly does play the role of an output character – though the presence of a gap in the positioning of visible letters may serve this role in many, but not all, circumstances.
	This depends on what you are outputting. For a printed page that is consumed by a human it does not matter, because humans do not process space characters, just space, and at times
	they even ignore it completely, because it is irrelevant for their natural language processing.
	For computers, on the other hand, the use of a space character can be very relevant.
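	To make the point concrete, here is a minimal sketch in Python of what a program has to do when the output holds only positioned glyphs and no space characters. The glyph records, the coordinates and the `gap_threshold` parameter are all assumptions made up for illustration; they are not taken from any real TeX, DVI or PDF tool.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x_start, x_end) in points.
Glyph = Tuple[str, float, float]


def extract_text(glyphs: List[Glyph], gap_threshold: float = 2.0) -> str:
    """Rebuild a string from positioned glyphs, guessing spaces from gaps."""
    pieces = []
    prev_end = None
    for char, x_start, x_end in glyphs:
        # A horizontal gap wider than the threshold is guessed to be a space.
        if prev_end is not None and x_start - prev_end > gap_threshold:
            pieces.append(" ")
        pieces.append(char)
        prev_end = x_end
    return "".join(pieces)


# Invented positions: a 3.5pt gap separates "X" from "i".
line = [("T", 0.0, 6.0), ("e", 6.0, 10.5), ("X", 10.5, 17.0),
        ("i", 20.5, 23.0), ("s", 23.0, 27.0)]

print(extract_text(line))                     # TeX is
print(extract_text(line, gap_threshold=5.0))  # TeXis
```

	The same glyphs give "TeX is" or "TeXis" depending on the threshold. A human reader never notices the difference; a program does.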

	In the early days of TeX and LaTeX I knew people who created their e-mail with TeX. So you can see that TeX is capable of producing character-based output.
	Furthermore, TeX could be used to produce any character-based format as its output.
> 
>> Clearly
>> it is a character on input, but unless it generates a glyph in the
>> output stream (which TeX does not, for normal spaces) then it is not
>> a character (/qua/ character) on output but rather a formatting
>> instruction not dissimilar to (say) end-of-line.
> 
> But a formatting instruction for one program cannot serve as reliable input for another.
> A heuristic is then needed, to attempt to infer that a programming instruction must have been used, and guess what kind of instruction it might have been. This is not 100% reliable, so is deprecated in modern methods of data storage and document formats.
	Are you not contradicting yourself here? See above.
> XML based formats use tagging, rather than programming instructions. This is the modern way, which is used extensively for communicating data between different software systems.
	True, it is used for communicating data. Yet you are mistaken in thinking that it truly solves any of the problems involved in handling different data types or content!
	You can get a parse tree of the data, yet if a program cannot understand or process the data/content, it is useless.
	Agreed, the XML file contains information about its structure and is human readable, yet it does NOTHING to convert from one format to another. You still need a parser/filter to
	convert it into another format.
	Do not forget that you can put practically anything in an XML file: a program, image, TeX file, PDF, etc. Though I would not advise it.
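	A small sketch in Python, using the standard `xml.etree.ElementTree` module, shows the difference between getting a parse tree and getting a conversion. The element names (`doc`, `title`, `para`, `em`) and the target plain-text layout are assumptions invented for this example, not part of any real document format.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the tag names are hypothetical.
SAMPLE = ("<doc><title>Whitespace</title>"
          "<para>TeX sets <em>glue</em>, not spaces.</para></doc>")


def to_plain_text(root):
    """Convert the tree to plain text using hand-written, per-tag rules."""
    lines = []
    for child in root:
        text = "".join(child.itertext())
        if child.tag == "title":
            lines.append(text.upper())
        elif child.tag == "para":
            lines.append(text)
        else:
            # The parser accepted this element, but the filter has no rule for it.
            raise ValueError(f"no conversion rule for <{child.tag}>")
    return "\n".join(lines)


root = ET.fromstring(SAMPLE)         # parsing: provided by the XML library
print([child.tag for child in root])
print(to_plain_text(root))           # conversion: still has to be written
```

	Parsing comes for free, but every rule in `to_plain_text` had to be written by someone who knows what the tags mean. An element the filter does not know parses fine and still cannot be converted.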
> 
>> 
>> ** Phil.
> 
> TeX's strength is in its superior ability to position characters on the page for maximum visual effect. This is done by producing detailed programming instructions within the content stream of the PDF output. However, this is not enough to meet the needs of formats such as EPUB, non-visual reading software, archival formats, searchability, and other needs.
	You are probably a little young to know this, but TeX's original output format was a DVI file. Only more recent engines produce PDF. It is possible to create engines that output EPUB. If your TeX skills are adequate, you
	do not even need to create a new engine. TeX can write files in any character-based format if you know how to do it.

> Tagged PDF can be viewed as Adobe's response to address these requirements as an extension of the visual aspects of the PDF format. It is a direction in which TeX can (and surely must) move, to stay relevant within the publishing industry of the future.
	TeX used to be an industry standard. Advances in processing power have made its use in the publishing industry inefficient by comparison, and other systems are
	easier and faster for humans to operate.

	That TeX has survived this long is amazing. Yet, it remains one of the most powerful and cheapest typesetting systems to date. 

regards
	Keith.



	
