[XeTeX] Whitespace in input

Zdenek Wagner zdenek.wagner at gmail.com
Fri Nov 18 00:32:08 CET 2011

2011/11/17 Ross Moore <ross.moore at mq.edu.au>:
> Hello Zdenek,
> On 18/11/2011, at 7:49 AM, Zdenek Wagner wrote:
>>> But a formatting instruction for one program cannot serve as reliable input
>>> for another.
>>> A heuristic is then needed, to attempt to infer that a programming
>>> instruction must have been used, and guess what kind of instruction it might
>>> have been. This is not 100% reliable, so is deprecated in modern methods of
>>> data storage and document formats.
>>> XML based formats use tagging, rather than programming instructions. This is
>>> the modern way, which is used extensively for communicating data between
>>> different software systems.
>> Yes, that's the point. The goal of TeX is nice typographical
>> appearance. The goal of XML is easy data exchange. If I want to send
>> structured data, I send XML, not PDF.
> These days people want both.
>>> ** Phil.
>>> TeX's strength is in its superior ability to position characters on the page
>>> for maximum visual effect. This is done by producing detailed programming
>>> instructions within the content stream of the PDF output. However, this is
>>> not enough to meet the needs of formats such as EPUB, non-visual reading
>>> software, archival formats, searchability, and other needs.
>>> Tagged PDF can be viewed as Adobe's response to address these requirements
>>> as an extension of the visual aspects of the PDF format. It is a direction
>>> in which TeX can (and surely must) move, to stay relevant within the
>>> publishing industry of the future.
>>> Hope this helps,
>>>     Ross
>> No, it does not help. Remember that the last (almost) portable version
>> of PDF is 1.2. If you are to open tagged PDF or even PDF with a
>> toUnicode map or a colorspace other than RGB or CMYK in Acrobat Reader
>> 3, it displays a fatal error and dies. I reported it to Adobe in March
>> 2001 and they did nothing.
> What else would you expect?
> AR is at version 10 now.
> On Linux it is at version 9 now, indeed 9.4.6 is current.
For OS/2 (now eComStation) the latest AR is at version 3 with known
bugs not fixed.

> You don't expect TeX formats prior to TeX3 to handle non-ascii
> characters, so why would you expect other people's older software
> versions to handle documents written for later formats?
>> I even reported another fatal bug in
>> January 2001. I sent sample files but nothing happened, Adobe just
>> stopped development of Acrobat Reader at buggy version 3 for some
>> operating systems.
> Why should they support OSs that have a limited life-time?
> Industry moves on. A new computer is very cheap these days,
> with software that can do things your older one never could do.
Yes, since that time OS/2 evolved from version 2 through 3, Warp
Connect, 4, 4.5, eComStation 1.0, eComStation 1.1 to eComStation 2.0,
yet AR remained at version 3.

> By all means keep the old one while it still does useful work,
> but you get another to do things that the older cannot handle.
If I compare multitasking of OS/2 on my old Celeron 333 MHz with Linux
running on quad core Intel 4.3 GHz, the winner is still OS/2. If I
have a single thread in mind, 4.3 GHz is of course faster but
multitasking and multithreading are done much better in OS/2. A few
years ago I made a comparison with a long numerical calculation on
OS/2 (Celeron 333 MHz) and Windows XP (Intel 250 MHz). The program
took 16 hours on OS/2 running Apache server at the same time and 240
hours on Windows running only this program. I am not sure that I can find
the very same program now, but judging from similar programs I would
expect 6 hours on quad core 4.3 GHz with Linux. Are you surprised that
I am not satisfied with progress in HW and OS?

>> Why do you so much rely on Adobe? When exchanging
>> structured documents I will always do it in XML and never create
>> tagged PDF because ...
> PDF, as a published standard, is not maintained by Adobe itself
> these days, yet Adobe continues to provide a free reader, at least
> for the visual aspects. That makes documents in PDF viewable by
> everyone (who is only interested in the visual aspect).
> It is an ISO standard, which publishers will want to use.
> Most of the people who use (La)TeX are academics or others
> who need to do a fair amount of publishing, of one kind
> or another.
> TeX can be modified to become capable of producing Tagged PDF.
>     (See the videos of my talks.)
> Free software (Poppler) is being developed to handle most aspects
> of PDF content, though it hasn't yet progressed enough to support
> structure tagging. It's surely on the list of things to do.
Yes, it is good for extraction even on OS/2 (I do not know whether
people compiled poppler, but xpdf binaries are available).

>>  ... I know that some users will be unable to read them
>> by Adobe Acrobat Reader.
> Why not?
> It is not Adobe Reader that is holding them back.
>> I do not wish to make them dependent on
>> ghostscript and similar tools.
> You'll have to give some more details of who you are
> referring to here, and why their economic circumstances
> require them to have access to XML-transmitted data,
> but preclude them from access to other kinds of standard
> computing software and devices.
I hope that people who are aware of structured documents will be
able to use XML.
>> --
>> Zdeněk Wagner
>> http://hroch486.icpf.cas.cz/wagner/
>> http://icebearsoft.euweb.cz
> Hope this helps,
>        Ross
> ------------------------------------------------------------------------
> Ross Moore                                       ross.moore at mq.edu.au
> Mathematics Department                           office: E7A-419
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
>  http://tug.org/mailman/listinfo/xetex

Zdeněk Wagner
