[XeTeX] Whitespace in input

maxwell maxwell at umiacs.umd.edu
Fri Nov 18 17:46:21 CET 2011


On Fri, 18 Nov 2011 13:52:56 +0100, Zdenek Wagner
<zdenek.wagner at gmail.com> wrote:
> 2011/11/18 Philip TAYLOR <P.Taylor at rhul.ac.uk>:
>> Is it safe to assume that these "code listings"
>> are restricted to the ASCII character set ?  If
>> so, yes, spaces are likely to be a problem, but
>> if the code listing can also include ligature-
>> digraphs, then these are likely to prove even
>> more problematic.
>>
> If the code listing is typeset in a fixed width font, it is usually no
> problem. I copied a few code samples from books in PDF, most of them
> were typeset by TeX. If I want to copy text in Devanagari, it is
> almost impossible. 

Besides TeX, Dr. Knuth also invented Literate Programming.  In our own
project, we use LP to extract the code listings directly from the original
source code, rather than from the PDF.  One advantage is that, in addition
to the re-ordering at the character level (mentioned in a part of Zdenek's
email that I didn't copy over), this allows re-ordering at any level, up
to entire sections of program code.  (We happen to use XML to hold the
source of both our text and our programming language constructs, but
that's a different issue.)
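
A minimal sketch of that kind of extraction, for illustration only: it
assumes a hypothetical XML layout in which prose sits in <para> elements
and code in <chunk> elements carrying an "order" attribute.  Those element
and attribute names, and the functions tangle() and weave(), are invented
here and are not the format or tooling used in the project described
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical literate source. The <doc>/<para>/<chunk> names and the
# "order" attribute are assumptions made up for this sketch.
SOURCE = """\
<doc>
  <para>We first describe the main routine.</para>
  <chunk name="main" order="2">print(greet("world"))</chunk>
  <para>The helper it relies on is explained afterwards.</para>
  <chunk name="helper" order="1">def greet(name):
    return "Hello, " + name</chunk>
</doc>
"""


def weave(xml_text):
    """Return (name, code) pairs in document order, i.e. the order in
    which the listings would appear in the typeset text."""
    root = ET.fromstring(xml_text)
    return [(c.get("name"), c.text) for c in root.iter("chunk")]


def tangle(xml_text):
    """Return the program text with chunks emitted in the order given by
    their "order" attribute rather than in document order."""
    root = ET.fromstring(xml_text)
    chunks = sorted(root.iter("chunk"), key=lambda c: int(c.get("order")))
    return "\n".join(c.text for c in chunks) + "\n"


if __name__ == "__main__":
    # The code is taken from the source, never from the PDF, so no
    # copy-and-paste damage can occur.
    print(tangle(SOURCE), end="")
```

Here the text presents "main" before "helper", while the extracted
program defines the helper first; the same mechanism would work for
whole sections of code.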

I agree that it would be nice to be able to reliably copy Unicode text
from the PDF, but (a) that issue isn't confined to program listings, and
(b) that would only solve the character ordering part of the problem.

   Mike Maxwell

