Zdenek Wagner zdenek.wagner at gmail.com
Fri Nov 18 13:52:56 CET 2011

2011/11/18 Philip TAYLOR <P.Taylor at rhul.ac.uk>:
> Is it safe to assume that these "code listings"
> are restricted to the ASCII character set ?  If
> so, yes, spaces are likely to be a problem, but
> if the code listing can also include ligature-
> digraphs, then these are likely to prove even
> more problematic.
If the code listing is typeset in a fixed width font, it is usually no
problem. I copied a few code samples from books in PDF, most of them
were typeset by TeX. If I want to copy text in Devanagari, it is
almost impossible. If I take just a simple Hindi word किताब, the best
result I can get will be िकताब (you should see a dotted circle which is
not visible in PDF). The reason is that the first two letters are
U+0915, U+093F but visually the latter is displayed first. After
copying you get the reversed order U+093F, U+0915. This is just one of
many problems with Devanagari. The toUnicode map does not help much
with Indian scripts. I have never tried to copy Arabic from PDF, or
even a combination of LTR and RTL within a paragraph.
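
To make the reordering concrete, here is a small Python sketch that
prints the code points of the word as typed and as it comes out of the
PDF, and then tries a naive repair. The helper names (codepoints,
is_consonant, fix_leading_i) are my own illustration, not part of any
PDF tool, and the repair only handles a vowel sign I that is not
preceded by a consonant; it does not handle conjunct clusters or the
other Devanagari problems.

```python
VOWEL_SIGN_I = "\u093f"  # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_consonant(ch):
    """True for the Devanagari consonants KA..HA (U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def fix_leading_i(text):
    """Naive repair (an assumption, not a general solution): move a vowel
    sign I that sits before a consonant, and is not itself preceded by a
    consonant, so that it follows that consonant."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        misplaced = (
            chars[i] == VOWEL_SIGN_I
            and is_consonant(chars[i + 1])
            and (i == 0 or not is_consonant(chars[i - 1]))
        )
        if misplaced:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


logical = "\u0915\u093f\u0924\u093e\u092c"  # the word as typed (logical order)
copied = "\u093f\u0915\u0924\u093e\u092c"   # what copying from the PDF gives

print(codepoints(logical)[:2])  # ['U+0915', 'U+093F']
print(codepoints(copied)[:2])   # ['U+093F', 'U+0915']
print(fix_leading_i(copied) == logical)
```

A real fix would need to know the syllable boundaries, which is exactly
the information that is lost in the PDF.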

> Ulrike Fischer wrote:
>> One question which pops up regularly in the TeX-groups is "how can I
>> insert a code listing in my pdf so that it can be copied and pasted
>> reliably".
>> Currently this is not easy, as the heuristics of the readers can
>> easily lose spaces, and you can't encode tabs or a specific number of
>> spaces.
>> Real space characters in the pdf (instead of only visible space)
>> would help here a lot.
