# [XeTeX] turn off special characters in PDF

Ross Moore ross.moore at mq.edu.au
Mon Dec 30 00:45:39 CET 2013

Hi Joe,

On 30/12/2013, at 8:12 AM, Joe Corneli wrote:

> http://tex.stackexchange.com/a/5419/4357
>
> Is there a way to turn off *all* special characters (e.g. small caps)
> and just get ASCII characters in the copy-and-paste level of the PDF?

In short, no!
Doing so would go against the idea of making more use of Unicode
across all computing platforms.

Certainly a ligature can have an /ActualText replacement consisting
of the separate characters, but this requires the PDF producer
to have supplied this within the PDF, as it is being generated.

I've played a lot with this kind of thing, and think that reducing
everything to ASCII is the wrong approach. One should use /ActualText
to provide the correct Unicode replacement, when one exists. Thus one
can extract textual information reliably, even when the PDF
uses legacy fonts that may not contain a /ToUnicode resource,
or if that resource is inadequate in special situations.
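
To make the /ActualText idea concrete, here is a minimal sketch in Python
of the marked-content wrapper that a PDF producer could emit around a
ligature glyph. The function name actual_text_span and the glyph operator
`<0C> Tj` are illustrative assumptions, not taken from any particular
TeX driver; only the /Span ... BDC ... EMC structure and the UTF-16BE
encoding of the replacement string follow the PDF specification.

```python
def actual_text_span(replacement: str, glyph_ops: str) -> str:
    """Wrap content-stream operators in a /Span marked-content sequence
    that carries an /ActualText replacement string."""
    # Encode the replacement as UTF-16BE with a leading byte-order mark
    # (FE FF), written as a PDF hexadecimal string.
    hex_text = (b"\xfe\xff" + replacement.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC\n{glyph_ops}\nEMC"


# Hypothetical operator showing a single glyph (byte 0C) from a legacy font;
# text extraction should then yield the two characters "fi".
print(actual_text_span("fi", "<0C> Tj"))
```

The same wrapper works for any Unicode replacement, for example
actual_text_span("\u03a9", ...) for Ω, which is the point of preferring
the correct Unicode value over an ASCII approximation.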

Besides, do you really mean *all* special characters?
What about simple symbols like: ß∑∂√∫Ω  and all the other
myriad foreign/accented letters and mathematical symbols?

If you want these to Copy/Paste as TeX coding (\ss \sum \partial
\sqrt etc.) within documents that you write yourself, then I have
written a package called mmap, in which this is an option for the
original Computer Modern fonts.

Alternatively, a PDF reader might supply a filtering mode that
converts the ligatures back to separate characters. Then the
user ought to be able to choose whether or not to use this filter.
I don't know of any that actually do this.
(In any case, you would want such a tool to allow you to specify
which characters to replace, and which to preserve.)

Your best option is surely to (get someone else to) write such
a filter that meets your needs, and use it to post-process the text
extracted via Copy/Paste or with other text-extraction tools.
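
As a rough sketch of what such a post-processing filter might look like,
here is a small Python function. The name ascii_filter and its preserve
and extra parameters are hypothetical, chosen for illustration; it relies
only on the standard unicodedata module. It assumes that Unicode
compatibility decomposition (NFKD) is an acceptable way to split ligatures
and strip accents, which will not suit every character.

```python
import unicodedata
from typing import Dict, Optional


def ascii_filter(text: str, preserve: str = "",
                 extra: Optional[Dict[str, str]] = None) -> str:
    """Replace non-ASCII characters by ASCII equivalents where possible.

    preserve: characters to leave untouched.
    extra:    user-supplied replacements, e.g. {"\u00df": "ss"}.
    """
    out = []
    for ch in text:
        if ch.isascii() or ch in preserve:
            out.append(ch)
        elif extra and ch in extra:
            out.append(extra[ch])
        else:
            # NFKD splits ligatures (U+FB01 -> "f" + "i") and separates
            # base letters from their combining accents.
            decomposed = unicodedata.normalize("NFKD", ch)
            ascii_part = "".join(c for c in decomposed if c.isascii())
            # Keep the original character if it has no ASCII form.
            out.append(ascii_part if ascii_part else ch)
    return "".join(out)


sample = "\ufb01nal caf\u00e9"                   # "ﬁnal café" with an fi ligature
print(ascii_filter(sample))                      # final cafe
print(ascii_filter(sample, preserve="\u00e9"))   # final café
print(ascii_filter("Stra\u00dfe", extra={"\u00df": "ss"}))  # Strasse
```

Characters with no decomposition, such as ß or ∫, pass through unchanged
unless listed in extra, so the user decides which ones to replace and
which to preserve.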

Of course this is no use if your aim is to create documents for
which others get the desired result via Copy/Paste.
For this, the /ActualText approach is what you need.

Hope this helps,

Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au
Mathematics Department                           office: E7A-206
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
