[XeTeX] turn off special characters in PDF

Ross Moore ross.moore at mq.edu.au
Mon Dec 30 00:45:39 CET 2013


Hi Joe,

On 30/12/2013, at 8:12 AM, Joe Corneli wrote:

> This answer talks about how to turn off ligatures:
> http://tex.stackexchange.com/a/5419/4357
> 
> Is there a way to turn off *all* special characters (e.g. small caps)
> and just get ASCII characters in the copy-and-paste level of the PDF?

In short, no! This goes against the idea of making more use of
Unicode, across all computing platforms.

Certainly a ligature can have an /ActualText replacement consisting
of the separate characters, but this requires the PDF producer
to have supplied this within the PDF, as it is being generated.

I've played a lot with this kind of thing, and think that reducing
everything to ASCII is the wrong approach. One should instead use
/ActualText to provide the correct Unicode replacement, when one
exists. Thus one
can extract textual information reliably, even when the PDF
uses legacy fonts that may not contain a /ToUnicode resource,
or if that resource is inadequate in special situations.


Besides, do you really mean *all* special characters?
What about simple symbols like: ß∑∂√∫Ω  and all the other 
myriad foreign/accented letters and mathematical symbols?

If you want these to Copy/Paste as TeX coding (\beta \sum \delta
\sqrt etc.) within documents that you write yourself, then I wrote
a package called mmap where this is an option for the original
Computer Modern fonts.


Alternatively, a PDF reader might supply a filtering mode that
converts the ligatures back to separate characters. Then the
user ought to be able to choose whether or not to use this filter.
I don't know of any that actually do this.
(In any case, you would want such a tool to allow you to specify
which characters to replace, and which to preserve.)


Your best option is surely to (get someone else to) write such 
a filter that meets your needs, and use it to post-process the text 
extracted via Copy/Paste or with other text-extraction tools.
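
A minimal sketch of what such a post-processing filter might look
like, in Python. Everything here is an assumption made for
illustration: the function name expand_ligatures, the default table,
and the "preserve" parameter are hypothetical, and the table covers
only a few Latin ligatures from the Unicode Alphabetic Presentation
Forms block. It works on text that has already been extracted; it
does not touch the PDF itself.

```python
# Hypothetical default table: a few Latin ligature code points
# and the separate characters they stand for.
DEFAULT_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_ligatures(text, replace=DEFAULT_LIGATURES, preserve=""):
    """Expand the ligatures listed in `replace` into separate characters.

    Characters that also appear in `preserve` are left untouched, so
    the user chooses which characters to replace and which to keep.
    """
    # str.translate expects a mapping from code points to replacements.
    table = {ord(lig): plain
             for lig, plain in replace.items()
             if lig not in preserve}
    return text.translate(table)


if __name__ == "__main__":
    extracted = "e\ufb03cient \ufb01le"   # text as pasted from a PDF
    print(expand_ligatures(extracted))    # prints "efficient file"
```

Passing preserve="\ufb01" would keep the "fi" ligature while still
expanding the others. Extending the table to other characters
(accented letters, symbols) is left to whoever needs it.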

Of course this is no use if your aim is to create documents for
which others get the desired result via Copy/Paste.
For this, the /ActualText approach is what you need.



Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-206      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
