[XeTeX] turn off special characters in PDF
Ross Moore
ross.moore at mq.edu.au
Tue Jan 7 00:06:36 CET 2014
Hi Joe,
On 04/01/2014, at 8:43 AM, Joe Corneli wrote:
> Hi All:
>
> I'm glad my message sparked some discussion. My M[N]WE for my
> specific use case on tex.stackexchange.com has not gotten much
> attention - I recently attached a +200 bounty.
>
> http://tex.stackexchange.com/questions/151835/actualtext-in-small-cap-hyperlinks
>
> I figured I should put in a plug for that here. I already got a reply
> from one of the main authors of hyperref, but patching \href at the
> necessary level is beyond me. Finally, I realize a detailed
> discussion of this issue is probably not germane to this list, so if
> you feel that way, please direct further comments there, or to me off
> list.
No, it is quite germane for this list, and relates to
a very recent thread.
The attached PDF is a variant of your example.
Copy/Paste the text using Adobe Reader or Acrobat Pro.
You should get:
Old: Sexy tex: .
New: Sexy tex: sxe .
Apple's Preview (at least within TeXShop) doesn't seem to recognise
the /ActualText tagging.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accsupp-href.pdf
Type: application/pdf
Size: 7978 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20140107/0ecfe66c/attachment-0001.pdf>
-------------- next part --------------
To achieve this I had to do several things.
Here are the relevant definitions:
\newcommand*{\hrefnew}[2]{%
  \hrefold{#1}{%
    \BeginAccSupp{method=pdfstringdef,unicode,ActualText={#2}}%
    #2%
    \EndAccSupp{}}}
\AtBeginDocument{%
  \let\hrefold\href
  \let\href\hrefnew
}
Notes:
1. Use \BeginAccSupp and \EndAccSupp as tightly
as possible around the text needing to be tagged.
2. You want the method=pdfstringdef option.
(It is pdfstringdef, not pdfstring.)
This results in appropriate strings for the /ActualText value:
either ASCII if possible (as here) or UTF-16 strings with a BOM.
3. Delay the rebinding of \href to \AtBeginDocument.
This way you do not interfere with any other package making
its own redefinition of what \href does.
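
For anyone wanting to try these definitions, a minimal test file might look
like the sketch below. The document class, the example.org URL and the
small-caps link text are placeholders chosen for illustration; they are
not taken from the attached PDF. It also assumes a simple URL: because
\hrefnew grabs the URL as an ordinary argument, characters such as # or %
inside it would probably need extra care.

```latex
\documentclass{article}
\usepackage{accsupp}
\usepackage{hyperref}  % method=pdfstringdef relies on hyperref

% Wrap the link text (and only the link text) in /ActualText tagging.
\newcommand*{\hrefnew}[2]{%
  \hrefold{#1}{%
    \BeginAccSupp{method=pdfstringdef,unicode,ActualText={#2}}%
    #2%
    \EndAccSupp{}}}

% Rebind \href late, so that other packages' redefinitions are kept.
\AtBeginDocument{%
  \let\hrefold\href
  \let\href\hrefnew
}

\begin{document}
% Placeholder URL and text, for illustration only.
Old: \hrefold{http://example.org/}{\textsc{sxe}}.

New: \href{http://example.org/}{\textsc{sxe}}.
\end{document}
```

Here \hrefold gives the untagged link and \href the tagged one, so
Copy/Paste from the two lines can be compared directly.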
What follows is highly technical and of no real concern to anyone
just wanting to use /ActualText tagging.
Rather it is about implementing this (and more general kinds of)
tagging in the most efficient way.
The result of the above coding is to adjust the PDF page stream
to include:
q
1 0 0 1 129.04 -82.56 cm
/Span<</ActualText(sxe)>>BDC
Q BT /F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ ET q
1 0 0 1 145.89 -82.56 cm EMC
Q
where you can see the /Span tagging of the content between BDC and EMC.
This works but, to my mind, is excessive: the BDC and EMC operators are
each wrapped in their own q ... cm ... Q group, which duplicates some operations.
Now the xdvipdfmx processor allows an alternative form for
the \special used to place the tagging.
It can be invoked with the following redefinition of internals
from the accsupp.sty package:
\makeatletter
\def\ACCSUPP@bdc{\special {pdf:literal direct \ACCSUPP@span BDC}}
\def\ACCSUPP@emc{\special {pdf:literal direct EMC}}
\makeatother
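
Since these specials are specific to xdvipdfmx, one cautious way to use the
redefinition is to guard it by engine, as in the sketch below. The use of
the ifxetex package and the test text are my additions for illustration,
and the sketch assumes that accsupp has already set up its driver macros
by the time the redefinition is made; if it has not, the redefinition may
need to be delayed.

```latex
\documentclass{article}
\usepackage{ifxetex}   % provides \ifxetex (assumption: not part of the original setup)
\usepackage{accsupp}
\usepackage{hyperref}  % needed for method=pdfstringdef

\makeatletter
\ifxetex
  % Only change the specials when running XeTeX (xdvipdfmx);
  % other engines keep accsupp's own definitions.
  \def\ACCSUPP@bdc{\special{pdf:literal direct \ACCSUPP@span BDC}}
  \def\ACCSUPP@emc{\special{pdf:literal direct EMC}}
\fi
\makeatother

\begin{document}
% Placeholder text, for illustration only.
\BeginAccSupp{method=pdfstringdef,unicode,ActualText={sxe}}%
\textsc{sxe}%
\EndAccSupp{}
\end{document}
```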
This gives a much more efficient PDF stream:
...>6<0059001b>]TJ ET
/Span<</ActualText(sxe)>>BDC
BT /F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ ET
EMC
BT /F1 11.955 Tf ...
in which the irrelevant coordinate/matrix changes (using 'cm')
no longer occur.
But even this could possibly be improved further to avoid the
extra BT ... ET :
...>6<0059001b>]TJ
/Span<</ActualText(sxe)>>BDC
/F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ
EMC
/F1 11.955 Tf ...
In the experimental version of pdfTeX there is a
keyword 'noendtext' that can be used with the new
\pdfstartmarkedcontent primitive:
\pdfstartmarkedcontent attr{<attributes>} noendtext ...
which is designed with this aim in mind.
Use of this keyword sets a flag so that the matching
\pdfendmarkedcontent can keep the BT/ET nesting consistent.
>
> Thank you!
>
> Joe
Hope this helps,
Ross
------------------------------------------------------------------------
Ross Moore ross.moore at mq.edu.au
Mathematics Department office: E7A-206
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------