[XeTeX] turn off special characters in PDF

Ross Moore ross.moore at mq.edu.au
Tue Jan 7 00:06:36 CET 2014


Hi Joe,

On 04/01/2014, at 8:43 AM, Joe Corneli wrote:

> Hi All:
> 
> I'm glad my message sparked some discussion.  My M[N]WE for my
> specific use case on tex.stackexchange.com has not gotten much
> attention - I recently attached a +200 bounty.
> 
> http://tex.stackexchange.com/questions/151835/actualtext-in-small-cap-hyperlinks
> 
> I figured I should put in a plug for that here.  I already got a reply
> from one of the main authors of hyperref, but patching \href at the
> necessary level is beyond me.  Finally, I realize a detailed
> discussion of this issue is probably not germane to this list, so if
> you feel that way, please direct further comments there, or to me off
> list.

No, it is quite germane for this list, and relates to
a very recent thread.

The attached PDF is a variant of your example.
Copy/Paste the text using Adobe Reader or Acrobat Pro.
You should get:

Old: Sexy tex: .
New: Sexy tex: sxe .

Apple's Preview (at least within TeXShop) doesn't seem to recognise
the /ActualText tagging.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: accsupp-href.pdf
Type: application/pdf
Size: 7978 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20140107/0ecfe66c/attachment-0001.pdf>
-------------- next part --------------


To achieve this I had to do several things.
Here are the relevant definitions:

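% Requires \usepackage{accsupp} and \usepackage{hyperref} in the preamble.
% Wrap the link text in /ActualText tagging, then call the original \href.
\newcommand*{\hrefnew}[2]{%
  \hrefold{#1}{\BeginAccSupp{method=pdfstringdef,unicode,ActualText={#2}}#2\EndAccSupp{}}}
% Swap the definitions only after all packages have been loaded.
\AtBeginDocument{%
  \let\hrefold\href
  \let\href\hrefnew
}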

Notes:
  1. Use \BeginAccSupp and \EndAccSupp  as tightly
     as possible around the text needing to be tagged.

  2. You want the  method=pdfstringdef   option.
     (It is  pdfstringdef  not  pdfstring .)
     This results in appropriate strings for the /ActualText value;
     either ASCII if possible (as here) or UTF-16 strings with BOM.

  3. Delay the rebinding of \href  to \AtBeginDocument .
     This way you do not interfere with any other package making
     its own redefinition of what \href does.
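
As a sketch only, here is how those definitions might sit in a
complete test file. The document class, the URL and the link text
are placeholders, not taken from the original example; it is meant
to be compiled with xelatex. Copy/Paste of the link text should
then give  sxe , as described above.

\documentclass{article}
\usepackage{accsupp}
\usepackage{hyperref}

% Wrapper: tag the link text with /ActualText, then call the old \href.
\newcommand*{\hrefnew}[2]{%
  \hrefold{#1}{\BeginAccSupp{method=pdfstringdef,unicode,ActualText={#2}}#2\EndAccSupp{}}}
% Rebind late, so that other packages can still redefine \href first.
\AtBeginDocument{%
  \let\hrefold\href
  \let\href\hrefnew
}

\begin{document}
% http://example.org/ is a placeholder URL (an assumption).
Sexy tex: \href{http://example.org/}{\textsc{sxe}} .
\end{document}

Note that the wrapper reads both arguments before \hrefold sees them,
so the optional-argument form of \href is not covered by this sketch.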



What follows is highly technical and of no real concern to anyone
just wanting to use /ActualText tagging.
Rather it is about implementing this (and more general kinds of)
tagging in the most efficient way.


The result of the above coding is to adjust the PDF page stream 
to include:

  q 
  1 0 0 1 129.04 -82.56 cm 
  /Span<</ActualText(sxe)>>BDC
  Q BT /F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ ET q 
  1 0 0 1 145.89 -82.56 cm EMC 
  Q

where you can see the /Span tagging of the content between BDC and EMC.
This works, but is excessive, to my mind, by duplicating some operations.

Now the xdvipdfmx processor allows an alternative form for
the \special  used to place the tagging.
It can be invoked with the following redefinition of internals
from the  accsupp.sty  package:

\makeatletter
 \def\ACCSUPP@bdc{\special {pdf:literal direct \ACCSUPP@span BDC}}
 \def\ACCSUPP@emc{\special {pdf:literal direct EMC}}
\makeatother
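
If the same preamble may also be run through pdfLaTeX or LuaLaTeX,
these specials should not be installed there, since they are
specific to xdvipdfmx. One possible guard is sketched below. It
assumes that the  iftex  package is available (it is not part of the
setup described here) and that  accsupp  has already been loaded, so
that its driver definitions exist to be overridden:

\usepackage{accsupp}
\usepackage{iftex}   % assumption: provides \ifXeTeX
\makeatletter
\ifXeTeX
  % Only under XeTeX: write BDC/EMC directly into the page stream.
  \def\ACCSUPP@bdc{\special{pdf:literal direct \ACCSUPP@span BDC}}%
  \def\ACCSUPP@emc{\special{pdf:literal direct EMC}}%
\fi
\makeatother

With other engines the definitions supplied by the  accsupp  driver
are then left untouched.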


This gives a much more efficient PDF stream:

   ...>6<0059001b>]TJ ET
   /Span<</ActualText(sxe)>>BDC 
   BT /F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ ET 
   EMC
   BT /F1 11.955 Tf ...

in which the irrelevant coordinate/matrix changes (using 'cm')
no longer occur.


But even this could possibly be improved further to avoid the
extra BT ... ET :

   ...>6<0059001b>]TJ 
   /Span<</ActualText(sxe)>>BDC 
   /F1 11.955 Tf 129.04 -82.56 Td[<095e09630950>]TJ 
   EMC
   /F1 11.955 Tf ...


In the experimental version of  pdfTeX  there is a
keyword 'noendtext' that can be used with the new 
 \pdfstartmarkedcontent  primitive:

  \pdfstartmarkedcontent attr{<attributes>} noendtext ...

which is designed with this aim in mind.
Use of this keyword sets a flag so that the matching  
 \pdfendmarkedcontent  can keep the BT/ET nesting consistent.


> 
> Thank you!
> 
> Joe


Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-206      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
