[XeTeX] turn off special characters in PDF

Joe Corneli holtzermann17 at gmail.com
Fri Jan 3 22:43:58 CET 2014


Hi All:

I'm glad my message sparked some discussion.  My M[N]WE for my
specific use case on tex.stackexchange.com has not gotten much
attention, so I recently attached a +200 bounty.

http://tex.stackexchange.com/questions/151835/actualtext-in-small-cap-hyperlinks

I figured I should put in a plug for that here.  I already got a reply
from one of the main authors of hyperref, but patching \href at the
necessary level is beyond me.  Finally, I realize a detailed
discussion of this issue is probably not germane to this list, so if
you feel that way, please direct further comments there, or to me off
list.

Thank you!

Joe

On Wed, Jan 1, 2014 at 10:34 PM, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
>> Hi Zdeněk,
>>
>> On 02/01/2014, at 2:14 AM, Zdenek Wagner wrote:
>>
>>> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
>>
>>>> In the example PDF that I attached to my previous message, each mathematical
>>>> character is mapped to a big-endian UTF-16 hexadecimal string, with Plane-1
>>>> alphanumerics expressed using surrogate pairs.
>>>>
>>> Thank you, now I see it. The book where I read about /ActualText did
>>> not mention that I can use UTF-16 if I start the string with a BOM.
>>
>> Fair enough; this I had to discover for myself.
>> The PDF Reference Manual (e.g. for ISO 32000) has no such examples,
>> so I had to experiment with different ways to specify strings requiring
>> non-ASCII characters. UTF-16 is the most elegant, and avoids the messiness
>> of using escape characters and octal codes, even for some non-letter
>> ASCII characters.
>>
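
The string format described above (a byte order mark followed by
big-endian UTF-16 code units written in hexadecimal, with surrogate
pairs for Plane-1 characters) can be sketched in Python. This is only
an illustration, not the Perl script mentioned in this thread; the
function name actualtext_hex is made up for the example.

```python
def actualtext_hex(text: str) -> str:
    """Return text as a PDF hex string: <FEFF...> in big-endian UTF-16."""
    # FEFF is the byte order mark that marks the string as UTF-16BE
    # rather than PDFDocEncoding. Characters outside the Basic
    # Multilingual Plane are emitted by the codec as surrogate pairs.
    payload = text.encode("utf-16-be").hex().upper()
    return "<FEFF" + payload + ">"


print(actualtext_hex("A"))           # <FEFF0041>
print(actualtext_hex("\U0001D44E"))  # <FEFFD835DC4E> (MATHEMATICAL ITALIC SMALL A)
```

Writing the string in hexadecimal sidesteps the escaping of
parentheses and backslashes that literal PDF strings would need.
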
>>> Can I
>>> see the source of the PDF? It would help me a lot to see how you do all
>>> these things.
>>
>> Each piece of mathematics is captured, saved to a file, converted to MathML,
>> then run through my Perl script to create alternative (La)TeX source.
>> This is done to be able to create a fully-tagged PDF description of the
>> mathematical content, using a special version of  pdftex  that Han The Thanh
>> created for me (and others) --- still at an experimental stage.
>>
>> You should not need all of this machinery, but I'm happy to answer
>> any questions you may have.
>>
>> I've attached a couple of examples of the output from my Perl script,
>> in which you can see how the /ActualText  replacement strings
>> are specified, using a macro \SMC -- which ultimately expands to use
>> the  \pdfstartmarkedcontent  primitive.
>>
>>
> Thank you.
>>
>>
>> Without the special primitives, you should be able to use  \pdfliteral
>> to insert the tagging needed for just using  /ActualText .
>>
>>>>
>>>> I see no reason why Indic character strings could not be done similarly.
>>>> You probably need some on-the-fly preprocessing to work out the required
>>>> strings to use.
>>
>>
>> I'm not sure whether there is a LaTeX package that allows you to get the
>> literal bits into the correct place without upsetting other fine
>> details of the typesetting with Indic characters.
>> This certainly should be possible, at least when using  pdfLaTeX .
>> Not sure of the details using XeTeX -- but you work with the source code,
>> so can devise anything that is needed, right?
>>
> Typesetting depends on HarfBuzz and font features; no package is
> needed (fontspec and polyglossia just save work that could be done by
> primitives). Any code can be sent to xdvipdfmx by \special{pdf: code
> ...}, much as by \pdfliteral in pdftex. I already know how to do
> it.
>
>>>
>>> --
>>> Zdeněk Wagner
>>> http://hroch486.icpf.cas.cz/wagner/
>>> http://icebearsoft.euweb.cz
>>
>>
>>
>> Hope this helps,
>>
>>         Ross
>>
>> ------------------------------------------------------------------------
>> Ross Moore                                       ross.moore at mq.edu.au
>> Mathematics Department                           office: E7A-206
>> Macquarie University                             tel: +61 (0)2 9850 8955
>> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
>> ------------------------------------------------------------------------
>>
>>
>>
>>
>>
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>>   http://tug.org/mailman/listinfo/xetex
>>
>


