[XeTeX] turn off special characters in PDF
Joe Corneli
holtzermann17 at gmail.com
Fri Jan 3 22:43:58 CET 2014
Hi All:
I'm glad my message sparked some discussion. My M[N]WE for my
specific use case on tex.stackexchange.com has not gotten much
attention; I recently attached a +200 bounty.
http://tex.stackexchange.com/questions/151835/actualtext-in-small-cap-hyperlinks
I figured I should put in a plug for that here. I already got a reply
from one of the main authors of hyperref, but patching \href at the
necessary level is beyond me. Finally, I realize a detailed
discussion of this issue is probably not germane to this list, so if
you feel that way, please direct further comments there, or to me off
list.
Thank you!
Joe
On Wed, Jan 1, 2014 at 10:34 PM, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
>> Hi Zdeněk,
>>
>> On 02/01/2014, at 2:14 AM, Zdenek Wagner wrote:
>>
>>> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
>>
>>>> In the example PDF that I attached to my previous message, each mathematical
>>>> character is mapped to a big-endian UTF-16 hexadecimal string, with Plane-1
>>>> alphanumerics expressed using surrogate pairs.
>>>>
>>> Thank you, now I see it. The book where I read about /ActualText did
>>> not mention that I can use UTF-16 if I start the string with a BOM.
>>
>> Fair enough; this I had to discover for myself.
>> The PDF Reference Manual (e.g. for ISO 32000) has no such examples,
>> so I had to experiment with different ways to specify strings requiring
>> non-ASCII characters. UTF-16 is the most elegant, and avoids the messiness
>> of using escape characters and octal codes, even for some non-letter
>> ASCII characters.
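The encoding described in the quoted paragraph can be illustrated with a minimal sketch. It is written in Python purely for illustration (Ross's own toolchain uses Perl, and this is not his script); the function name `actualtext_hex` is hypothetical. It produces a PDF hex string that starts with the byte-order mark FEFF followed by big-endian UTF-16, so Plane-1 characters come out as surrogate pairs.

```python
def actualtext_hex(text: str) -> str:
    """Return `text` as a PDF hex string: BOM (FEFF) + big-endian UTF-16."""
    # "utf-16-be" emits no BOM of its own, so prepend it explicitly.
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


# A plain ASCII letter needs one 16-bit code unit.
print(actualtext_hex("A"))            # <FEFF0041>

# U+1D465 (MATHEMATICAL ITALIC SMALL X) lies in Plane 1,
# so it is written as the surrogate pair D835 DC65.
print(actualtext_hex("\U0001D465"))   # <FEFFD835DC65>
```

Because the result contains only hexadecimal digits and angle brackets, none of the characters that need escaping in PDF literal strings (parentheses, backslashes) can occur.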
>>
>>> Can I
>>> see the source of the PDF? It would help me a lot to see how you do all
>>> these things.
>>
>> Each piece of mathematics is captured, saved to a file, converted to MathML,
>> then run through my Perl script to create alternative (La)TeX source.
>> This is done to be able to create a fully-tagged PDF description of the
>> mathematical content, using a special version of pdftex that Han The Thanh
>> created for me (and others); it is still at an experimental stage.
>>
>> You should not need all of this machinery, but I'm happy to answer
>> any questions you may have.
>>
>> I've attached a couple of examples of the output from my Perl script,
>> in which you can see how the /ActualText replacement strings
>> are specified, using a macro \SMC -- which ultimately expands to use
>> the \pdfstartmarkedcontent primitive.
>>
>>
> Thank you.
>>
>>
>> Without the special primitives, you should be able to use \pdfliteral
>> to insert the tagging needed for just using /ActualText.
>>
>>>>
>>>> I see no reason why Indic character strings could not be done similarly.
>>>> You probably need some on-the-fly preprocessing to work out the required
>>>> strings to use.
>>
>>
>> I'm not sure whether there is a LaTeX package that allows you to get the
>> literal bits into the correct place without upsetting other fine
>> details of the typesetting with Indic characters.
>> This certainly should be possible, at least when using pdfLaTeX.
>> Not sure of the details using XeTeX -- but you work with the source code,
>> so can devise anything that is needed, right?
>>
> Typesetting depends on HarfBuzz and font features; no package is
> needed (fontspec and polyglossia just save work that could be done with
> primitives). Any code can be sent to xdvipdfmx with
> \special{pdf: code ...}, much as with \pdfliteral in pdftex.
> I already know how to do it.
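As a rough sketch of the two routes mentioned in this exchange, the Python helper below only builds the TeX source strings; it has not been run through either engine. All function names are hypothetical. The `/Span << /ActualText ... >> BDC` and `EMC` operators are the standard PDF marked-content pair. Using `pdf:code` for xdvipdfmx follows the message above, while the `page` keyword on `\pdfliteral` is an assumption about which mode suits marked content best.

```python
def actualtext_hex(text: str) -> str:
    """PDF hex string: BOM (FEFF) followed by big-endian UTF-16."""
    return "<" + (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper() + ">"


def bdc_operator(replacement: str) -> str:
    """Marked-content opening operator carrying the replacement text."""
    return "/Span << /ActualText " + actualtext_hex(replacement) + " >> BDC"


def wrap_xetex(visible_tex: str, replacement: str) -> str:
    """Wrap TeX material with xdvipdfmx specials (XeTeX route)."""
    return (r"\special{pdf:code " + bdc_operator(replacement) + "}"
            + visible_tex
            + r"\special{pdf:code EMC}")


def wrap_pdftex(visible_tex: str, replacement: str) -> str:
    """Wrap TeX material with \\pdfliteral (pdfTeX route; `page` mode assumed)."""
    return (r"\pdfliteral page{" + bdc_operator(replacement) + "}"
            + visible_tex
            + r"\pdfliteral page{EMC}")


# Small-cap text whose copy/paste value should be the upper-case word.
print(wrap_xetex(r"\textsc{tug}", "TUG"))
print(wrap_pdftex(r"\textsc{tug}", "TUG"))
```

The generated strings would still have to be placed so that the opening and closing operators end up on the same page and do not disturb line breaking, which is the difficulty raised earlier in the thread.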
>
>>>
>>> --
>>> Zdeněk Wagner
>>> http://hroch486.icpf.cas.cz/wagner/
>>> http://icebearsoft.euweb.cz
>>
>>
>>
>> Hope this helps,
>>
>> Ross
>>
>> ------------------------------------------------------------------------
>> Ross Moore ross.moore at mq.edu.au
>> Mathematics Department office: E7A-206
>> Macquarie University tel: +61 (0)2 9850 8955
>> Sydney, Australia 2109 fax: +61 (0)2 9850 8114
>> ------------------------------------------------------------------------
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>> http://tug.org/mailman/listinfo/xetex
>>
>
>
>
> --
> Zdeněk Wagner
> http://hroch486.icpf.cas.cz/wagner/
> http://icebearsoft.euweb.cz