[XeTeX] turn off special characters in PDF

Zdenek Wagner zdenek.wagner at gmail.com
Wed Jan 1 23:34:46 CET 2014


2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
> Hi Zdeněk,
>
> On 02/01/2014, at 2:14 AM, Zdenek Wagner wrote:
>
>> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:
>
>>> In the example PDF that I attached to my previous message, each mathematical
>>> character is mapped to a big-endian UTF-16 hexadecimal string, with Plane-1
>>> alphanumerics expressed using surrogate pairs.
>>>
>> Thank you, now I see it. The book where I read about /ActualText did
>> not mention that I can use UTF-16 if I start the string with a BOM.
>
> Fair enough; this I had to discover for myself.
> The PDF Reference Manual (e.g. for ISO 32000) has no such examples,
> so I had to experiment with different ways to specify strings requiring
> non-ASCII characters. UTF-16 is the most elegant, and avoids the messiness
> of using escape characters and octal codes, even for some non-letter
> ASCII characters.
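
A minimal sketch in Python of the encoding described above: a PDF hex string that starts with the BOM (FEFF) followed by the big-endian UTF-16 code units, so that Plane-1 characters come out as surrogate pairs. The function name pdf_utf16_hex is hypothetical and is not taken from Ross's Perl script.

```python
def pdf_utf16_hex(text):
    """Encode text as a PDF hex string: BOM, then UTF-16BE code units."""
    # Python's utf-16-be codec emits no BOM, so FEFF is prepended by hand.
    # Characters outside the BMP are encoded as surrogate pairs automatically.
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


print(pdf_utf16_hex("x"))           # <FEFF0078>
print(pdf_utf16_hex("\U0001D465"))  # <FEFFD835DC65>  (U+1D465, a Plane-1 letter)
```

The second call shows the surrogate pair D835 DC65 that stands for the single Plane-1 character U+1D465.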
>
>> Can I
>> see the source of the PDF? It would help me a lot to see how you do all
>> these things.
>
> Each piece of mathematics is captured, saved to a file, converted to MathML,
> then run through my Perl script to create alternative (La)TeX source.
> This is done to be able to create a fully-tagged PDF description of the
> mathematical content, using a special version of  pdftex  that Han The Thanh
> created for me (and others) --- still at an experimental stage.
>
> You should not need all of this machinery, but I'm happy to answer
> any questions you may have.
>
> I've attached a couple of examples of the output from my Perl script,
> in which you can see how the /ActualText  replacement strings
> are specified, using a macro \SMC -- which ultimately expands to use
> the  \pdfstartmarkedcontent  primitive.
>
>
Thank you.
>
>
> Without the special primitives, you should be able to use  \pdfliteral
> to insert the tagging needed for just using  /ActualText .
>
>>>
>>> I see no reason why Indic character strings could not be done similarly.
>>> You probably need some on-the-fly preprocessing to work out the required
>>> strings to use.
>
>
> I'm not sure whether there is a LaTeX package that allows you to get the
> literal bits into the correct place without upsetting other fine
> details of the typesetting with Indic characters.
> This certainly should be possible, at least when using  pdfLaTeX .
> Not sure of the details using XeTeX -- but you work with the source code,
> so can devise anything that is needed, right?
>
Typesetting depends on HarfBuzz and font features; no package is
needed (fontspec and polyglossia just save work that could be done with
primitives). Any code can be sent to xdvipdfmx by \special{pdf: code
...}, much as \pdfliteral does in pdftex. I already know how to do
it.
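
As a rough illustration of the on-the-fly preprocessing mentioned earlier in the thread, the Python sketch below builds the pair of PDF marked-content operators that would carry an /ActualText replacement for a short Indic string. The function name actualtext_span and the choice of the /Span tag are assumptions made for this example; how the two operator strings reach the output (\pdfliteral in pdftex, or a \special for xdvipdfmx) is left to the engine in use.

```python
def actualtext_span(text):
    """Return (begin, end) PDF operators that attach text as /ActualText."""
    # BOM followed by big-endian UTF-16, written as a PDF hex string.
    hex_string = "FEFF" + text.encode("utf-16-be").hex().upper()
    # BDC opens a marked-content sequence with a property list; EMC closes it.
    begin = "/Span << /ActualText <" + hex_string + "> >> BDC"
    end = "EMC"
    return begin, end


begin, end = actualtext_span("कि")  # U+0915 KA followed by U+093F VOWEL SIGN I
print(begin)  # /Span << /ActualText <FEFF0915093F> >> BDC
print(end)    # EMC
```

The glyphs typeset between the two operators are then the ones a PDF reader is meant to replace with the given string when text is copied or searched.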

>>
>> --
>> Zdeněk Wagner
>> http://hroch486.icpf.cas.cz/wagner/
>> http://icebearsoft.euweb.cz
>
>
>
> Hope this helps,
>
>         Ross
>
> ------------------------------------------------------------------------
> Ross Moore                                       ross.moore at mq.edu.au
> Mathematics Department                           office: E7A-206
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz


