[XeTeX] turn off special characters in PDF

Ross Moore ross.moore at mq.edu.au
Wed Jan 1 22:37:55 CET 2014


Hi Zdeněk,

On 02/01/2014, at 2:14 AM, Zdenek Wagner wrote:

> 2014/1/1 Ross Moore <ross.moore at mq.edu.au>:

>> In the example PDF that I attached to my previous message, each mathematical
>> character is mapped to a big-endian UTF-16 hexadecimal string, with Plane-1
>> alphanumerics expressed using surrogate pairs.
>> 
> Thank you, now I see it. The book where I read about /ActualText did
> not mention that I can use UTF16 if I start the string with BOM.

Fair enough; this I had to discover for myself.
The PDF Reference Manual (e.g. for ISO 32000) has no such examples,
so I had to experiment with different ways to specify strings requiring
non-ASCII characters. UTF-16 is the most elegant, and avoids the messiness
of using escape characters and octal codes, even for some non-letter
ASCII characters.
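
For what it's worth, here is a minimal sketch of how such a string can be
worked out, written in Python rather than Perl. The function name
actual_text_hex is just an illustrative choice, not something from my
script; the only facts it relies on are the ones above: a leading
byte-order mark (FEFF), big-endian UTF-16, and surrogate pairs for
Plane-1 characters.

```python
def actual_text_hex(text: str) -> str:
    """Return `text` as hex digits of big-endian UTF-16, prefixed by the BOM.

    The result is meant to sit between < and > as a PDF hexadecimal string.
    (Hypothetical helper name; not taken from the original Perl script.)
    """
    # Python's "utf-16-be" codec emits no BOM of its own, so prepend FEFF.
    # Characters beyond the BMP are encoded as surrogate pairs automatically.
    return "FEFF" + text.encode("utf-16-be").hex().upper()


if __name__ == "__main__":
    print(actual_text_hex("A"))           # a plain ASCII letter
    print(actual_text_hex("\U0001D465"))  # MATHEMATICAL ITALIC SMALL X (Plane 1)
```

The second call shows a Plane-1 alphanumeric coming out as two 16-bit
units, which is the surrogate pair mentioned earlier.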

> Can I
> see the source of the PDF? It could help me much to see how you do all
> these things.

Each piece of mathematics is captured, saved to a file, converted to MathML,
and then run through my Perl script to create alternative (La)TeX source.
This is done so that a fully-tagged PDF description of the mathematical
content can be created, using a special version of  pdftex  that Han The Thanh
created for me (and others); it is still at an experimental stage.

You should not need all of this machinery, but I'm happy to answer
any questions you may have.

I've attached a couple of examples of the output from my Perl script,
in which you can see how the  /ActualText  replacement strings
are specified, using a macro  \SMC  which ultimately expands to use
the  \pdfstartmarkedcontent  primitive.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2013-Assign2-soln-inline-2-tags.tex
Type: application/octet-stream
Size: 4498 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20140102/f442355e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2013-Assign2-soln-inline-1-tags.tex
Type: application/octet-stream
Size: 752 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20140102/f442355e/attachment-0001.obj>
-------------- next part --------------


Without the special primitives, you should be able to use  \pdfliteral 
to insert the tagging needed for just using  /ActualText .
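
As a rough sketch of that idea (an assumption about how one might do it,
not what my own macros produce), a preprocessing step could emit the two
\pdfliteral calls around a piece of source. The marked-content operators
BDC and EMC and the /Span tag are standard PDF; the Python function names
below are hypothetical, and nothing here deals with TeX modes or page
breaks falling between the two literals.

```python
def actual_text_hex(text: str) -> str:
    """Hex digits of big-endian UTF-16 with a leading BOM (FEFF)."""
    return "FEFF" + text.encode("utf-16-be").hex().upper()


def wrap_with_actual_text(tex_source: str, replacement: str) -> str:
    """Surround `tex_source` with literals that attach /ActualText to it.

    Hypothetical helper: emits a /Span marked-content sequence whose
    property dictionary carries the replacement text as a hex string.
    """
    begin = "\\pdfliteral{/Span <</ActualText <%s>>> BDC}" % actual_text_hex(replacement)
    end = "\\pdfliteral{EMC}"  # closes the marked-content sequence
    return begin + tex_source + end


if __name__ == "__main__":
    # Typeset $x$, but let copy/paste and search see U+1D465 instead.
    print(wrap_with_actual_text("$x$", "\U0001D465"))
```

The output is ordinary TeX source that can be pasted or \input into a
pdfTeX document; whether the literals upset spacing in a particular
context would need to be checked case by case.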

>> 
>> I see no reason why Indic character strings could not be done similarly.
>> You probably need some on-the-fly preprocessing to work out the required
>> strings to use.


I'm not sure whether there is a LaTeX package that allows you to get the
literal bits into the correct place without upsetting other fine
details of the typesetting with Indic characters.
This certainly should be possible, at least when using  pdfLaTeX .
I'm not sure of the details when using XeTeX, but you work with the source
code, so you can devise anything that is needed, right?

> 
> -- 
> Zdeněk Wagner
> http://hroch486.icpf.cas.cz/wagner/
> http://icebearsoft.euweb.cz



Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-206      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
