[XeTeX] potential new feature: \XeTeXgenerateactualtext

Jonathan Kew jfkthame at gmail.com
Tue Feb 23 16:50:33 CET 2016


On 23/2/16 15:29, Adam Twardoch (List) wrote:
> Jonathan,
>
> is there any method in XeTeX to explicitly emit "ActualText" or override the automatic content generated by the new option?

Not currently. What you get is the Unicode text of each "word" 
(consecutive run of non-space characters in a given font).
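To make the grouping rule concrete, here is a minimal Python model of it. It assumes a line is a sequence of (character, font) pairs. `Glyph` and `actualtext_runs` are hypothetical names, and this is an illustration of the rule stated above, not XeTeX's actual code.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class Glyph(NamedTuple):
    char: str  # Unicode character the glyph represents
    font: str  # hypothetical font identifier


def actualtext_runs(glyphs: List[Glyph]) -> List[Tuple[str, str]]:
    """Split a line into (font, text) runs of consecutive non-space
    characters in the same font; spaces end a run and are dropped."""
    runs: List[Tuple[str, str]] = []
    # Group by (is_space, font): a change in either one starts a new group.
    for (is_space, font), group in groupby(glyphs, key=lambda g: (g.char.isspace(), g.font)):
        if not is_space:
            runs.append((font, "".join(g.char for g in group)))
    return runs


if __name__ == "__main__":
    line = [Glyph(c, "Serif") for c in "fine print"] + [Glyph(c, "Italic") for c in "ly"]
    print(actualtext_runs(line))
```

In this model a font change in mid-word (here "print" followed by italic "ly") yields two separate runs, each of which would get its own ActualText.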

>
> Or could you envision such a method? How would one need to approach it?
>
> (I'm not saying you should try to implement it right away.) :)

For a document that wants some other kind of "ActualText", there's going 
to need to be pretty detailed markup in the source, I think. (E.g. each 
word, or similar unit, will need to be tagged to provide the desired 
ActualText that goes with it.) At that point, I wonder if turning off 
\XeTeXgenerateactualtext and just doing it "manually" with macros that 
generate \special{}s would be the most reasonable way forward.
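Whatever the macros look like, the PDF content they ultimately have to produce is a marked-content sequence carrying an ActualText property, as described in the PDF specification, with the replacement text written as a text string (here UTF-16BE with a FEFF byte-order mark, in hex). The following is a minimal Python sketch of that payload. The helper names are hypothetical, the glyph operators in the usage line are a placeholder, and how a macro would hand this to xdvipdfmx through \special{} is deliberately not shown.

```python
def pdf_text_string_hex(text: str) -> str:
    """Encode text as a PDF hex text string: FEFF byte-order mark + UTF-16BE."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def actualtext_wrap(content_ops: str, actual_text: str) -> str:
    """Wrap page-content operators in a /Span marked-content sequence whose
    ActualText property stands in for them in copy/paste and search."""
    return (f"/Span <</ActualText {pdf_text_string_hex(actual_text)}>> BDC\n"
            f"{content_ops}\n"
            "EMC")


if __name__ == "__main__":
    # Placeholder operators that draw a single "fi" ligature glyph.
    print(actualtext_wrap("BT /F1 10 Tf 72 720 Td <0001> Tj ET", "fi"))
```

Characters outside the BMP (e.g. math italic letters) come out as UTF-16 surrogate pairs, which is what a PDF text string expects.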

I suppose it's possible you might want automatic ActualText for most of 
the content, but custom overrides for certain fragments. At this point, 
there's no support for that -- \XeTeXgenerateactualtext is a switch that 
takes effect at \shipout time, so in effect it is "global" for all the 
content on a page -- but perhaps we could make it scoped, so that you 
could toggle it on/off at will within the text.

That probably wouldn't be hard to do; I'll give it a bit more thought.

JK

>
> A.
>
>> On 23.02.2016, at 16:00, Jonathan Kew <jfkthame at gmail.com> wrote:
>>
>>> On 23/2/16 14:52, Adam Twardoch (List) wrote:
>>> Jonathan,
>>>
>>> this is splendid. Adding support for the PDF "ActualText" tagging layer
>>> is a huge step.
>>>
>>> I wonder: what happens in the case of mathematical formulae?
>>
>> At this point, nothing in particular. :)
>>
>>> I think it would be rather clever to embed the TeX notation or even, huh
>>> huh, MathML into the ActualText layer for math mode, per equation,
>>> of course :).
>>
>> I think these are ideas that could usefully be explored or implemented at the macro level, rather than being built into the engine.
>>
>> JK
>>
>>> Or use the "Unicode math linear format" as proposed by
>>> Microsoft:
>>> http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf
>>>
>>> A.
>>>
>>> On 23.02.2016, at 15:43, Jonathan Kew <jfkthame at gmail.com
>>> <mailto:jfkthame at gmail.com>> wrote:
>>>
>>>> The code for the \XeTeXgenerateactualtext feature (it's an integer
>>>> parameter; set it to 1 to get ActualText added to the PDF, for better
>>>> copy/paste and search in Acrobat) is now on SourceForge, in an
>>>> "actualtext" branch, for anyone who wants to try building and
>>>> experimenting with it.
>>>>
>>>> Note that this requires a new version of xdvipdfmx, as it uses a new
>>>> DVI opcode. The patch for xdvipdfmx is attached here (based on the
>>>> current TeXLive svn source).
>>>>
>>>> Akira, if you could check that the patch seems OK, that would be
>>>> great. I've not really looked at dvipdfm-x code in a long time. I
>>>> haven't pushed it to TL yet, as it's all rather experimental, but
>>>> I hope we can safely include it for TL'16.
>>>>
>>>> JK
>>>> <xdvipdfmx-for-xetex-0_99995.patch>
>>>>
>>>>
>>>> --------------------------------------------------
>>>> Subscriptions, Archive, and List information, etc.:
>>>> http://tug.org/mailman/listinfo/xetex