[XeTeX] potential new feature: \XeTeXgenerateactualtext

ShreeDevi Kumar shreeshrii at gmail.com
Thu Feb 25 06:11:27 CET 2016


Jonathan,

This is a really useful feature and I look forward to using it once it is
released in TeX Live 2016.

Since how well search and copy/paste work may also depend on the font, I
would like to test more Unicode Devanagari PDFs created with this new
feature using other fonts. I usually use the Siddhanta and Sanskrit2003
fonts.

I would appreciate it if you or other list members who have this feature
installed could provide a few more sample PDFs in Devanagari for testing.

Thanks!

- Sent from my phone. Excuse the brevity.
On 24-Feb-2016 3:37 pm, "Jonathan Kew" <jfkthame at gmail.com> wrote:

> On 24/2/16 09:22, ShreeDevi Kumar wrote:
>
>> Testing dev-actualtext.pdf sent by JK
>>
>>   * Adobe Acrobat Reader XI on Windows 10
>>       o Does not highlight text fully
>>       o Search finds words and word parts correctly, but usually
>>         highlights only the beginning of the word containing the
>>         searched-for letter
>>       o Copy/paste to Notepad++ and OpenOffice Writer works correctly
>>       o "Save as TXT" does not work correctly: the file contains only
>>         "..." rather than the Unicode text that copy/paste produces
>>
>
> So it looks like Acrobat makes use of the ActualText for Search and Copy,
> but sadly its "Save as Text" doesn't support Unicode.
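ActualText is a property of a PDF marked-content sequence: the glyphs between `BDC` and `EMC` are drawn as usual, but a viewer that honours the property uses the attached string for search and copy instead of reconstructing text from the glyphs. A minimal sketch of such a wrapper, assuming the inline-dictionary form `/Span << /ActualText <...> >> BDC ... EMC` and a UTF-16BE hex string with a `FEFF` byte-order mark; the helper names and the glyph operators in the usage line are hypothetical placeholders, and this is not XeTeX's actual output code.

```python
def pdf_hex_text_string(text: str) -> str:
    """Encode a Unicode string as a PDF hex text string (UTF-16BE with FE FF BOM)."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def actualtext_span(text: str, glyph_ops: str) -> str:
    """Wrap glyph-drawing operators in a marked-content sequence whose
    ActualText entry carries the logical Unicode text (hypothetical helper)."""
    return (
        f"/Span << /ActualText {pdf_hex_text_string(text)} >> BDC\n"
        f"{glyph_ops}\n"
        "EMC"
    )


# The glyph IDs below are made-up placeholders, not real font glyphs.
print(actualtext_span("यू", "[<0012><0034>] TJ"))
# /Span << /ActualText <FEFF092F0942> >> BDC
# [<0012><0034>] TJ
# EMC
```

The replacement text is stored in logical Unicode order, independent of how the font arranges the glyphs, which is why viewers that read it (Acrobat for Search/Copy, poppler-based tools) recover correct text.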
>
> I'm pleasantly surprised to see the Gmail previewer also handles it.
>
> The others (Foxit, Edge) sound like they're just working from the glyph
> stream, which is basically doomed to failure.
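Why glyph-stream-only extraction fails for Devanagari can be seen with the vowel sign ि (U+093F): in Unicode's logical order it follows its consonant, but it is drawn to the left of it, so the font's glyph run typically has it first. The sketch below models only that single reordering rule (a deliberate simplification; real shaping also handles conjuncts, reph and other cases, and `naive_glyph_order` is a hypothetical name) and reproduces the `िनकोड` pattern in the Foxit copy/paste result reported in this thread.

```python
PRE_BASE_I = "\u093F"  # DEVANAGARI VOWEL SIGN I: stored after its consonant, drawn before it


def naive_glyph_order(logical: str) -> str:
    """Simplified model of a glyph run: swap every pre-base vowel sign I with
    the character before it. Conjuncts and other reorderings are ignored."""
    chars = list(logical)
    for i in range(1, len(chars)):
        if chars[i] == PRE_BASE_I:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


logical = "यूनिकोड"
visual = naive_glyph_order(logical)
# Print code points rather than the strings themselves, to stay ASCII-safe.
print([f"U+{ord(c):04X}" for c in logical])
print([f"U+{ord(c):04X}" for c in visual])
print(visual == logical)  # False
```

A viewer that reads text back from such a glyph run gets the vowel sign before the consonant, so search for the correctly encoded word fails and copied text is garbled.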
>
> For a further data point, I tried Evince (Document Viewer) on Ubuntu
> 15.10, and found that Copy and Search work well; it looks like it is using
> the ActualText correctly. This is thanks to the poppler library, I believe.
> The (poppler-based) "pdftotext" tool was also able to extract the Unicode
> text correctly from the PDF, although "pdftohtml" didn't do so well.
>
> One issue with Evince is that drag-selecting text to highlight it (as for
> Copy/Paste) looks bad: the highlighting completely obscures the selected
> text, although it will end up being copied correctly. Interestingly, its
> highlighting of search results doesn't suffer from this problem, and it
> even makes a fair attempt (not completely accurate) at highlighting
> specific letters within a word, not just entire words.
>
> JK
>
>
>>   * Foxit Reader 7.3 on Windows 10
>>       o Highlights text fully
>>       o Smallest highlight unit is the word
>>       o Neither copy/paste to Notepad++ nor Search works correctly,
>>         because the extracted Unicode text is partly wrong:
>>
>>             ूय
>>
>>             िनकोड क्या ह ? ै
>>
>>       o "Save as TXT" does not work correctly: it saves the Unicode
>>         text with the same problems as copy/paste
>>
>>   * Microsoft Edge viewer on Windows 10
>>       o Highlights text fully
>>       o Neither copy/paste to Notepad++ nor Search works correctly,
>>         because the extracted Unicode text is partly wrong:
>>
>>             य ूिनकोड क्या है?
>>
>>   * Previewing from within Gmail in Chrome on Windows 10
>>       o Highlights text fully
>>       o Smallest highlight unit is the word
>>       o Copy/paste to Notepad++ and OpenOffice Writer works correctly
>>       o Highlights only the first letter of the first word in the
>>         paragraph (यू rather than the full word यूनिकोड)
>>       o There is no Search feature
>>       o There is no "Save as TXT" feature
>>   * Same as above when previewing from within Gmail in Internet
>>     Explorer on Windows 10
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Feb 23, 2016 at 11:30 PM, Jonathan Kew <jfkthame at gmail.com> wrote:
>>
>>     On 23/2/16 17:39, Philip Taylor wrote:
>>
>>         Using Akira-san's "actest.pdf" as a sample, Adobe Acrobat Pro 7.1
>>         allows me to select only half of the text, whereas Adobe Reader
>>         DC allows me to select it all; neither allows me to select
>>         individual kanji.
>>
>>
>>     Ah, right... as there are no spaces between the kanji, they'll end
>>     up in the same text object. That's a shortcoming of how the current
>>     implementation works, for scripts that don't use inter-word spaces.
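The behaviour described above can be modelled very simply: if, as the description suggests, each ActualText span covers one run of non-space characters, then text in a script written without inter-word spaces collapses into a single span, and a viewer cannot select or highlight anything smaller. A sketch of that grouping rule only, assuming whitespace is the sole span boundary; `space_delimited_runs` is a hypothetical name, not part of XeTeX.

```python
import re


def space_delimited_runs(text: str) -> list[str]:
    """Split text into the runs that would each get one ActualText span,
    assuming spans break only at whitespace (a simplified model)."""
    return re.findall(r"\S+", text)


# Devanagari with spaces: three runs, so three separately selectable spans.
print(len(space_delimited_runs("यूनिकोड क्या है?")))  # 3
# Japanese without spaces: the whole line is one run, hence one span.
print(len(space_delimited_runs("日本語の文章です")))  # 1
```

Under this model, finer selection for CJK text would need span boundaries at something other than spaces, for example per character or per line-break opportunity.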
>>
>>     In either case, copy&paste actually gives you the whole text, even
>>     though AAPro only highlights half of it, I guess?
>>
>>     JK
>>
>>
>>
>>
>>     --------------------------------------------------
>>     Subscriptions, Archive, and List information, etc.:
>>     http://tug.org/mailman/listinfo/xetex
>>